:: Publication Details
| Year: | 2008 : 2007 : 2006 : 2005 : 2004 : 2003 : 2002 : 2001 : 2000 : 1999 : 1998 : 1997 : 1996 : 1995 : 1994 : 1993 : 1992 : 1991 : 1990 : 1988 : |
:: 2008
Dynamic Programming Strikes Back
Year: 2008 Journal: Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada. BibTeX: @inproceedings{Moerkotte:2008:DPS, author = {Guido Moerkotte and Thomas Neumann}, title = {Dynamic Programming Strikes Back}, booktitle = {Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada.}, year = {2008} }Abstract: Two highly efficient algorithms are known for optimally ordering joins while avoiding cross products: DPccp, which is based on dynamic programming, and Top-Down Partition Search, based on memoization. Both have two severe limitations: They handle only (1) simple (binary) join predicates and (2) inner joins. However, real queries may contain complex join predicates, involving more than two relations, and outer joins as well as other non-inner joins. Taking the most efficient known join-ordering algorithm, DPccp, as a starting point, we first develop a new algorithm, DPhyp, which is capable to handle complex join predicates efficiently. We do so by modeling the query graph as a (variant of a) hypergraph and then reason about its connected subgraphs. Then, we present a technique to exploit this capability to efficiently handle the widest class of non-inner joins dealt with so far. Our experimental results show that this reformulation of non-inner joins as complex predicates can improve optimization time by orders of magnitude, compared to known algorithms dealing with complex join predicates and non-inner joins. Once again, this gives dynamic programming a distinct advantage over current memoization techniques.
The Demaq System: Declarative Development of Distributed Applications
Year: 2008 Journal: Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada. (Demonstration) BibTeX: @inproceedings{Boehm:2008:TDS, author = {Alexander Böhm and Erich Marth and Carl-Christian Kanne}, title = {The Demaq System: Declarative Development of Distributed Applications}, booktitle = {Proceedings of the 28th ACM SIGMOD/PODS International Conference on Management of Data / Principles of Database Systems, Vancouver, BC, Canada.}, note = {Demonstration}, year = {2008} }Abstract: The goal of the Demaq project is to investigate a novel way of thinking about distributed applications that are based on the asynchronous exchange of XML messages. Unlike today's solutions that rely on imperative programming languages and multi-tiered application servers, Demaq uses a declarative language for implementing the application logic as a set of rules. A rule compiler transforms the application specifications into execution plans against the message history, which are evaluated using our optimized runtime engine. This allows us to leverage existing knowledge about declarative query processing for optimizing distributed applications.
Faster Join Enumeration for Complex Queries
Year: 2008 Journal: Proceedings of the 24th International Conference on Data Engineering, Cancún, México. (Poster) BibTeX: @inproceedings{Moerkotte:2008:FJE, author = {Guido Moerkotte and Thomas Neumann}, title = {Faster Join Enumeration for Complex Queries}, booktitle = {Proceedings of the 24th International Conference on Data Engineering, Cancún, México}, note = {Poster}, year = {2008} }Abstract: Most existing join ordering algorithms concentrate on join queries with simple join predicates and inner joins only, where simple predicates are those that involve exactly two relations. However, real queries may contain complex join predicates, i.e. predicates involving more than two relations. We show how to handle complex join predicates efficiently, by modeling the query graph as a hypergraph and reasoning about its connected subgraphs.
:: 2007
Efficient XQuery Evaluation of Grouping Conditions with Duplicate Removals
Year: 2007 Journal: Proceedings of the Fifth International XML Database Symposium (XSym), Vienna, Austria Download: PDF BibTeX: @inproceedings{may:2007:exe, author = {Norman May and Guido Moerkotte}, title = {Efficient XQuery Evaluation of Grouping Conditions with Duplicate Removals}, booktitle = {Proc. of Fifth International XML Database Symposium, Vienna, Austria}, }Abstract: Currently, grouping in XQuery must be expressed implicitly with nested FLWOR expressions. With XQuery 1.1, an explicit group by clause will be part of this query language. As users integrate this new construct into their applications, it becomes important to have efficient evaluation techniques available to process even complex grouping conditions. Among them, the removal of distinct values or distinct nodes in the partitions defined by the group by clause is not well-supported yet. The evaluation technique proposed in this paper is able to handle duplicate removal in the partitions efficiently. Experiments show the superiority of our solution compared to state-of-the-art query processing.
Let a Single FLWOR Bloom (to improve XQuery plan generation)
Year: 2007 Journal: Proceedings of the Fifth International XML Database Symposium (XSym), Vienna, Austria Download: PDF BibTeX: @inproceedings{brantner:2007:DXE, author = {Matthias Brantner and Carl-Christian Kanne and Guido Moerkotte}, title = {Let a Single {FLWOR} Bloom (to improve XQuery plan generation)}, booktitle = {Proc. of Fifth International XML Database Symposium, Vienna, Austria}, }Abstract: To globally optimize execution plans for XQuery expressions, a plan generator must generate and compare plan alternatives. In proven compiler architectures, the unit of plan generation is the query block. Fewer query blocks mean a larger search space for the plan generator and lead to a generally higher quality of the execution plans. The goal of this paper is to provide a toolkit for developers of XQuery evaluators to transform XQuery expressions into expressions with as few query blocks as possible. Our toolkit takes the form of rewrite rules merging the inner and outer FLWOR expressions into single FLWORs. We focus on previously unpublished rewrite rules, and on inner FLWORs occurring in the for, let, and return clauses in the outer FLWOR.
Let a Single FLWOR Bloom
Year: 2007 Journal: Technical Report Download: PDF BibTeX: @TechReport{brantner:2007:DXE, author = {Matthias Brantner and Carl-Christian Kanne and Guido Moerkotte}, title = {Let a Single {FLWOR} Bloom}, note = {http://madoc.bib.uni-mannheim.de/madoc/volltexte/2007/1428/pdf/TR_2007_007.pdf} institution = {University of Mannheim}, year = {2007} }Abstract: To globally optimize execution plans for XQuery expressions, a plan generator must generate and compare plan alternatives. In proven compiler architectures, the unit of plan generation is the query block. Fewer query blocks mean a larger search space for the plan generator and lead to a generally higher quality of the execution plans. The goal of this paper is to provide a toolkit for developers of XQuery evaluators to transform XQuery expressions into expressions with as few query blocks as possible. Our toolkit takes the form of rewrite rules merging the inner and outer FLWOR expressions into single FLWORs. We focus on previously unpublished rewrite rules, and on inner FLWORs occurring in the for, let, and return clauses in the outer FLWOR.
Declarative Development of Distributed Applications
Year: 2007 Journal: Proceedings SIGMOD2007 Ph.D. Workshop on Innovative Database Research 2007(IDAR2007) (Poster Paper) BibTeX: @inproceedings{Boehm:2007:DDD, author = {Alexander Böhm}, title = {Declarative Development of Distributed Applications}, booktitle = {Proceedings SIGMOD2007 Ph.D. Workshop on Innovative Database Research 2007(IDAR2007)}, note = {Poster Paper}, year = {2007} }Abstract: We present a novel approach for the convenient implementation and efficient execution of distributed XML message processing applications. Various application classes such as Web Services are based on asynchronously exchanging XML data. However, today's programming languages and execution systems fail to support their particular demands such as integrated XML type support and efficient, asynchronous messaging. As a result, the development of distributed applications is unnecessarily complex and their runtime performance deteriorates. To overcome these deficits, we propose a fully declarative XML processing language that allows for the convenient specification of an individual node participating in a distributed application. It is complemented by a corresponding runtime system for reliable and efficient application execution.
Declarative Development of Distributed Applications
Year: 2007 Journal: Technical Report Download: PDF BibTeX: @techreport{Boehm:2007:DDDTR, author = {Alexander Böhm}, title = {Declarative Development of Distributed Applications}, institution = {University of Mannheim}, year = {2007} }Abstract: We present a novel approach for the convenient implementation and efficient execution of distributed XML message processing applications. Various application classes such as Web Services are based on asynchronously exchanging XML data. However, today's programming languages and execution systems fail to support their particular demands such as integrated XML type support and efficient, asynchronous messaging. As a result, the development of distributed applications is unnecessarily complex and their runtime performance deteriorates. To overcome these deficits, we propose a fully declarative XML processing language that allows for the convenient specification of an individual node participating in a distributed application. It is complemented by a corresponding runtime system for reliable and efficient application execution.
Demaq: A Foundation for Declarative XML Message Processing
Year: 2007 Journal: Proc. 3. Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA Download: Paper BibTeX: @InProceedings{Boehm:2007:DFD, author = {A. B\"ohm and C-C. Kanne and G. Moerkotte}, title = {Demaq: A Foundation for Declarative XML Message Processing}, year = {2007}, booktitle = {Proc. 3. Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA}, }Abstract: This paper gives an overview of Demaq, an XML message processing system operating on the foundation of transactional XML message queues. We focus on the syntax and semantics of its fully declarative, rule-based application language and demonstrate our message-based programming para\-digm in the context of a case study. Further, we discuss optimization opportunities for executing Demaq programs.
Unnesting Scalar SQL Queries in the Presence of Disjunction
Year: 2007 Journal: Proc. 23. ICDE Conference, Istanbul, Turkey Download: Paper BibTeX: @InProceedings{Brantner:2007:USQ, author = {M. Brantner and N. May and G. Moerkotte}, title = {Unnesting Scalar {SQL} Queries in the Presence of Disjunction}, year = {2007}, booktitle = {Proc. 23. ICDE Conference, Istanbul, Turkey}, }Abstract: Optimizing nested queries is an intricate problem. It becomes even harder if in a nested query the linking predicate or the correlation predicate occurs disjunctively. We present the first unnesting strategy that can effectively deal with such queries. The starting point of our approach is to translate SQL into the relational algebra extended by bypass operators. Then we present for the first time unnesting equivalences which are valid for algebraic expressions containing bypass operators. Applying these to the translated queries results in our effective unnesting strategy for nested SQL queries with disjunction. With an extensive experimental study (including three commercial DBMSs), we demonstrate the possible performance gains of our approach.
:: 2006
On the Optimal Ordering of Maps, Selections, and Joins Under Factorization
Year: 2006 Journal: Proceedings of 23rd British National Conference on Databases (BNCOD) Download: PDF, Springer BibTeX: @inproceedings{DBLP:conf/bncod/NeumannHM06, author = {Thomas Neumann and Sven Helmer and Guido Moerkotte}, title = {On the Optimal Ordering of Maps, Selections, and Joins Under Factorization.}, booktitle = {Proceedings of 23rd British National Conference on Databases, Belfast, Northern Ireland, UK}, year = {2006}, pages = {115-126} }Abstract: We examine the problem of producing the optimal evaluation order for queries containing joins, selections, and maps. Specifically, we look at the case where common subexpressions involving expensive UDF calls can be factored out. First, we show that ignoring factorization during optimization can lead to plans that are far off the best possible plan: the difference in cost between the best plan considering factorization and the best plan not considering factorization can easily reach several orders of magnitude. Then, we introduce optimization strategies that produce optimal left-deep and bushy plans when factorization is taken into account. Experiments (1) confirm that factorization is a critical issue when it comes to generating optimal plans and (2) we show that to consider factorization does not make plan generation significantly more expensive.
Kappa-Join: Efficient Execution of Existential Quantification in XML Query Languages
Year: 2006 Journal: Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea Download: Paper, Springer BibTeX: @InProceedings{Brantner:2006:KEE, author = {M. Brantner and S. Helmer and C-C. Kanne and G. Moerkotte}, title = {{Kappa-Join}: Efficient Execution of Existential Quantification in {XML} Query Languages}, booktitle = {Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea}, pages = {1--15} year = {2006} }Abstract: XML query languages feature powerful primitives for formulating queries, involving comparison expressions which are existentially quantified. If such comparisons involve several scopes, they are correlated and, thus, become difficult to evaluate efficiently. In this paper, we develop a new ternary operator, called Kappa-Join, for efficiently evaluating queries with existential quantification. In XML queries, a correlation predicate can occur conjunctively and disjunctively. Our decorrelation approach not only improves performance in the conjunctive case, but also allows decorrelation of the disjunctive case. The latter is not possible with any known technique. In an experimental evaluation, we compare the query execution times of the Kappa-Join with existing XPath evaluation techniques to demonstrate the effectiveness of our new operator.
Index vs. Navigation in XPath Evaluation
Year: 2006 Journal: Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea Download: Paper, Springer BibTeX: @InProceedings{May:2006:INX, author = {N. May and M. Brantner and A. B\"ohm and C-C. Kanne and G. Moerkotte}, title = {Index vs. Navigation in XPath Evaluation}, booktitle = {Proc. Workshop Fourth International XML Database Symposium, Seoul, Korea}, pages = {16--30} year = {2006} }A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and Approximation Algorithms in Natix
Year: 2006 Journal: Proc. 32nd VLDB Conference, Seoul, Korea BibTeX: @InProceedings{Kanne:2006:TFX, author = {C.-C. Kanne and G. Moerkotte}, title = {A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and Approximation Algorithms in Natix}, year = {2006}, booktitle = "Proc. 32nd VLDB Conference, Seoul, Korea" }Abstract: Document insertion into a native XML Data Store (XDS) requires to partition the document tree into a number of storage units with limited capacity, such as records on disk pages. As intra partition navigation is much faster than navigation between partitions, minimizing the number of partitions has a beneficial effect on query performance. We present a linear time algorithm to optimally partition an ordered, labeled, weighted tree such that each partition does not exceed a fixed weight limit. Whereas traditionally tree partitioning algorithms only allow child nodes to share a partition with their parent node (i.e. a partition corresponds to a subtree), our algorithm also considers partitions containing several subtrees as long as their roots are adjacent siblings. We call this sibling partitioning. Based on our study of the optimal algorithm, we further introduce two novel, near-optimal heuristics. They are easier to implement, do not need to hold the whole document instance in memory, and require much less runtime than the optimal algorithm. Finally, we provide an experimental study comparing our novel and existing algorithms. One important finding is that compared to partitioning that exclusively considers parent-child partitions, including sibling partitioning as well can decrease the total number of partitions by more than 90%, and improve query performance by more than a factor of two.
A Linear-Time Algorithm for Optimal Tree Sibling Partitioning and Approximation Algorithms in Natix
Year: 2006 Journal: Technical report Download: local, Library Server BibTeX: @TechReport{Kanne:2006:LTA_TR, author = "Carl-Christian Kanne and Guido Moerkotte", title = {A Linear Time Algorithm for Optimal Tree Sibling Partitioning and its Application to {XML} Data Stores}, institution = "University of Mannheim, Department for Mathematics and Computer Science", year = "2006", type = "Technical Report", number = "TR-2006-006" }Abstract: Document insertion into a native XML Data Store (XDS) requires to partition the document tree into a number of storage units with limited capacity, such as records on disk pages. As intra partition navigation is much faster than navigation between partitions, minimizing the number of partitions has a beneficial effect on query performance. We present a linear time algorithm to optimally partition an ordered, labeled, weighted tree such that each partition does not exceed a fixed weight limit. Whereas traditionally tree partitioning algorithms only allow child nodes to share a partition with their parent node (i.e. a partition corresponds to a subtree), our algorithm also considers partitions containing several subtrees as long as their roots are adjacent siblings. We call this sibling partitioning. Based on our study of the optimal algorithm, we further introduce two novel, near-optimal heuristics. They are easier to implement, do not need to hold the whole document instance in memory, and require much less runtime than the optimal algorithm. Finally, we provide an experimental study comparing our novel and existing algorithms. One important finding is that compared to partitioning that exclusively considers parent-child partitions, including sibling partitioning as well can decrease the total number of partitions by more than 90%, and improve query performance by more than a factor of two.
Template Folding for XPath
Year: 2006 Journal: Third International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2006) Download: paper BibTeX: @InProceedings{Kanne:2006:TFX, author = {C.-C. Kanne and G. Moerkotte}, title = {Template Folding for XPath}, year = {2006}, booktitle = "Third International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2006)" }Abstract: We discuss query evaluation for XML-based server systems where the same query is evaluated on every incoming XML message. In a typical scenario, many of the incoming messages will be highly similar to each other. Current XML query evaluators reevaluate the query from scratch on every message. We call substructures that occur in many input documents template fragments, and introduce a novel template folding method that allows to move the work of evaluating the query on recurring document substructures from the query execution engine into the query compiler. Similar to constant folding, our method avoids run-time evaluation of intermediate results whose value only depends on information that is already available at compile time. For XPath location paths, we propose a representation for such invariant intermediate results, and show how it can be incorporated into query execution plans. Such augmented execution plans improve query performance when evaluating the same query on subsequent input documents.
Implementierung von skalierbaren, hochperformanten Web Services durch deklarative Nachrichtenverarbeitung
Year: 2006 Journal: Proc. doIT Software-Forschungstag 2006, Mannheim BibTeX: @InProceedings{Boehm:2006:ISH, author = {Alexander B\"ohm and Carl-Christian Kanne and Guido Moerkotte}, title = {Implementierung von skalierbaren, hochperformanten {W}eb {S}ervices durch deklarative {N}achrichtenverarbeitung}, booktitle = {Proc. doIT Software-Forschungstag 2006, Mannheim}, year = {2006}, note = {In German} }Abstract: Der zunehmende Einsatz von Web Services für erfolgsentscheidende Unternehmensanwendungen erfordert schnelle und flexible Anwendungsentwicklung sowie hochperformante und zuverlässige Ausführungsumgebungen. Existierende Applikationsserver erfüllen diese Anforderungen nur partiell. Wir stellen eine deklarative Regelsprache zur Definition von Geschäftsprozessen und ein zugehöriges Systemdesign auf der Basis von Nachrichtenwarteschlangen und einem nativen XML-Datenbanksystem vor.
Implementierung von skalierbaren, hochperformanten Web Services durch deklarative Nachrichtenverarbeitung
Year: 2006 Journal: Technical Report Download: local, Library Server BibTeX: @TechReport{Boehm:2006:ISH_TR, author = {Alexander B\"ohm and Carl-Christian Kanne and Guido Moerkotte}, title = {Implementierung von skalierbaren, hochperformanten {W}eb {S}ervices durch deklarative {N}achrichtenverarbeitung}, institution = {University of Mannheim, Department for Mathematics and Computer Science}, year = {2006}, type = {Technical Report}, number = {TR-2006-014}, note = {http://db.informatik.uni-mannheim.de/publications/TR-2006-014.pdf} }Abstract: Der zunehmende Einsatz von Web Services für erfolgsentscheidende Unternehmensanwendungen erfordert schnelle und flexible Anwendungsentwicklung sowie hochperformante und zuverlässige Ausführungsumgebungen. Existierende Applikationsserver erfüllen diese Anforderungen nur partiell. Wir stellen eine deklarative Regelsprache zur Definition von Geschäftsprozessen und ein zugehöriges Systemdesign auf der Basis von Nachrichtenwarteschlangen und einem nativen XML-Datenbanksystem vor.
Unnesting SQL Queries in the Presence of Disjunction
Year: 2006 Journal: Technical report Download: technical report BibTeX: @TechReport{Brantner:2006:DSQ, author = {M. Brantner and N. May and G. Moerkotte}, title = {Unnesting {SQL} Queries in the Presence of Disjunction}, institution = {University of Mannheim}, year = {2006}, month = {March}, note = {http://db.informatik.uni-mannheim.de/publications/TR-06-013.pdf} }Abstract: Optimizing nested queries is an intricate problem. It becomes even harder if in a nested query the linking predicate or the correlation predicate occurs disjunctively. We present the first unnesting strategy that can effectively deal with these queries. The starting point of our approach is to translate SQL into the relational algebra extended by bypass operators. Then we present for the first time unnesting equivalences which are valid for algebraic expressions containing bypass operators. Applying these to the translated queries results in our effective unnesting strategy for nested SQL queries with disjunction. With an extensive experimental study (including three commercial DBMSs), we demonstrate the possible performance gains and limitations of our approach.
A Declarative Control Language for Dependable XML Message Queues
Year: 2006 Journal: Proc. Workshop Dependability in Large-scale Service-oriented Systems (at ARES Conference), Vienna, Austria Download: Paper BibTeX: @InProceedings{Boehm:2006:DCL, author = {Alexander Böhm and Carl-Christian Kanne and Guido Moerkotte}, title = {A Declarative Control Language for Dependable XML Message Queues}, booktitle = {Proc. Workshop Dependability in Large-scale Service-oriented Systems (at ARES Conference), Vienna, Austria}, year = {2006} }Abstract: We present a novel approach for the implementation of efficient and dependable web service engines (WSEs). A WSE instance represents a single node in a distributed network of participants that communicate using XML messages. We introduce a fully declarative language custom-tailored to XML message processing that allows to specify business processes in a concise manner. To support the efficient and reliable evaluation of our language, we show how to augment a native, transactional XML data store with efficient and reliable XML message queues.
Strategies for Query Unnesting in XML Databases
Year: 2006 Journal: ACM Transactions on Database Systems Download: Paper, ACM TODS BibTeX: @Article{May:2006:SQU, author = {N. May and S. Helmer and G. Moerkotte}, title = {Strategies for Query Unnesting in XML Databases}, journal = {ACM Transactions on Database Systems} volume = {31}, number = {3}, pages = {968--1013}, year = {2006}, }Kappa-Join: Efficient Execution of Existential Quantification in XML Query Languages
Year: 2006 Journal: Technical Report Download: Technical Report BibTeX: @TechReport{Brantner:2006:KPJTR, author = {M. Brantner and S. Helmer and C-C. Kanne and G. Moerkotte}, title = {Kappa-Join: Efficient Execution of Existential Quantification in XML Query Languages}, institution = {University of Mannheim}, note = {http://madoc.bib.uni-mannheim.de/madoc/volltexte/2006/1227/}, year = {2006} }DP-Counter Analytics
Year: 2006 Journal: Technical report Download: Technical Report BibTeX: @TechReport{Moerkotte:2006:DCA, author = {G. Moerkotte}, title = {DP-Counter Analytics}, institution = {University of Mannheim}, year = {2006} }Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products / DP-Counter Analytics
Year: 2006 Journal: Proc. 32nd VLDB Conference, Seoul, Korea Download: Paper BibTeX: @InProceedings{Neumann:2006:ATE, author = {T. Neumann and G. Moerkotte}, title = {Analysis of Two Existing and One New Dynamic Programming Algorithm for the Generation of Optimal Bushy Join Trees without Cross Products}, year = {2006}, booktitle = "Proc. 32nd VLDB Conference, Seoul, Korea" }Ontologies for Location-based Services
Year: 2006 Journal: In Handbook of Research in Mobile Business: Technical, Methodological and Social Perspectives (ed.: B. Unhelkar), Idea Group, Inc., Hershey, PA, USA Natix Visual Interfaces
Year: 2006 Journal: Proc. 10. International Conference on Extending Database Technology (Demo Paper) Download: paper, Conference Proceedings BibTeX: @InProceedings{Boehm:2006:NVI, author = {Alexander Böhm and Matthias Brantner and Carl-Christian Kanne and Norman May and Guido Moerkotte}, title = {Natix Visual Interfaces}, booktitle = {Proc. 10. International Conference on Extending Database Technology}, year = {2006}, pages = {1125 - 1129} }Abstract: We present the architecture of the new version of Natix. Among the new features of this native XML DBMS are an optimizing XPath query compiler and a powerful API. In our demonstration we explain this API and present XPath evaluation in Natix using its visual explain facilities.
The importance of sibling clustering for efficient bulkload of XML document trees
Year: 2006 Journal: IBM Systems Journal, Number 2, Volume 45 Download: from IBM Systems Journal Website BibTeX: @InProceedings{Kanne:2006:ISC, author = {Carl-Christian Kanne and Guido Moerkotte}, title = {The importance of sibling clustering for efficient bulkload of XML document trees}, booktitle = {IBM Systems Journal}, volume = {45}, number = {2}, year = {2006} }Abstract: In an XML Data Store (XDS), importing documents from external sources is a very frequent operation. Because a document import consists of a large number of individual node inserts, it is essentially a small bulkload operation, and thus efficient bulkload support is crucial for the performance of the XDS. The bulkload operation is in essence a mapping of an XML parser's output into the storage structures of the XDS. This involves two major subtasks: (1) partitioning the document's logical tree structure into subtrees that can be stored on a page in a way that is both space-efficient and suitable for later processing and (2) mapping the subtrees to the internal representation of the XDS for paging. In enterprise-scale environments with very large documents and many parallel bulkload operations, the first task is particularly challenging, as not only disk space consumption, but also CPU and main-memory usage are important factors. In this paper, we discuss the requirements for an XDS bulkload component and examine existing algorithms for tree partitioning and their applicability to the bulkload operation. We derive a new tree-partitioning algorithm for use in the bulkload operation and present the design of the bulkload component for the XDS Natix. Finally, we evaluate the performance of the bulkload component and compare our results with previous work.
Algebraic Optimization of Nested XPath Expressions
Year: 2006 Journal: Proc. 22. ICDE Conference, Atlanta, USA (Poster Paper) Download: paper BibTeX: @InProceedings{Brantner:2006:AON, author = "Matthias Brantner and Carl-Christian Kanne and Sven Helmer and Guido Moerkotte", title = "Algebraic Optimization of Nested XPath Expressions", booktitle = "Proc. 22. ICDE Conference, Atlanta", year = "2006", pages = "128", }Abstract: The XPath language incorporates powerful primitives for formulating queries containing nested subexpressions which are existentially or universally quantified. However, even the best published approaches for evaluating XPath have unsatisfactory performance when applied to nested queries. We examine optimization techniques that unnest complex XPath queries. For this purpose, we classify XPath expressions particularly with regard to properties that are relevant for unnesting. We present algebraic equivalences that transform nested expressions into unnested expressions. In our experiments we compare the evaluation times with existing XPath evaluators and the naive evaluation.
:: 2005
Formally Specifying the Syntax and Semantics of a Visual Query Language for the Domain of High Energy Physics Data Analysis
Year: 2005 Journal: IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC) BibTeX: @INPROCEEDINGS{AmHeMo05, AUTHOR = {V. Amaral and S. Helmer and G. Moerkotte}, TITLE = {Formally Specifying the Syntax and Semantics of a Visual Query Language for the Domain of High Energy Physics Data Analysis}, BOOKTITLE = {IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC)}, PAGES = {251-258}, YEAR = {2005} }A Model-Based Monitoring and Diagnosis System for a Space-Based Astrometry Mission
Year: 2005 Journal: Proc. 16th International Conference on Database and Expert Systems Applications, Copenhagen, Denmark BibTeX: @InProceedings{Pavlov:2005:MBM, author = "Aleksei Pavlov and Sven Helmer and Guido Moerkotte", title = "A Model-Based Monitoring and Diagnosis System for a Space-Based Astrometry Mission", booktitle = "Proc. 16th International Conference on Database and Expert Systems Applications, Copenhagen, Denmark", pages = "920-929", year = "2005", }Algebraic Translation and Optimization of (Nested) XPath Expressions
Year: 2005 Journal: Proc. Dutch Belgian Database Day, Amsterdam, Netherlands BibTeX: @InProceedings{Brantner:2005:ATO, author = "Matthias Brantner and Carl-Christian Kanne and Sven Helmer and Guido Moerkotte", title = "Algebraic Translation and Optimization of (Nested) XPath Expressions", booktitle = "Proc. Dutch Belgian Database Day", year = "2005", }Main Memory Implementations for Binary Grouping
Year: 2005 Journal: 3rd Int. XML Database Symposium (XSym 2005), Trondheim, Norway Download: paper BibTeX: @InProceedings{May:2005:MMI, author = "Norman May and Guido Moerkotte", title = "Main Memory Implementations for Binary Grouping", booktitle = "Proc. 3rd Int. XML Database Symposium (XSym 2005), Trondheim, Norway", year = "2005", pages = "162--176", }Abstract: An increasing number of applications depend on efficient storage and analysis features for XML data. Hence, query optimization and efficient evaluation techniques for the emerging XQuery standard become more and more important. Many XQuery queries require nested expressions. Unnesting them often introduces binary grouping. We introduce several algorithms implementing binary grouping and analyze their time and space complexity. Experiments demonstrate their performance.
Main Memory Implementations for Binary Grouping
Year: 2005 Journal: Technical Report Download: technical report, Library Server BibTeX: @TechReport{May:2005:MMITR, author = "Norman May and Guido Moerkotte", title = "Main Memory Implementations for Binary Grouping", institution = "University of Mannheim, Department for Mathematics and Computer Science", year = "2005", type = "Technical Report", number = "TR-2005-007" }Abstract: An increasing number of applications depend on efficient storage and analysis features for XML data. Hence, query optimization and efficient evaluation techniques for the emerging XQuery standard become more and more important. Many XQuery queries require nested expressions. Unnesting them often introduces binary grouping. We introduce several algorithms implementing binary grouping and analyze their time and space complexity. Experiments demonstrate their performance.
Cost-Sensitive Reordering of Navigational Primitives
Year: 2005 Journal: Proc. 24th ACM SIGMOD -SIGACT-SIGART Symposium on Principles of Database Systems, Baltimore, Maryland Download: paper BibTeX: @InProceedings{Kanne:2005:CSR, author = "Carl-Christian Kanne and Matthias Brantner and Guido Moerkotte", title = "Cost-Sensitive Reordering of Navigational Primitives", booktitle = "Proc. 24th ACM SIGMOD, Baltimore, Maryland", year = "2005", pages = "742--753", }Abstract: We present a method to evaluate path queries based on the novel concept of partial path instances. Our method (1) maximizes performance by means of sequential scans or asynchronous I/O, (2) does not require a special storage format, (3) relies on simple navigational primitives on trees, and (4) can be complemented by existing logical and physical optimizations such as duplicate elimination, duplicate prevention and path rewriting. We use a physical algebra which separates those navigation operations that require I/O from those that do not. All I/O operations necessary for the evaluation of a path are isolated in a single operator, which may employ efficient I/O scheduling strategies such as sequential scans or asynchronous I/O. Performance results for queries from the XMark benchmark show that reordering the navigation operations can increase performance up to a factor of four.
Full-fledged Algebraic XPath Processing in Natix
Year: 2005 Journal: Proc. 21. ICDE Conference, Tokyo, Japan Download: paper, prototype implementation BibTeX: @InProceedings{Brantner:2005:FAX, author = "Matthias Brantner and Carl-Christian Kanne and Sven Helmer and Guido Moerkotte", title = "Full-fledged Algebraic XPath Processing in Natix", booktitle = "Proc. 21. ICDE Conference, Tokyo, Japan", pages = "705--716", year = "2005", }Abstract: We present the first complete translation of XPath into an algebra, paving the way for a comprehensive, state-of-the-art XPath (and later on, XQuery) compiler based on algebraic optimization techniques. Our translation includes all XPath features such as nested expressions, position-based predicates and node-set functions. The translated algebraic expressions can be executed using the proven, scalable, iterator-based approach, as we demonstrate in form of a corresponding physical algebra in our native XML DBMS Natix. A first glance at performance results shows that even without further optimization of the expressions, we provide a competitive evaluation technique for XPath queries.
On the Optimal Ordering of Maps and Selections under Factorization
Year: 2005 Journal: Proc. ICDE Conference, Tokyo, Japan Download: paper BibTeX: @InProceedings{Neumann:2004:OOM, author = "Thomas Neumann, Sven Helmer and Guido Moerkotte", title = "On the Optimal Ordering of Maps and Selections under Factorization", booktitle = "Proc. ICDE Conference, Tokyo, Japan", pages = "490--501", year = "2005", }Abstract: The query optimizer of a database system is confronted with two aspects when handling user-defined functions (UDFs) in query predicates: the vast differences in evaluation costs between UDFs (and other functions) and multiple calls of the same (expensive) UDF. The former is dealt with by ordering the evaluation of the predicates optimally, the latter by identifying common subexpressions and thereby avoiding costly recomputation. Current approaches order $n$ predicates optimally (neglecting factorization) in $O(n \log n)$. Their result may deviate significantly from the optimal solution under factorization. We formalize the problem of finding optimal orderings under factorization and prove that it is NP-hard. Furthermore, we show how to improve on the run time of the brute-force algorithm (which computes all possible orderings) by presenting different enhanced algorithms. Although in the worst case these algorithms obviously still behave exponentially, our experiments demonstrate that for real-life examples their performance is much better.
:: 2004
Index Structures for Fuzzy Object-Oriented Database Systems.
Year: 2004 Journal: In Advances in Fuzzy Object-Oriented Databases (edited by Zongmin Ma), Idea Group Publishing, Hershey, 2004: 206-240 An Efficient Framework for Order Optimization
Year: 2004 Journal: Proc. ICDE Conference, Boston, USA Download: paper, technical report BibTeX: @InProceedings{Neumann:2004:EFO, author = "Thomas Neumann and Guido Moerkotte", title = "An Efficient Framework for Order Optimization", booktitle = "Proc. ICDE Conference, Boston, USA", pages = "461--472", year = "2004", }Abstract: Since the introduction of cost-based query optimization, the performance-critical role of interesting orders has been recognized. Some algebraic operators change interesting orders (e.g. sort and select), while others exploit interesting orders (e.g. merge join). The two operations performed by any query optimizer during plan generation are 1) computing the resulting order given an input order and an algebraic operator and 2) determining the compatibility between a given input order and the required order a given algebraic operator can beneficially exploit. Since these two operations are called millions of times during plan generation, they are highly performance-critical. The third crucial parameter is the space requirement for annotating every plan node with its output order.Lately, a powerful framework for reasoning about orders has been developed, which is based on functional dependencies. Within this framework, the current state-of-the-art algorithms for implementing the above operations both have a lower bound time requirement of Omega(n), where n is the number of functional dependencies involved. Further, the lower bound for the space requirement for every plan node is Omega(n).We improve these bounds by new algorithms with upper time bounds O(1). That is, our algorithms for both operations work in constant time during plan generation, after a one-time preparation step. Further, the upper bound for the space requirement for plan nodes is O(1) for our approach. Besides, our algorithm reduces the search space by detecting and ignoring irrelevant orderings. Experimental results with a full fledged query optimizer show that our approach significantly reduces the total time needed for plan generation. As a corollary of our experiments, it follows that the time spent for order processing is a non-neglectable part of plan generation.
Nested Queries and Quantifiers in an Ordered Context
Year: 2004 Journal: Proc. ICDE Conference, Boston, USA Download: paper BibTeX: @InProceedings{May:2004:NQQ, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Nested Queries and Quantifiers in an Ordered Context", booktitle = "Proc. ICDE Conference, Boston, USA", pages = "239--250", year = "2004", }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible by unnesting.
Evaluating Lock-based Protocols for Cooperation on XML Documents
Year: 2004 Journal: SIGMOD Record Download: sigrec04_sym.pdf BibTeX: @Article{Helmer:2004:ELP, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Evaluating Lock-based Protocols for Cooperation on XML Documents", journal = "SIGMOD Record", volume = "33", number = "1", pages = "58--63", year = "2004", month = mar, }Abstract: We discuss four different core protocols for synchronizing access to and modi cations of XML document collections. These core protocols synchronize structure traversals and modi cations. They are meant to be integrated into a native XML base management System (XBMS) and are based on two phase locking. We also demonstrate the different degrees of cooperation that are possible with these protocols by various experimental results. Furthermore, we also discuss extensions of these core protocols to full- edged protocols. Further, we show how to achieve a higher degree of concurrency by exploiting the semantics expressed in Document Type De nitions (DTDs).
XQuery Processing in Natix with an Emphasis on Join Ordering
Year: 2004 Journal: First International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2004) Download: paper BibTeX: @InProceedings{May:2004:XQN, author = "Norman May and Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "XQuery Processing in Natix with an Emphasis on Join Ordering", booktitle = "First International Workshop on XQuery Implementation, Experience and Perspectives (XIME-P 2004)", year = "2004", month = jun, pages = "49--54", }Abstract: We give an overview on how XQuery processing works in our native XML database system Natix. After a brief description of the query compiler we focus on the aspect of join ordering when generating query execution plans. Here we show that better plans can be found when extending the search space of the plan generator.
Engineering a New Abstraction Layer to Optimize the HEP Analysis Process
Year: 2004 Journal: IEEE, Transaction in Nuclear Sciences Journal BibTeX: @Article{Amaral:2004:ENA, author = "V. Amaral and S. Helmer and G. Moerkotte", title = "Engineering a New Abstraction Layer to Optimize the HEP Analysis Process", journal = "IEEE, Transaction in Nuclear Sciences Journal", volume = "51", number = "4", pages = "1441--1448", year = "2004", month = aug, }Abstract: We are observing a growing complexity in the interfaces of the analysis tools. As a consequence, end users have to master programming, understand complex frameworks and data storage details before achieving the physics goals. This reduces significantly the efficiency of the analysis process. In order to tackle this problem in analysis we propose to introduce a new abstraction layer between the end users and the current interfaces. We go further in our approach by introducing Pheasant QL, the first proposal for a declarative domain specific visual query language into this domain. Through our solution we can express complex decay queries by means of visual operators with reduced programming efforts, abstracting the storage and optimization details, and reducing the need to deal with the normal analysis framework and physical storage intricacies. In our communication we will describe the methodology we are following to design and implement the new abstraction layer. We will also describe our language in an informal manner in terms of syntax, semantics, and example queries.
PHEASANT: A PHysicist's EAsy ANalysis Tool
Year: 2004 Journal: Springer Verlag, Lecture Notes in Artificial Intelligence Download: SpringerLink BibTeX: @InProceedings{Amaral:2004:PHE , author = "V. Amaral and S. Helmer and G. Moerkotte", title = "PHEASANT: A PHysicist's EAsy ANalysis Tool", booktitle = "Springer Verlag, Lecture Notes in Artificial Inteligence", year = "2004", pages = "229--242" }Abstract: We present Pheasant, a framework for the analysis process in High Energy Physics (HEP). It uses PheasantQL, the first domain specific visual query language for HEP analysis. This allows us to express complex decay queries easily with no programming efforts, abstracting the storage and optimization details. The currently existing tools overburden users with many complex details and intricacies. A small scale assessment conducted with a prototype demonstrates the effectiveness of our technique.
Timestamp-based Protocols for Synchronizing Access on XML Documents
Year: 2004 Journal: Proc. DEXA Conference, Zaragoza, Spain BibTeX: @InProceedings{Helmer:2004:TBP, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Timestamp-based Protocols for Synchronizing Access on XML Documents", booktitle = "Proc. DEXA Conference, Zaragoza, Spain", pages = "230--234", year = "2004", }A Combined Framework for Grouping and Order Optimization
Year: 2004 Journal: Proc. VLDB Conference, Toronto, Canada Download: paper BibTeX: @InProceedings{Neumann:2004:CFG, author = "Thomas Neumann and Guido Moerkotte", title = "A Combined Framework for Grouping and Order Optimization", booktitle = "Proc. VLDB Conference, Toronto, Canada", pages = "960--971", year = "2004", }Abstract: Since the introduction of cost-based query optimization by Selinger et al. in their seminal paper, the performance-critical role of interesting orders has been recognized. Some algebraic operators change interesting orders (e.g. sort and select), while others exploit them (e.g. merge join). Likewise, Wang and Cherniack (VLDB 2003) showed that existing groupings should be exploited to avoid redundant grouping operations. Ideally, the reasoning about interesting orderings and groupings should be integrated into one framework. So far, no complete, correct, and efficient algorithm for ordering and grouping inference has been proposed. We fill this gap by proposing a general two-phase approach that efficiently integrates the reasoning about orderings and groupings. Our experimental results show that with a modest increase of the time and space requirements of the preprocessing phase both orderings and groupings can be handled at the same time. More importantly, there is no additional cost for the second phase during which the plan generator changes and exploits orderings and groupings by adding operators to subplans.
:: 2003
A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality
Year: 2003 Journal: VLDB Journal Download: technical report BibTeX: @Article{Helmer:2003:SFI, author = "Sven Helmer and Guido Moerkotte", title = "A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality", journal = "VLDB Journal", volume = "12", number = "3", pages = "244--261", year = "2003", }Abstract: We review and study the performance of four different index structures for indexing set-valued attributes designed to speed up set equality, subset and superset queries. All index structures are based on traditional techniques, namely signatures and inverted files. More specifically, we consider sequential signature files, signature trees, extendible signature hashing, and a B-tree based implementation of inverted lists. The latter is refined by a compression scheme in order to keep space requirements within acceptable bounds. The performance study is based on real implementations subjected to a benchmark accounting for different set sizes, domain sizes, and data distributions (uniform and skewed).
Estimating the Output Cardinality of Partial Preaggregation with a Measure of Clusteredness
Year: 2003 Journal: Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), Berlin, Germany Download: vldb03.pdf BibTeX: @InProceedings{Helmer:2003:EOC, author = "Sven Helmer and Thomas Neumann and Guido Moerkotte", title = "Estimating the Output Cardinality of Partial Preaggregation with a Measure of Clusteredness", booktitle = "Proc. 29th Int. Conf. on Very Large Data Bases (VLDB), Berlin, Germany", pages = "656--667", year = "2003", }Abstract: We introduce a new parameter, the clusteredness of data, and show how it can be used for estimating the output cardinality of a partial preaggregation operator. This provides the query optimizer with an important piece of information for deciding whether the application of partial preaggregation is bene cial. Experimental results are very promising, due to the high accuracy of the cardinality estimation based on our measure of clusteredness.
Three Cases for Query Decorrelation in XQuery
Year: 2003 Journal: 1st Int. XML Database Symposium (XSYM), Berlin, Germany Download: unnesting_xmlsym03.pdf BibTeX: @InProceedings{May:2003:TCQ, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Three Cases for Query Decorrelation in XQuery", booktitle = "1st Int. XML Database Symposium (XSYM), Berlin, Germany", pages = "70--84", year = "2003", }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible our approach.
Lock-based Protocols for Cooperation on XML Documents
Year: 2003 Journal: Proc. 14th Int. Workshop on Database and Expert Systems Applications (at DEXA Conf.), Prague Download: technical report BibTeX: @Article{Helmer:2003:LPC, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Lock-based Protocols for Cooperation on XML Documents", journal = "Proc. 14th Int. Workshop on Database and Expert Systems Applications (at DEXA Conf.), Prague", pages = "230-234", year = "2003", }Abstract: The eXtensible Markup Language (XML) is well accepted in several different Web application areas. As soon as many users and applications work concurrently on the same collection of XML documents - e.g. on an XML database via a Web interface - isolating accesses and modifications of different transactions becomes an important issue. We discuss four different core protocols for synchronizing access to and modifications of XML document collections. These core protocols synchronize structure traversals and modifications. They are meant to be integrated into a native XML base management System (XBMS) and are based on two phase locking. We also demonstrate the different degrees of cooperation that are possible with these protocols by various experimental results. Furthermore, we also discuss extensions of these core protocols to full-fledged protocols. Further, we show how to achieve a higher degree of concurrency by exploiting the semantics expressed in Document Type Definitions (DTDs).
Quantifiers in XQuery
Year: 2003 Journal: Proc. 4th Int. Conf. on Web Information Systems Engineering (WISE), Rome, Italy Download: unnesting_wise03.pdf BibTeX: @Article{May:2003:QIX, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Quantifiers in XQuery", journal = "Proc. 4th Int. Conf. on Web Information Systems Engineering (WISE), Rome, Italy", pages = "313--316", year = "2003", }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions containing quantifiers for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries formulated in XQuery. Measurements illustrate the performance gains possible by unnesting.
Nested Queries and Quantifiers in an Ordered Context
Year: 2003 Journal: Technical Report Download: technical report, Library Server BibTeX: @TechReport{May:2004:NQQTR, author = "Norman May and Sven Helmer and Guido Moerkotte", title = "Nested Queries and Quantifiers in an Ordered Context", institution = "University of Mannheim, Department for Mathematics and Computer Science", year = "2003", type = "Technical Report", number = "TR-2003-002" }Abstract: We present algebraic equivalences that allow to unnest nested algebraic expressions for order-preserving algebraic operators. We illustrate how these equivalences can be applied successfully to unnest nested queries given in the XQuery language. Measurements illustrate the performance gains possible by unnesting.
A Robust Scheme for Multilevel Extendible Hashing
Year: 2003 Journal: Proc. 18th International Symposium on Computer and Information Sciences (ISCIS), Antalya, Turkey Download: technical report BibTeX: @Article{Helmer:2003:RSM, author = "Sven Helmer and Thomas Neumann and Guido Moerkotte", title = "A Robust Scheme for Multilevel Extendible Hashing", journal = "Proc. 18th International Symposium on Computer and Information Sciences (ISCIS), Antalya, Turkey", pages = "218--225", year = "2003", }Abstract: Dynamic hashing, while surpassing other access methods for uniformly distributed data, usually performs badly for non-uniformly distributed data. We propose a robust scheme for multi-level extendible hashing allowing efficient processing of skewed data as well as uniformly distributed data. In order to test our access method we implemented it and compared it to several existing hashing schemes. The results of the experimental evaluation demonstrate the superiority of our approach in both index size and performance.
Evaluating Different Approaches for Indexing Fuzzy Sets
Year: 2003 Journal: Fuzzy Sets and Systems Download: link to electronic edition BibTeX: @Article{Helmer:2003:EDA, author = "Sven Helmer", title = "Evaluating Different Approaches for Indexing Fuzzy Sets", journal = "Fuzzy Sets and Systems", year = 2003, volume = 140, number = 1, pages = "167--182", month = "November" }Abstract: Providing efficient query processing in database systems is one step towards gaining acceptance of such systems by end users. We propose several techniques for indexing fuzzy sets in databases to improve the query evaluation performance. Three of the presented access methods are based on superimposed coding, while the fourth relies on inverted files. The efficiency of these techniques was evaluated experimentally. We present results from these experiments, which clearly show the superiority of the inverted files.
Book Review of: Recent issues on fuzzy databases
Year: 2003 Journal: Fuzzy Sets and Systems Download: link to electronic edition BibTeX: @Article{Helmer:2003:BRR, author = "Sven Helmer", title = "Book Review: Recent issues on fuzzy databases", journal = "Fuzzy Sets and Systems", year = 2003, volume = 140, number = 1, pages = "229--230", month = "November" }A Domain Specific Visual Query Language for the High Energy Physics Environment
Year: 2003 Journal: 3rd Workshop on Domain\-Specific Modeling, An OOPSLA 2003 Workshop, Anaheim, CA, USA Download: dsvql_oopsla2003.pdf BibTeX: @InProceedings{Amaral:2003:DSV, author = {V. Amaral and S. Helmer and G. Moerkotte}, editor = {Juha-Pekka Tolvanen and Jeff Gray and Matti Rossi}, title = "A Domain Specific Visual Query Language for the High Energy Physics Environment", booktitle = {3rd Workshop on Domain\-Specific Modeling, An OOPSLA 2003 Workshop, Anaheim, CA, USA}, publisher = {Jyv{\"a}skyl{\"a} University Printing House, Finland}, month = {October}, year = {2003}, isbn = {951-39-1582-4}, pages = {9-16} }Abstract: This paper presents Pheasant, a framework for analyzing data collected during high energy physics experiments. Our approach features a visual query language especially adapted to the needs of the physicists analyzing the data. We also provide interfaces to the currently applied tools, so that users can still do the same work as before, but on a higher level of abstraction. This helps users to concentrate on their actual job by hiding the intricacies. Although Pheasant is domain-specific, we introduce enough extensibility to enable it to meet future demands.
Indexstrukturen für XML
Year: 2003 Download: http://www.dpunkt.de/buch/3-89864-189-9.html BibTeX: @InBook{Helmer:2003:IXM, author = "S. Helmer and G. Moerkotte", title = "Web & Datenbanken", chapter = "Indexstrukturen für XML", pages = "217--248", publisher = "dpunkt-Verlag", year = "2003", }Abstract: table of contents
Constructing Optimal Bushy Trees Possibly Containing Cross Products for Order-Preserving Joins is in P
Year: 2003 Download: MA-03-12.pdf BibTeX: @TechReport{Moerkotte:2003:COB, author = "G. Moerkotte", title = "Constructing Optimal Bushy Trees Possibly Containing Cross Products for Order-Preserving Joins is in P", year = "2003", }Abstract: One of the main features of XQuery compared to traditional query languages like SQL, is that it preserves the input order - unless specified otherwise. As a consequence, order-preserving algebraic operators are needed to capture the semantics of XQuery correctly. One important algebraic operator is the order-preserving join. The order-preserving join is associative but, in contrast to the traditional join operator, not commutative. Since join ordering (i.e. finding the optimal execution plan for a given set of join operators) has been an important topic of query optimization for SQL, it is expected that it will also play a major role in optimizing XQuery. The search space for ordering traditional joins is exponential in size. Although the lack of commutativity reduces the search space for ordering order-preserving joins, we show that it is still exponential. This raises the question whether the join ordering problem is also NP-hard, as in the traditional setting. We answer this question by introducing the first polynomial algorithm that produces optimal bushy trees possibly containing cross products.
XML-Datenbanken und ihre Anwendung
Year: 2003 Journal: it - Information Technology Download: it0303_137.pdf BibTeX: @Article{Helmer:2003:XDA, author = {Sven Helmer and Carl-Christian Kanne and Guido Moerkotte}, title = {XML-Datenbanksysteme und ihre Anwendung.}, journal = {it - Information Technology}, volume = {45}, number = {3}, year = {2003}, pages = {137-142}, }Abstract: Due to its fast proliferation, the amount of XML data is steadily increasing and XML-based applications become more and more complex. This requires effective and efficient methods to manage large amounts of persistent XML. We identify the most common requirements for persistence solutions and discuss alternative approaches to store XML. Our conclusion will be that only database management systems specifically designed for XML fulfill the requirements. Last, we give an overview of one such system called Natix.
A Visual Query Language for HEP Analysis
Year: 2003 Journal: IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA Download: nss1-2003.pdf BibTeX: @Article{Amaral:2003:VQL, author = {V. Amaral and S. Helmer and G. Moerkotte}, title = "A Visual Query Language for HEP Analysis", journal = {IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA}, year = {2003}, }Abstract: Pheasant is the first proposal for a declarative domain specific visual query language for HEP data analysis. It has been designed based on our experience dealing with Hera-B event data and query patterns. Its main goal is to allow the physicist to describe the decay selection queries by means of visual operators, to be run against the experiments' existing storage bases and analysis frameworks. Our visual language aims to be a simple-to-use tool with which a user can express complex decay queries in an easy way with reduced programming efforts. Indeed, the user does not have to deal with intricacies like physical storage details. In our communication we will describe how we determined the visual primitives by looking at the query patterns. We will also describe our language in an informal manner in terms of syntax, semantics, and example queries.
Designing and Implementing a New Abstraction Layer to Optimize the HEP Analysis Process
Year: 2003 Journal: IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA Download: nss2-2003.pdf BibTeX: @Article{Amaral:2003:DIN, author = {V. Amaral and S. Helmer and G. Moerkotte}, title = "Designing and Implementing a New Abstraction Layer to Optimize the HEP Analysis Process", journal = {IEEE Nuclear Sciences Symposium NSS, Portland, Oregon, USA}, year = {2003}, }Abstract: Presently, in the new HEP experiments we are observing a growing complexity in the interfaces of the analysis tools. As a consequence, end users have to master programming, understand complex frameworks and data storage details before achieving the physics goals. This reduces significantly the efficiency of the analysis process. In order to tackle this problem we propose introducing a new abstraction layer between the end users and the current interfaces. We go further in our approach by proposing to perform the mass event selection by the usage of a visual language that expresses the physics analysis queries in a declarative way. Through our solution we can express complex decay queries easily with no programming efforts, abstracting the storage and optimization details, and reducing the need to deal with the normal framework intricacies. We describe the methodology used designing and implementing the new abstraction layers.
:: 2002
Anatomy of a native XML base management system
Year: 2002 Journal: VLDB Journal Download: natixvldbj02.ps, technical report BibTeX: @Article{Fiebig:2002:ANX, author = "Thorsten Fiebig and Sven Helmer and Carl-Christian Kanne and Guido Moerkotte and Julia Neumann and Robert Schiele and Till Westmann", title = "Anatomy of a native XML base management system", journal = "VLDB Journal", volume = "11", number = "4", pages = "292--314", year = "2002", }Abstract: Several alternatives to manage large XML document collections exist, ranging from file systems over relational or other database systems to specifically tailored XML base management systems. In this paper we give a tour of Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases like relational systems, we illustrate how almost every component in a database system is affected in terms of adequacy and performance. We show how to design and optimize areas such as storage, transaction management - comprising recovery and multi-user synchronization - as well as query processing for XML.
Incorporating XSL Processing into Database Engines
Year: 2002 Journal: Proc. 28th Int. Conf. on Very Large Data Bases (VLDB), Hong Kong, China Download: vldb02xslt.pdf BibTeX: @Article{Moerkotte:2002:IXP, author = "Guido Moerkotte", title = "Incorporating XSL Processing into Database Engines", journal = "Proc. 28th Int. Conf. on Very Large Data Bases (VLDB), Hong Kong, China", pages = "107--118", year = "2002", }Abstract: The two observations that 1) many XML documents are stored in a database or generated from data stored in a database and 2) processing these documents with XSL stylesheet processors is an important, often recurring task justify a closer look at the current situation. Typically, the XML document is retrieved or constructed from the database, exported, parsed, and then processed by a special XSL processor. This cumbersome process clearly sets the goal to incorporate XSL stylesheet processing into the database engine. We describe one way to reach this goal by translating XSL stylesheets into algebraic expressions. Further, we present algorithms to optimize the template rule selection process and the algebraic expression resulting from the translation. Along the way, we present several undecidability results hinting at the complexity of the problem on hand
Optimized Translation of XPath into Algebraic Expressions Parameterized by Programs Containing Navigational Primitives
Year: 2002 Journal: Proc. 3rd Int. Conf. on Web Information Systems Engineering (WISE), Singapore Download: wise2002.ps, technical report BibTeX: @Article{Helmer:2002:OTX, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Optimized Translation of XPath into Algebraic Expressions Parameterized by Programs Containing Navigational Primitives", journal = "Proc. 3rd Int. Conf. on Web Information Systems Engineering (WISE), Singapore", pages = "215--224", year = "2002", }Abstract: We propose a new approach for the efficient evaluation of XPath expressions. This is important, since XPath is not only used as a simple, stand-alone query language, but is also an essential ingredient of XQuery and XSLT. The main idea of our approach is to translate XPath into algebraic expressions parameterized with programs. These programs are mainly built from navigational primitives like accessing the first child or the next sibling. The goals of the approach are 1. to enable pipelined evaluation, 2. to avoid producing duplicate (intermediate) result nodes, 3. to visit as few document nodes as possible, and 4. to avoid visiting nodes more than once. This improves the existing approaches, because our method is highly efficient.
Compiling Away Set Containment and Intersection Joins
Year: 2002 Download: TR-02-004.pdf BibTeX: @TechReport{Helmer:2002:CAS, author = "Sven Helmer and Guido Moerkotte", title = "Compiling Away Set Containment and Intersection Joins", institution = "University of Mannheim", year = "2002" }Abstract: We investigate the effect of query rewriting on joins involving set-valued attributes in object-relational database management systems. We show that by unnesting set-valued attributes (that are stored in an internal nested representation) prior to the actual set containment or intersection join we can improve the performance of query evaluation by an order of magnitude. By giving example query evaluation plans we show the increased possibilities for the query optimizer.
Natix: A Technology Overview
Year: 2002 Journal: Proc. Web, Web-Services, and Database Systems, NODe 2002 Web and Database-Related Workshops Download: natixnod2002.ps BibTeX: @Article{Fiebig:2002:NTO, author = "Thorsten Fiebig and Sven Helmer and Carl-Christian Kanne and Guido Moerkotte and Julia Neumann and Robert Schiele and Till Westmann", title = "Natix: A Technology Overview", journal = "Proc. Web, Web-Services, and Database Systems, NODe 2002 Web and Database-Related Workshops" pages = "12--33", year = "2002" }Abstract: Several alternatives to manage large XML document collections exist, ranging from file systems over relational or other database systems to specifically tailored XML base management systems. In this paper we review Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases like relational systems, we indicate how almost every component in a database system is affected in terms of adequacy and performance. We show what kind of problems have to be tackled when designing and optimizing areas such as storage, transaction management comprising recovery and multi-user synchronization as well as query processing for XML.
Early Grouping Gets the Skew
Year: 2002 Download: TR-02-009.pdf BibTeX: @TechReport{Helmer:2002:EGG, author = "Sven Helmer and Thomas Neumann and Guido Moerkotte", title = "Early Grouping Gets the Skew", institution = "University of Mannheim", year = "2002" }Abstract: We propose a new algorithm for external grouping with large results. Our approach handles skewed data gracefully and lowers the amount of random IO on disk considerably. Contrary to existing grouping algorithms, our new algorithm does not require the optimizer to employ complicated or error-prone procedures adjusting the parameters prior to query plan execution. We implemented several variants of our algorithm as well as the most commonly used algorithms for grouping and carried out extensive experiments on both synthetic and real data. The results of these experiments reveal the dominance of our approach. In case of heavily skewed data we outperform the other algorithms by a factor of two.
:: 2001
Algebraic XML Construction in Natix
Year: 2001 Journal: Proc. 2nd Int. Conf. on Web Information Systems Engineering (WISE), Kyoto, Japan Download: wise01.ps BibTeX: @Article{Fiebig:2001:AXC, author = "Thorsten Fiebig and Guido Moerkotte", title = "Algebraic XML Construction in Natix", journal = "Proc. 2nd Int. Conf. on Web Information Systems Engineering (WISE), Kyoto, Japan", pages = "212--221", year = "2001", }Algebraic XML Construction and its Optimization in Natix
Year: 2001 Journal: WWW Journal 4(3) Download: www02.ps BibTeX: @Article{Fiebig:2001:AXO, author = "Thorsten Fiebig and Guido Moerkotte", title = "Algebraic XML Construction and its Optimization in Natix", journal = "WWW Journal 4(3)", pages = "167-187", year = "2001", }Abstract: While using an algebra that acts on sets of variable bindings for evaluating XML queries, the problem of constructing XML from these bindings arises. One approach is to define a powerful operator that is able to perform a complex construction of a representation of the XML result document. The drawback is that such an operator in its generality is hard to implement and disables algebraic optimization since it has to be executed last in the plan. Therefore we suggest to construct XML documents by special query execution plans called construction plans built from simple, easy to implement and efficient operators. The paper proposes four simple algebraic operators needed for XML document construction. Further, we introduce an optimizing translation algorithm of construction clauses into algebraic expressions and briefly point out algebraic optimizations enabled by our approach.
Isolation in XML Bases
Year: 2001 Download: TR-01-015.pdf BibTeX: @TechReport{Helmer:2001:IXB, author = "Sven Helmer and Carl-Christian Kanne and Guido Moerkotte", title = "Isolation in XML Bases", institution = "University of Mannheim", year = "2001" }Abstract: The eXtensible Markup Language (XML) is well accepted in many different application areas. As a consequence, there is an increasing need for persistently storing XML documents. As soon as many users and applications work concurrently on the same collection of XML documents - i.e. an XML base - isolating accesses and modifications of different transactions becomes an important issue. We discuss six different core protocols for synchronizing access to and modifications of XML document collections. These core protocols synchronize structure traversals and modifications. They are meant to be integrated into a native XML base management System (XBMS). Four of the six core protocols are based on two phase locking, one uses time stamps, and the last one uses a novel dynamic commit-ordering approach. The latter two protocols achieve a higher degree of concurrency by a novel implicit representation of multiple versions. We also discuss extensions of these core protocols to full-fledged protocols. Further, we show how the two phase locking based protocols can achieve a higher degree of concurrency by exploiting the semantics expressed in Document Type Definitions (DTDs).
Indexing Fuzzy Data
Year: 2001 Journal: Proc. Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., Vancouver Download: paper BibTeX: @Article{Helmer:2001:IFD, author = "Sven Helmer", title = "Indexing Fuzzy Data", journal = "Proc. Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., Vancouver", pages = "2120--2125", year = "2001", }Abstract: Providing efficient query processing in database sys- tems is one step in gaining acceptance of such systems by end users. We propose several techniques for indexing fuzzy sets in databases to improve the query evalu- ation performance. Three of the presented access methods are based on superimposed coding, while the fourth relies on inverted files. The efficiency of these techniques was evaluated experimentally. We present results from these experiments, which clearly show the superiority of the inverted files.
Natix - ein natives XML-DBMS
Year: 2001 Journal: Datenbank Spektrum Download: dbspektrum01.ps BibTeX: @Article{Fiebig:2001:NNX, author = "Thorsten Fiebig and Carl-Christian Kanne and Guido Moerkotte", title = "Natix - ein natives XML-DBMS", journal = "Datenbank Spektrum" number = "1" pages = "5-13" year = "2001", }Abstract: Die rapide Zunahme an XML-Dokumenten verlangt nach einer systematischen Verwaltung von XML-Dokument-Kollektionen. Das in diesem Aufsatz beschriebene Datenbankmanagementsystem Natix ist speziell für die effiziente Verwaltung von XML-Dokumenten entwickelt worden. Wir beschreiben die Architektur von Natix und ausgewählte Module für die Speicherung und Anfrageauswertung.
Studies for Optimisation of Data Analysis Queries for HEP Using HERA-B Commissioning Data
Year: 2001 Journal: Int. Conf. on Computing in High Energy and Nuclear Physics (CHEP'01), Beijin Download: paper BibTeX: @Article{Amaral:2001:SOD, author = "Vasco Amaral and Guido Moerkotte and Antonio Amorim and Sven Helmer", title = "Indexing Fuzzy Data", journal = "Int. Conf. on Computing in High Energy and Nuclear Physics (CHEP'01), Beijing" pages = "154-155" year = "2001", }Abstract: In this paper we present an overview of the ongoing studies to build up a framework that supports the analysis after the reprocessing phase. This framework aims to develop a standard data query language for the HEP community. The related studies have been considering the relational database model as possible approach opposed to the object model. Several optimizing and tunning techniques are being used in technologies like DB2, Oracle and Root, that are simultaneously being evaluated. The experience obtained can be seen as a valuable testbed for the future LHC and simultaneously as interesting input for the development of the GRID.
:: 2000
Efficient Storage of XML Data
Year: 2000 Journal: ICDE, San Diego, USA Download: xmlvldb99.ps BibTeX: @Article{Kanne:2000:ESX, author = "Carl-Christian Kanne and Guido Moerkotte", title = "Efficient Storage of XML Data", journal = "ICDE, San Diego, USA", pages = "198--", year = "2000", }Abstract: We introduce Natix, an efficient, native repository for storing, retrieving and managing tree-structured large objects, preferably XML documents. In contrast to traditional large object (LOB) managers, we do not split at arbitrary byte positions but take the semantics of the underlying tree structure of XML documents into account. Our parameterizable split algorithm dynamically maintains physical records of size smaller than a page which contain sets of connected tree nodes. This not only improves efficiency by clustering subtrees but also facilitates their compact representation. Existing approaches to store XML documents either use flat files or map every single tree node onto a separate physical record. The increased flexibility of our approach results in higher efficiency. Performance measurements validate this claim.
Evaluating Queries on Structure with eXtended Access Support Relations
Year: 2000 Download: webdb00.ps Abstract: There are three common design decisions taken by today's search engines. First, they do not replicate the data found on the Web. Second, they rely on full-text indexes instead. Third, they do not support the querying of document structure. The main reason for the latter is that HTML's ability to express semantics with syntactic structure is very limited. This is di erent for XML since it allows for self-describing data. Due to its flexibility by inventing arbitrary new element and attribute names, XML allows to encode semantics within syntax. The consequence is that search engines for XML should support the querying of structure. In our current work on search engines for XML data on the Web, we want to keep the rst two design decisions of traditional search engines but modify the last one according to the new requirements implied by the necessity to query structure. Since our search engine accepts queries with structural information, a full-text index does not su ce any longer. What is needed is a scalable i ndex structure that allows to answer queries over the structure of XML documents. One possible index structure called eXtended Access Support Relation (XASR) is introduced. Further, we report on a search engine for XML data called Mumpits. Due to its prototypical character, we intentionally kept the design and implementation of Mumpits very simple. Its design is centered around a single XASR and its implementation heavily builds on a commercial relational database management system.
The Implementation and Performance of Compressed Databases
Year: 2000 Download: sigrec00.pdf, technical report Abstract: In this paper, we show how compression can be integrated into a relational database system. Specifically, we describe how the storage manager, the query execution engine, and the query optimizer of a database system can be extended to deal with compressed data. Our main result is that compression can significantly improve the response time of queries if very light-weight compression techniques are used. We will present such light-weight compression techniques and give the results of running the TPC-D benchmark on a so compressed database and a non-compressed database using the AODB database system, an experimental database system that was developed at the Universities of Mannheim and Passau. Our benchmark results demonstrate that compression indeed offers high performance gains (up to 50%) for IOintensive queries and moderate gains for CPU-intensive queries. Compression can, however, also increase the running time of certain update operations. In all, we recommend to extend today s database systems with light weight compression techniques and to make extensive use of this feature.
Optimization and Evaluation of Disjunctive Queries
Year: 2000 Download: DisjQueries_IEEE2000.pdf, technical report Abstract: It is striking that the optimization of disjunctive queries i.e., those which contain at least one or-connective in the query predicate has been vastly neglected in the literature, as well as in commercial systems. In this paper, we propose a novel technique, called bypass processing, for evaluating such disjunctive queries. The bypass processing technique is based on new selection and join operators that produce two output streams: the true-stream with tuples satisfying the selection (join) predicate and the false-stream with tuples not satisfying the corresponding predicate. Splitting the tuple streams in this way enables us to "bypass" costly predicates whenever the "fate" of the corresponding tuple (stream) can be determined without evaluating this predicate. In the paper, we show how to systematically generate bypass evaluation plans utilizing a bottom-up building block approach. We show that our evaluation technique allows to incorporate the standard SQL semantics of null values. For this, we devise two different approaches: One is based on explicitly incorporating three-valued logic into the evaluation plans; the other one relies on two-valued logic by "moving" all negations to atomic conditions of the selection predicate. We describe how to extend an iterator-based query engine to support bypass evaluation with little extra overhead. This query engine was used to quantitatively evaluate the bypass evaluation plans against the traditional evaluation techniques utilizing a CNF- or DNF-based query predicate.
YAXQL: A Powerful and Web-Aware Query Language Supporting Query Reuse and Data Integration
Year: 2000 Download: yaxql.ps Abstract: Since XML seems to be the next wave on the web, several query languages for XML have been proposed. Unfortunately, none of these proposals comes even close to fulfill the requirements for such a query language. We review the requirements for a query language for XML and propose a new query language, YAXQL, which fulfills these requirements.
:: 1999
Efficient Maintenance of Materialized Mediated Views
Year: 1999 Download: sigmod95.ps Abstract: Integrating data and knowledge from multiple heterogeneous sources - like databases, knowledge bases or specific software packages - is often required for answering certain queries. Recently, a powerful framework for defining mediated view spanning multiple knowledge bases by a set of constrained rules (cf. the work of Kanellakis et al.) was proposed. Within this paper, we investigate the materialization of these views by unfolding the view definition and the efficient maintenance of the resulting materialized mediated view in case of updates. Thereby, we consider two kinds of updates: updates to the view and updates to the underlying sources. For each of these two cases several efficient algorithms maintaining materialized mediated views are given. We improve on previous algorithms like the DRed algorithm and introduce a new fixpoint operator Wp which - opposed to the standard fixpoint operator Tp - allows to correctly capture the update's semantics without any recomputation of the materialized view.
Index structures for efficiently accessing fuzzy data including cost models and measurements
Year: 1999 Journal: Fuzzy Sets and Systems Download: