Papers by Philippe Michiels
How to recognize different kinds of tree patterns from quite a long way away

IEEE Internet Computing, 2021
Digital twins have generated a lot of hype recently, but questions remain about what the technology actually means and how one can be built for smart cities. There is a lack of unified models and frameworks for data fusions that link the physical and virtual data exchange. This can undermine the uptake of digital twin technology by cities that are unable to tackle urban problems with advanced data-driven solutions. The T-Cell framework developed by the DUET project acts as a container for models, data, and simulations that interact dynamically in a common environment and provide useful insights for smart city decision makers. Dynamic correspondence that links the architecture with models and data makes it possible to monitor and synchronize the state and behavior of the digital twin with the physical environment being mirrored. Individual models are integrated through APIs to form a cloud of models that can be called upon to perform various what-if analyses related to traffic, air quality, or noise pollution. The framework is currently being tested with citizens in three locations in Europe, but it is easily replicable so that any city, no matter its size, can leverage the power of digital twins to achieve its policy goals.
In the relational model it has been shown that the flat relational algebra has the same expressive power as the nested relational algebra, as far as queries over flat relations and with flat results are concerned [11]. Hence, for each query that uses the nested relational model and that, with a flat table as input always has a flat table as output, there exists an equivalent flat query that only uses the flat relational model. In [12] a very direct proof is given of this fact using a simulation technique. In analogy, we study a related flat- ...
Lecture Notes in Computer Science, 2004
We present a sound and complete rule set for determining whether sorting and duplicate removal operations in the query plan of XPath expressions are unnecessary. Additionally we define a deterministic finite automaton that illustrates how these rules can be translated into an efficient algorithm. This work is an important first step in the understanding and tackling of XPath/XQuery optimization problems that are related to ordering and duplicate removal.
In the relational model it has been shown that the flat relational algebra has the same expressive power as the nested relational algebra, as far as queries over flat relations and with flat results are concerned [6]. Hence, for each query that uses the nested relational model and that, with a flat table as input always has a flat table as output, there exists an equivalent flat query that only uses the flat relational model. In analogy, we study a related flat-flat problem for XQuery: for each expression containing operations that construct new nodes and whose XML result contains only original nodes, there exists an equivalent "flat" expression in XQuery that does not construct new nodes.
In this article we propose algorithms for implementing the axes for element nodes in XPath given a DOM-like representation of the document. Each algorithm assumes an input list that is sorted in document order and duplicate-free and returns a sorted and duplicate-free list of the result of following a certain axis from the nodes in the input list. The time complexity of all presented algorithms is at most O(l + m) where l is the size of the input list and m the size of the output list. This improves upon results in [4] where also algorithms with linear time complexity are presented, but these are linear in the size of the entire document whereas our algorithms are linear in the size of the intermediate results which are often much smaller.
Webdb, 2005
In the relational model it has been shown that the flat relational algebra has the same expressive power as the nested relational algebra, as far as queries over flat relations and with flat results are concerned [6]. Hence, for each query that uses the nested relational model and that, with a flat table as input always has a flat table as output, there exists an equivalent flat query that only uses the flat relational model. In analogy, we study a related flat-flat problem for XQuery: for each expression containing operations that construct new nodes and whose XML result contains only original nodes, there exists an equivalent "flat" expression in XQuery that does not construct new nodes.
In this article we propose algorithms for implementing the axes for element nodes in XPath given a DOM-like representation of the document. Each algorithm assumes an input list that is sorted in document order and duplicate-free and returns a sorted and duplicate-free list of the result of following a certain axis from the nodes in the input list. The time complexity of all presented algorithms is at most O(l + m) where l is the size of the input list and m the size of the output list. This improves upon results in [4] where also algorithms with linear time complexity are presented, but these are linear in the size of the entire document whereas our algorithms are linear in the size of the intermediate results which are often much smaller.
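As a rough illustration of how an axis can be evaluated in time linear in the input and output lists, the sketch below handles the descendant axis. It is not the algorithm from the article: the `Node` class and the `descendants` function are hypothetical names, and the only assumption is a DOM-like tree in which each node stores its children in document order.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False: nodes are compared by identity, as in a DOM
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


def descendants(sorted_input: List[Node]) -> List[Node]:
    """Descendant axis over an input list that is in document order and
    duplicate-free. Returns a document-ordered, duplicate-free list."""
    out: List[Node] = []
    i = 0
    while i < len(sorted_input):
        root = sorted_input[i]
        i += 1
        # Preorder walk of root's subtree, excluding root itself.
        stack = list(reversed(root.children))
        while stack:
            node = stack.pop()
            out.append(node)
            # If this descendant is also the next input node, consume it:
            # its own descendants are already covered by the current walk.
            if i < len(sorted_input) and sorted_input[i] is node:
                i += 1
            stack.extend(reversed(node.children))
    return out


# Example tree: a(b(c, d), e(f))
c, d, f = Node("c"), Node("d"), Node("f")
b, e = Node("b", [c, d]), Node("e", [f])
a = Node("a", [b, e])

print([n.name for n in descendants([b, c, e])])  # ['c', 'd', 'f']
```

Each output node is appended once and each input node is consumed once, so the work is proportional to l + m rather than to the size of the whole document. Skipping input nodes that lie inside an already visited subtree is what keeps the result duplicate-free without a separate sorting pass.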
This paper describes XStream, a Turing-complete programming language which allows the programmer to write XML transformations in a functional tree-processing style and have them evaluated in a streaming way: the output is produced incrementally while the input is still being parsed. The programmer does not need to care explicitly about buffering. We introduce the language, describe some techniques used in the implementation and present some performance results.
In this technical report we propose algorithms for implementing the axes for element nodes in XPath given a DOM-like representation of the document. First, we construct algorithms for evaluating simple step expressions, without any (positional) predicates. The time complexity of these algorithms is at most O(l + m) where l is the size of the input list and m, the
Very Large Data Bases, 2003
We take a closer look at the optimization problems that are associated with the XQuery language. We discuss the research that has been done and some open problems along with potential solutions.

2007 IEEE 23rd International Conference on Data Engineering, 2007
To address the needs of data intensive XML applications, a number of efficient tree pattern algorithms have been proposed. Still, most XQuery compilers do not support those algorithms. This is due in part to the lack of support for tree patterns in XML algebras, but also because deciding which part of a query plan should be evaluated as a tree pattern is a hard problem. In this paper, we extend a tuple algebra for XQuery with a tree pattern operator, and present rewritings suitable to introduce that operator in query plans. We demonstrate the robustness of the proposed rewritings under syntactic variations commonly found in queries. The proposed tree pattern operator can be implemented using popular algorithms such as Twig joins and Staircase joins. Our experiments yield useful information to decide which algorithm should be used in a given plan.
Towards micro-benchmarking XQuery
XPath 2.0 path expressions can observe and preserve the document order and identity of XML values in a document. In particular, their semantics requires that the complete result and the result of each individual step in a path expression be in document order and duplicate-free. Implementations of this semantics often guarantee correctness by inserting explicit operations that sort and remove duplicates after each step. Such operations, however, can be redundant, because an intermediate result may already be ...
Automata for avoiding unnecessary ordering operations in XPath implementations
XPath 2.0 path expressions can observe and preserve the document order and identity of XML values in a document. In particular, their semantics requires that the complete result and the result of each individual step in a path expression be in document order and duplicate-free. Implementations of this semantics often guarantee correctness by inserting explicit operations that sort and remove duplicates after each step. Such operations, however, can be redundant, because an intermediate result may already be ...
Lecture Notes in Computer Science, 2005
XQuery is a feature-rich language with complex semantics. This makes it hard to come up with a benchmark suite which covers all performance-critical features of the language, and at the same time allows one to individually validate XQuery evaluation techniques. This paper presents MemBeR, a micro-benchmark repository, allowing the evaluation of an XQuery implementation with respect to precise evaluation techniques. We take the view that a fixed set of queries is probably insufficient to allow testing for various performance aspects, thus, the users of the repository must be able to add new data sets and/or queries for specific performance assessment tasks. We present our methodology for constructing the micro-benchmark repository, and illustrate with some sample micro-benchmarks.

2007 IEEE 23rd International Conference on Data Engineering, 2007
Existing work on XML query evaluation has either focused on algebraic optimization techniques suitable for XML databases, or on algorithms to efficiently process XML messages represented as a stream of parsing events. In practice, complex applications often must handle both. In this paper, we develop a physical algebra that combines streaming operators with other standard relational and XML operators. Our physical model includes marked XML streams, which permit efficient XPath evaluation, but can only be consumed once. This constraint restricts the use of streaming operators to fragments of a query plan that only access data using depth-first traversal. We develop static analysis techniques to decide which fragment of a plan can be streamed. Our experiments demonstrate the benefits of blending streaming with other evaluation techniques.
We present a sound and complete rule set for determining whether sorting by document order and duplicate removal operations in the query plan of XPath expressions are unnecessary. Additionally we define a deterministic automaton that illustrates how these rules can be translated into an efficient algorithm. This work is an important first step in the understanding and tackling of XPath/XQuery optimization problems that are related to ordering and duplicate removal.
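To make the problem concrete, the sketch below shows one context in which the sort after a child step is redundant and one in which it is needed. It does not implement the rule set or the automaton from the paper; `Elem`, `naive_child_step`, and the other names are hypothetical, and document order is modelled here by a preorder rank.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # identity-based equality, as for XML nodes
class Elem:
    name: str
    children: List["Elem"] = field(default_factory=list)


def preorder_rank(root: Elem) -> Dict[int, int]:
    """Map id(node) to its position in document (preorder) order."""
    rank: Dict[int, int] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        rank[id(node)] = len(rank)
        stack.extend(reversed(node.children))
    return rank


def naive_child_step(context: List[Elem]) -> List[Elem]:
    """Child axis evaluated per context node, results concatenated."""
    return [child for node in context for child in node.children]


def sort_and_dedup(nodes: List[Elem], rank: Dict[int, int]) -> List[Elem]:
    """The explicit sort and duplicate-removal operation."""
    result: List[Elem] = []
    for node in sorted(nodes, key=lambda n: rank[id(n)]):
        if not result or result[-1] is not node:
            result.append(node)
    return result


def is_ordered_and_duplicate_free(nodes: List[Elem], rank: Dict[int, int]) -> bool:
    ranks = [rank[id(n)] for n in nodes]
    return all(x < y for x, y in zip(ranks, ranks[1:]))


# Example tree: a(b(c), d)
c = Elem("c")
b, d = Elem("b", [c]), Elem("d")
a = Elem("a", [b, d])
rank = preorder_rank(a)

single = naive_child_step([a])     # context holds one node
nested = naive_child_step([a, b])  # context holds a node and its ancestor

print(is_ordered_and_duplicate_free(single, rank))  # True: sort is redundant
print(is_ordered_and_duplicate_free(nested, rank))  # False: sort is needed
print([n.name for n in sort_and_dedup(nested, rank)])  # ['b', 'c', 'd']
```

A static analysis of the kind the abstract describes would decide, without running the query, which of these two situations a given step is in.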
Information Systems, 2008
A substantial part of the database research field focuses on optimizing XQuery evaluation. However, there is a lack of tools that make it easy to compare different implementations of isolated language features. This implies that there is no overview of which engines perform best at certain XQuery aspects, which in turn makes it hard to pick a reference platform for an objective comparison. This paper is the first to give an overview of a large subset of the open source XQuery implementations in terms of performance. Several specific XQuery features are tested for each engine on the same hardware to give an impression of the strengths and weaknesses of that implementation. This paper aims at guiding implementors in benchmarking and improving their products.