Reformulating Aggregate Queries Using Views

Michael Genesereth


2013, Symposium on Abstraction, Reformulation and Approximation


André Pires

Data analysis applications typically aggregate data across many dimensions, looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be embedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower-dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
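To make the cube semantics concrete, here is a minimal Python sketch (not the paper's implementation) that computes SUM over every subset of the N grouping attributes, i.e. all 2^N group-bys, and marks a rolled-up dimension with a placeholder value called ALL here. The `cube` function, the sample `sales` rows, and their column names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # placeholder for a dimension that has been aggregated away


def cube(rows, dims, measure):
    """SUM(measure) for every subset of dims: 2**len(dims) group-bys in one relation."""
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            for row in rows:
                # Keep the values of grouped dimensions, replace the others with ALL
                key = tuple(row[d] if d in subset else ALL for d in dims)
                result[key] += row[measure]
    return dict(result)


# Hypothetical sample data
sales = [
    {"model": "Chevy", "year": 1994, "units": 50},
    {"model": "Chevy", "year": 1995, "units": 85},
    {"model": "Ford", "year": 1994, "units": 40},
]

for key, total in cube(sales, ["model", "year"], "units").items():
    print(key, total)
```

The cell ('ALL', 'ALL') holds the grand total (175 here) and ('Chevy', 'ALL') the Chevy subtotal (135). The output is itself a set of tuples, which is the sense in which cubes are relations. This naive version rescans the input once per group-by; it makes no attempt at the efficient computation techniques the abstract mentions.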

Advanced grouping and aggregation for data integration

Kai-Uwe Sattler

Proceedings of the tenth international conference on Information and knowledge management, 2001

New applications from the areas of analytical data processing and data integration require powerful features to condense and reconcile available data. Object-relational and other data management systems available today provide only limited concepts to deal with these requirements. The general concept of grouping and aggregation appears to be a fitting paradigm for a number of these issues, but in its common form of equality-based groups and restricted aggregate functions a number of problems remain unsolved. Various extensions to this concept have been introduced in recent years regarding user-defined functions for aggregation and grouping. In particular, existing extensions to the grouping operation, such as simple derivations of group-by values, do not meet the requirements of data integration applications. We propose generic interfaces for user-defined grouping and aggregation as part of a SQL extension, allowing for more complex functions, for instance the integration of data mining algorithms. Furthermore, we discuss high-level language primitives for common applications and illustrate the approach by introducing new concepts for similarity-based duplicate detection and elimination.
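As a rough illustration of grouping by similarity rather than equality, the following Python sketch puts values into the same group whenever a user-supplied predicate relates them, directly or transitively (a union-find over all pairs). The predicate uses `difflib.SequenceMatcher.ratio()` with an arbitrary 0.8 threshold; the predicate, the threshold, and the function names are assumptions, not the interfaces proposed in the paper.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.8):
    # Hypothetical similarity predicate on case-normalized strings
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def similarity_groups(values, pred=similar):
    """Group values so that pred-related values (transitively) share a group."""
    parent = list(range(len(values)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if pred(values[i], values[j]):
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, v in enumerate(values):
        groups.setdefault(find(i), []).append(v)
    return list(groups.values())


names = ["Smith, John", "Smith John", "Doe, Jane"]
print(similarity_groups(names))
```

Comparing all pairs is quadratic in the number of values; practical duplicate detection usually limits the comparisons, for example with blocking keys.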

Rewriting General Conjunctive Queries Using Views

Rodney Topor, Junhu Wang

Australasian Database Conference, 2002

The problem of rewriting queries using views has important applications in data integration, query optimization, and maintaining physical data independence. Previous researchers have proposed algorithms for rewriting queries using views when the queries and views are Datalog programs or conjunctive queries with built-in predicates such as arithmetic comparisons. Our method also has advantages over previous algorithms when there are no built-in predicates.

Decomposition and Sharing User-defined Aggregation: from Theory to Practice

Farouk Toumani

2018

We study the problems of decomposing and sharing user-defined aggregate functions in distributed and parallel computing. Aggregation usually needs to satisfy the distributive property in order to be computed in parallel and to benefit from optimizations in multidimensional data analysis and in conjunctive queries with aggregation. However, this property is too restrictive to let many aggregations benefit from these advantages. We propose a formal framework for user-defined aggregation functions that relaxes this condition, and we map this framework to MRC, an efficient computation model in MapReduce, to automatically generate efficient partial aggregation functions. Moreover, we identify the complete conditions for sharing the results of practical user-defined aggregations without scanning the base data, and propose a hybrid solution, combining a symbolic index with pull-up rules, to optimize the sharing process.
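To illustrate why the distributive property is stricter than necessary, consider variance: it cannot be merged from per-partition variances alone, but it can be computed from a small partial state (count, sum, sum of squares) whose merge is associative and commutative. The Python sketch below is a generic example of such partial aggregation, not the paper's framework or its MRC mapping; the names `Partial`, `local`, `merge`, and `finalize` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Partial:
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0


def local(values):
    # Partial aggregate computed independently on one partition (the "map" side)
    return Partial(len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    # Merging partial states is associative and commutative
    return Partial(a.count + b.count, a.total + b.total, a.total_sq + b.total_sq)


def finalize(p):
    # Population variance derived from the merged state
    mean = p.total / p.count
    return p.total_sq / p.count - mean * mean


partitions = [[1, 2, 3], [4, 5], [6]]
state = Partial()
for part in partitions:
    state = merge(state, local(part))
print(finalize(state))
```

The sum-of-squares formula is easy to decompose but can lose precision when the values are large relative to their spread; merge rules based on count, mean, and centered sum of squares are numerically more robust.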

Aggregate query answering under uncertain schema mappings

Maria Vanina Martinez

University of Maryland …, 2008

Eager aggregation and lazy aggregation

Paul Larson

… of the International Conference on Very …, 1995

Efficient processing of aggregation queries is essential for decision support applications. This paper describes a class of query transformations, called eager aggregation and lazy aggregation, that allows a query optimizer to move group-by operations up and down the query tree. Eager aggregation partially pushes a group-by past a join. After a group-by is partially pushed down, the original group-by must still be performed in the upper query block. Eager aggregation reduces the number of input rows to the join and thus may result in a better overall plan. The reverse transformation, lazy aggregation, pulls a group-by above a join and combines two group-by operations into one. This transformation is typically of interest when an aggregation query references a grouped view (a view containing a group-by). Experimental results show that the technique is very beneficial for queries in the TPC-D benchmark.
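The effect of eager aggregation can be seen on a toy query: total order amount per region, over orders joined with customers. The Python sketch below compares joining first and grouping afterwards with pre-aggregating orders per customer below the join and then performing the original group-by above it. It models the join as a dictionary lookup, so it assumes the customer column is a key of `customers`; the data and function names are hypothetical.

```python
from collections import defaultdict

orders = [("c1", 10), ("c1", 5), ("c2", 7), ("c3", 3)]  # (customer, amount)
customers = [("c1", "EU"), ("c2", "EU"), ("c3", "US")]  # (customer, region)


def join_then_group(orders, customers):
    # Join every order row, then GROUP BY region
    region_of = dict(customers)
    totals = defaultdict(int)
    for cust, amount in orders:
        if cust in region_of:
            totals[region_of[cust]] += amount
    return dict(totals)


def eager_group_then_join(orders, customers):
    # Partial GROUP BY customer pushed below the join: fewer rows reach the join
    per_customer = defaultdict(int)
    for cust, amount in orders:
        per_customer[cust] += amount
    region_of = dict(customers)
    # The original GROUP BY region is still performed above the join
    totals = defaultdict(int)
    for cust, subtotal in per_customer.items():
        if cust in region_of:
            totals[region_of[cust]] += subtotal
    return dict(totals)


print(join_then_group(orders, customers))        # {'EU': 22, 'US': 3}
print(eager_group_then_join(orders, customers))  # {'EU': 22, 'US': 3}
```

Here the join input shrinks from four order rows to three per-customer subtotals; the saving grows with the number of rows per customer.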

θ-Constrained multi-dimensional aggregation

Johann Gamper

2010

The SQL:2003 standard introduced window functions to enhance the analytical processing capabilities of SQL. The key concept of window functions is to sort the input relation and to compute the aggregate results during a scan of the sorted relation. For multi-dimensional OLAP queries with aggregation groups defined by a general θ condition an appropriate ordering does not exist, though, and hence expensive join-based solutions are required. In this paper we introduce θ-constrained multi-dimensional aggregation (θ-MDA), which supports multi-dimensional OLAP queries with aggregation groups defined by inequalities. θ-MDA is not based on an ordering of the data relation. Instead, the tuples to be considered for computing an aggregate value can be determined by a general θ condition. This facilitates the formulation of complex queries, such as multi-dimensional cumulative aggregates, which are difficult to express in SQL because no appropriate ordering exists. We present algebraic transformation rules that demonstrate how θ-MDA interacts with other operators of a multi-set algebra. Various techniques for achieving an efficient evaluation of the θ-M...
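The semantics described here, where each aggregate is taken over the detail tuples that satisfy a general θ condition with respect to a group tuple, can be written as a naive nested loop. The Python sketch below computes a two-dimensional cumulative sum this way; `theta_mda` and the sample data are hypothetical, and the quadratic loop only pins down the meaning, unlike the efficient evaluation techniques the paper presents.

```python
def theta_mda(groups, detail, theta, measure):
    """For each group tuple g, sum measure(t) over detail tuples t with theta(g, t)."""
    return [(g, sum(measure(t) for t in detail if theta(g, t))) for g in groups]


# Hypothetical detail relation: (month, store_size, amount)
sales = [(1, 1, 10), (1, 2, 20), (2, 1, 30), (2, 2, 40)]
groups = [(m, s) for m, s, _ in sales]

# Multi-dimensional cumulative aggregate: t precedes g in both dimensions
cumulative = theta_mda(
    groups,
    sales,
    theta=lambda g, t: t[0] <= g[0] and t[1] <= g[1],
    measure=lambda t: t[2],
)
print(cumulative)  # [((1, 1), 10), ((1, 2), 30), ((2, 1), 40), ((2, 2), 100)]
```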

Aggregation in functional query languages

Juan Carlos Nieves

Journal of Functional and Logic …, 2004

We consider the problem of improving the computational efficiency of a functional query language. Our focus is on aggregate operations, which have proven to be of practical interest in database querying. Since aggregate operations are typically non-monotonic in nature, recursive programs making use of aggregate operations must be suitably restricted in order to have a well-defined meaning. In a recent paper we showed that partial-order clauses provide a well-structured means of formulating such queries. The present paper extends earlier work in exploring the notion of declarative pruning. By "declarative pruning" we mean that the programmer can specify declarative information about certain functions in the program without altering the meanings of these functions. Using this information, our proposed execution model provides for more efficient program execution. Essentially, we require that certain domains be totally ordered (as opposed to partially ordered). Given this information, we show how the search space of solutions can be pruned efficiently. The paper presents examples illustrating the language and its computation model, and also presents a formal operational semantics. * This is a revised and expanded version of the paper

Datacube: a relational aggregation operator generalizing group-by

Adam Bosworth

1996

Query reformulation with constraints

alin deutsch

Sigmod Record, 2006

Let Σ₁, Σ₂ be two schemas, which may overlap, let C be a set of constraints on the joint schema Σ₁ ∪ Σ₂, and let q₁ be a Σ₁-query. An (equivalent) reformulation of q₁ in the presence of C is a Σ₂-query q₂ such that q₂ gives the same answers as q₁ on any Σ₁ ∪ Σ₂ database instance that satisfies C. In general, there may exist multiple such reformulations, and choosing among them may require, for example, a cost model.

References (4)

Big Picture

score(M, S)
  Man of Steel    16600
  Twilight        2550

views(M, S)
  Man of Steel    2000
  Twilight        100

What is the average rating of 'Man of Steel'?  q('Man of Steel', A)

t(M, W)
  Man of Steel    f('Man of Steel', 16600)
  Man of Steel    g('Man of Steel', 2000)

sum(W, S)
  f('Man of Steel', 16600)    16600

count(W, C)
  g('Man of Steel', 2000)    2000

t('Man of Steel', W), sum('Man of Steel', S), count('Man of Steel', C), A = S / C
q('Man of Steel', 8.3)

Alexander Serebrenik

Current Issues in Databases and …, 2000

Queries involving aggregation are typical in database applications. One of the main ideas to optimize the execution of an aggregate query is to reuse results of previously answered queries. This leads to the problem of rewriting aggregate queries using views. Due to a lack of theory, algorithms for this problem were rather ad hoc. They were sound, but were not proven to be complete. Recently we have given syntactic characterizations for the equivalence of aggregate queries and applied them to decide when there exist rewritings. However, these decision procedures do not lend themselves immediately to an implementation. In this paper, we present practical algorithms for rewriting queries with count and sum. Our algorithms are sound. They are also complete for important cases. Our techniques can be used to improve well-known procedures for rewriting non-aggregate queries. These procedures can then be adapted to obtain algorithms for rewriting queries with min and max. The algorithms presented are a basis for realizing optimizers that rewrite queries using views.
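A typical rewriting of this kind answers a coarse count/sum query from a view grouped more finely: sums are re-aggregated with sum, and counts are also re-aggregated with sum rather than counted again. The Python sketch below checks this on hypothetical data; the relation, the view, and the function names are illustrative only and are not the algorithms of the paper.

```python
from collections import defaultdict

# Hypothetical base relation: ratings(movie, country, stars)
ratings = [("A", "US", 4), ("A", "US", 5), ("A", "DE", 3), ("B", "DE", 2)]

# Materialized view v(M, K, sum(stars), count(*)) grouped by (movie, country)
view = defaultdict(lambda: [0, 0])
for movie, country, stars in ratings:
    view[(movie, country)][0] += stars
    view[(movie, country)][1] += 1


def original(ratings):
    """q(M, sum(stars), count(*)) evaluated on the base relation."""
    out = defaultdict(lambda: [0, 0])
    for movie, _country, stars in ratings:
        out[movie][0] += stars
        out[movie][1] += 1
    return {m: tuple(v) for m, v in out.items()}


def rewritten(view):
    """The same query answered from the finer view only."""
    out = defaultdict(lambda: [0, 0])
    for (movie, _country), (s, c) in view.items():
        out[movie][0] += s
        out[movie][1] += c  # counts are rolled up with sum, not count
    return {m: tuple(v) for m, v in out.items()}


print(rewritten(view))  # {'A': (12, 3), 'B': (2, 1)}
```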

Rewriting aggregate queries using views

Alexander Serebrenik

Proceedings of the eighteenth …, 1999

We investigate the problem of rewriting queries with aggregate operators using views that may or may not contain aggregate operators. A rewriting of a query is a second query that uses view predicates such that evaluating first the views and then the rewriting yields the same result as evaluating the original query. In this sense, the original query and the rewriting are equivalent modulo the view definitions. The queries and views we consider correspond to unnested SQL queries, possibly with union, that employ the operators min, max, count, and sum.
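The definition above (evaluate the views, then the rewriting, and obtain the same result as the original query) can be checked on a small instance. In the Python sketch below, a hypothetical non-aggregate view keeps the (movie, stars) pairs rated by US users as a set, and a max query is rewritten over it. All names and data are illustrative, and checking a single instance only illustrates the definition; it does not prove equivalence.

```python
# Hypothetical base relations
ratings = [("A", "u1", 4), ("A", "u2", 5), ("B", "u1", 2), ("B", "u3", 5)]  # (movie, user, stars)
users = {"u1": "US", "u2": "US", "u3": "DE"}  # user -> country


def view_v(ratings, users):
    # Non-aggregate view v(M, S) :- ratings(M, U, S), users(U, 'US'), stored as a set
    return {(m, s) for m, u, s in ratings if users.get(u) == "US"}


def max_by_movie(pairs):
    out = {}
    for m, s in pairs:
        out[m] = max(out.get(m, s), s)
    return out


def original_query(ratings, users):
    # q(M, max(S)) :- ratings(M, U, S), users(U, 'US'), evaluated on the base data
    return max_by_movie((m, s) for m, u, s in ratings if users.get(u) == "US")


def rewriting(v):
    # q'(M, max(S)) :- v(M, S), evaluated on the materialized view only
    return max_by_movie(v)


assert rewriting(view_v(ratings, users)) == original_query(ratings, users)
```

Because max ignores duplicates, the set semantics of the view is harmless here; a count(*) rewriting over the same view could undercount whenever two US users give a movie the same rating.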

θ-Constrained multi-dimensional aggregation

Michael Akinde, Damianos Chatziantoniou, Michael Böhlen, Johann Gamper

Information Systems, 2011

The SQL:2003 standard introduced window functions to enhance the analytical processing capabilities of SQL. The key concept of window functions is to sort the input relation and to compute the aggregate results during a scan of the sorted relation. For multi-dimensional OLAP queries with aggregation groups defined by a general θ condition an appropriate ordering does not exist, though, and hence expensive join-based solutions are required.

Incremental maintenance of aggregate and outerjoin expressions

Himanshu Gupta

Information Systems, 2006

Views stored in a data warehouse need to be kept current. As recomputing the views is very expensive, incremental maintenance algorithms are required. Over recent years, several incremental maintenance algorithms have been proposed. None of the proposed algorithms handle the general case of relational expressions involving aggregate and outerjoin operators efficiently.
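For the simplest case, a SUM/COUNT aggregate view over a single relation, incremental maintenance amounts to applying each insertion or deletion to the affected group instead of recomputing the view. The Python sketch below (hypothetical class and method names) keeps a count per group so that it can tell when a group disappears; it does not cover outerjoins, which are the harder case the abstract points to.

```python
from collections import defaultdict


class SumCountView:
    """Hypothetical materialized view: SELECT g, SUM(x), COUNT(*) FROM r GROUP BY g."""

    def __init__(self):
        self.groups = defaultdict(lambda: [0, 0])  # g -> [sum, count]

    def insert(self, g, x):
        state = self.groups[g]
        state[0] += x
        state[1] += 1

    def delete(self, g, x):
        if g not in self.groups:
            raise KeyError(f"group {g!r} is not in the view")
        state = self.groups[g]
        state[0] -= x
        state[1] -= 1
        if state[1] == 0:
            # COUNT tells us the group is now empty; SUM alone could not
            del self.groups[g]

    def rows(self):
        return {g: tuple(state) for g, state in self.groups.items()}


view = SumCountView()
for g, x in [("a", 3), ("a", 4), ("b", 1)]:
    view.insert(g, x)
view.delete("b", 1)
print(view.rows())  # {'a': (7, 2)}
```

MIN and MAX views are harder to maintain this way, since deleting the current minimum or maximum may require rescanning the group's base tuples.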

Query Rewriting Based on Meta-Granular Aggregation

Piotr Wiśniewski

Fundamenta Informaticae, 2014

Analytical database queries are exceptionally time-consuming. Decision support systems employ various execution techniques to accelerate such queries and reduce their resource consumption. Probably the most important of these is the materialization of partial results. However, introducing additional derived objects into the database schema increases the cost of software development, since programmers must take care of their usage and synchronization. In this paper we propose novel query rewriting methods that build queries using partial aggregations materialized in additional tables. These methods are based on the concept of meta-granules, which represent information about grouping and the aggregations used. Meta-granules have a natural partial order that guides the optimisation process. We also present an experimental evaluation of the proposed rewriting method.
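The idea that stored partial aggregations are partially ordered can be sketched as a containment test: a table grouped by attributes G can answer a query grouped by a subset of G, provided every requested aggregate is derivable from the stored ones (for instance avg from sum and count). The `Granule` class and the `NEEDS` table below are a hypothetical encoding of that test in Python, not the paper's definition of meta-granules.

```python
from dataclasses import dataclass

# Stored aggregates each query aggregate can be derived from (assumed table)
NEEDS = {
    "sum": {"sum"},
    "count": {"count"},  # re-aggregated by summing stored counts
    "avg": {"sum", "count"},
    "min": {"min"},
    "max": {"max"},
}


@dataclass(frozen=True)
class Granule:
    group_by: frozenset    # grouping attributes
    aggregates: frozenset  # (function, column) pairs


def can_answer(stored: Granule, query: Granule) -> bool:
    """True if query can be computed by re-aggregating the stored table."""
    if not query.group_by <= stored.group_by:
        return False  # the stored table is too coarse
    for func, col in query.aggregates:
        needed = NEEDS.get(func)
        if needed is None or any((n, col) not in stored.aggregates for n in needed):
            return False
    return True


stored = Granule(frozenset({"year", "region"}),
                 frozenset({("sum", "amount"), ("count", "amount")}))
print(can_answer(stored, Granule(frozenset({"year"}), frozenset({("avg", "amount")}))))     # True
print(can_answer(stored, Granule(frozenset({"product"}), frozenset({("sum", "amount")}))))  # False
```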

On the Power of Aggregation In Relational Query Languages

Limsoon Wong

Database Programming Languages, 1998

It is a folk result that relational algebra or calculus extended with aggregate functions cannot compute the transitive closure. However, proving folk results is sometimes a nontrivial task. In this paper, we tell the story of the work on the expressive power of relational languages with aggregate functions. We also prove what is by far the most powerful result that describes the expressiveness of such languages. There are four main features of our result that distinguish it from previous ones:

Query processing techniques in the summary-table-by-example database query language

Victor Manuel Fonseca Matos

ACM Transactions on Database Systems, 1989

Summary-Table-by-Example (STBE) is a graphical language suitable for statistical database applications. STBE queries have a hierarchical subquery structure and manipulate summary tables and relations with set-valued attributes. The hierarchical arrangement of STBE queries naturally implies a tuple-by-tuple subquery evaluation strategy (similar to the nested loops join implementation technique) which may not be the best query processing strategy. In this paper we discuss the query processing techniques used in STBE. We first convert an STBE query into an “extended” relational algebra (ERA) expression. Two transformations are introduced to remove the hierarchical arrangement of subqueries so that query optimization is possible. To solve the “empty partition” problem of aggregate function evaluation, directional join (one-sided outer-join) is utilized. We give the algebraic properties of the ERA operators to obtain an “improved” ERA expression. Finally we briefly discuss the generation...
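The "empty partition" problem is easy to reproduce: if counts per group are computed only from the tuples that are present, groups with no tuples vanish instead of reporting 0. The Python sketch below contrasts that with a one-sided outer join from the grouping relation, which keeps every group; the relations and function names are hypothetical, and the sketch mimics the idea rather than STBE's extended relational algebra.

```python
departments = ["Sales", "R&D", "Legal"]
employees = [("ann", "Sales"), ("bob", "Sales"), ("cy", "R&D")]  # (name, dept)


def count_inner(employees):
    # Grouping only the existing tuples: departments without employees disappear
    counts = {}
    for _name, dept in employees:
        counts[dept] = counts.get(dept, 0) + 1
    return counts


def count_directional(departments, employees):
    # One-sided outer join from departments: every partition survives, empty ones as 0
    counts = {d: 0 for d in departments}
    for _name, dept in employees:
        if dept in counts:
            counts[dept] += 1
    return counts


print(count_inner(employees))                     # {'Sales': 2, 'R&D': 1}
print(count_directional(departments, employees))  # {'Sales': 2, 'R&D': 1, 'Legal': 0}
```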

Extensible grouping and aggregation for data reconciliation

Kai-Uwe Sattler

2001

New applications from the areas of analytical data processing and data integration require powerful features to condense and reconcile available data. Object-relational and other data management systems available today provide only limited concepts to deal with these requirements. The general concept of grouping and aggregation appears to be a fitting paradigm for a number of these issues, but in its common form of equality-based groups and restricted aggregate functions a number of problems remain unsolved. Various extensions to this concept have been introduced in recent years, especially regarding user-defined functions for aggregation and derivation of grouping properties. We propose generic interfaces for user-defined grouping and aggregation as part of a SQL extension, allowing for more complex functions, for instance the integration of data mining algorithms. Furthermore, we discuss high-level language primitives for common applications and illustrate the approach by introducing new concepts for similarity-based duplicate detection and elimination.
