Reformulating Aggregate Queries Using Views
2013, Symposium on Abstraction, Reformulation and Approximation
Sign up for access to the world's latest research
Related papers
Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N -dimensional generalization of these operators. This paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, crosstabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The novelty is that cubes are relations. Consequently, the cube operator can be imbedded in more complex non-procedural data analysis programs. The cube operator treats each of the N aggregation attributes as a dimension of N -space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N -dimensional cube. Super-aggregates are computed by aggregating the N -cube to lower dimensional spaces. This paper (1) explains the cube and roll-up operators, (2) shows how they fit in SQL, (3) explains how users can define new aggregate functions for cubes, and (4) discusses efficient techniques to compute the cube. Many of these features are being added to the SQL Standard.
Proceedings of the tenth international conference on Information and knowledge management, 2001
New applications from the areas of analytical data processing and data integration require powerful features to condense and reconcile available data. Object-relational and other data management systems available today provide only limited concepts to deal with these requirements. The general concept of grouping and aggregation appears to be a fitting paradigm for a number of the mentioned issues, but in its common form of equality based groups and restricted aggregate functions a number of problems remain unsolved. Various extensions to this concept have been introduced over the last years regarding user-defined functions for aggregation and grouping. Especially, existing extensions to the grouping operation like simple derivations of group-by values do not meet the requirements of data integration applications. We propose generic interfaces for user-defined grouping and aggregation as part of a SQL extension, allowing for more complex functions, for instance integration of data mining algorithms. Furthermore, we discuss high-level language primitives for common applications and illustrate the approach by introducing new concepts for similarity-based duplicate detection and elimination.
Australasian Database Conference, 2002
The problem of rewriting queries using views has important applications in data integration, query optimization, and physical data independence maintenance. Previous researchers have proposed rewriting algorithms for queries and views that are Datalog programs or conjunctive queries with arithmetic comparisons such as built-in predicates, using views. Our method also has advantages over previous algorithms when there are no built-in predicates
2018
We study the problems of decomposing and sharing user-defined aggregate functions in distributed and parallel computing. Aggre-gation usually needs to satisfy the distributive property to compute in parallel, and to leverage optimization in multidimensional data analysis and conjunctive query with aggregation. However, this property is too restricted to allow more aggregation to benefit from these advantages. We propose for user-defined aggregation functions a formal framework to relax the previous condition, and we map this framework to the MRC, an efficient computation model in MapReduce, to automatically generate efficient partial aggrega-tion functions. Moreover, we identify the complete conditions for sharing the result of practical user-defined aggregation without scanning base data, and propose a hybrid solution, the symbolic index, pull-up rules, to optimize the sharing process.
University of Maryland …, 2008
… of the International Conference on Very …, 1995
Efficient processing of aggregation queries is essential for decision support applications. This paper describes a class of query transformations, called eager aggregation and laty aggregation, that allows a query optimizer to move group-by operations up and down the query tree. Eager aggregation partially pushes a groupby past a join. After a group-by is partially pushed down, we still need to perform the original groupby in the upper query block. Eager aggregation reduces the number of input rows to the join and thus may result in a better overall plan. The reverse transformation, lazy aggregation, pulls a group-by above a join and combines two group-by operations into one. This transformation is typically of interest when an aggregation query references a grouped view (a view containing a groupby). Experimental results show that the technique is very beneficial for queries in the TPC-D benchmark.
2010
The SQL:2003 standard introduced window functions to enhance the analytical processing capabilities of SQL. The key concept of window functions is to sort the input relation and to ordering does not exist, though, and hence expensive join-based solutions are required. In this paper we introduce y-constrained multi-dimensional aggregation (y-MDA), which supports multi-dimensional OLAP queries with aggregation groups defined by inequalities. y-MDA is not based on an ordering of the data relation. Instead, the tuples that shall be considered for computing an aggregate value can be determined by a general y condition. This facilitates the formulation of complex queries, such as multi-dimensional cumulative aggregates, which are difficult to express in SQL because no appropriate ordering exists. We present algebraic transformation rules that demonstrate how the y-MDA interacts with other operators of a multi-set algebra. Various techniques for achieving an efficient evaluation of the y-M...
Journal of Functional and Logic …, 2004
We consider the problem of improving the computational efficiency of a functional query language. Our focus is on aggregate operations which have proven to be of practical interest in database querying. Since aggregate operations are typically non-monotonic in nature, recursive programs making use of aggregate operations must be suitably restricted in order that they have a well-defined meaning. In a recent paper we showed that partial-order clauses provide a well-structured means of formulating such queries. The present paper extends earlier work in exploring the notion of declarative pruning. By "declarative pruning" we mean that the programmer can specify declarative information about certain functions in the program without altering the meanings of these functions. Using this information, our proposed execution model provides for more efficient program execution. Essentially we require that certain domains must be totally-ordered (as opposed to being partially-ordered). Given this information, we show how the search space of solutions can be pruned efficiently. The paper presents examples illustrating the language and its computation model, and also presents a formal operational semantics. * This is a revised and expanded version of the paper
Sigmod Record, 2006
Let Σ 1 , Σ 2 be two schemas, which may overlap, C be a set of constraints on the joint schema Σ 1 ∪ Σ 2 , and q 1 be a Σ 1 -query. An (equivalent) reformulation of q 1 in the presence of C is a Σ 2 -query, q 2 , such that q 2 gives the same answers as q 1 on any Σ 1 ∪ Σ 2 -database instance that satisfies C. In general, there may exist multiple such reformulations and choosing among them may require, for example, a cost model.

Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.
References (4)
- Big Picture score(M, S) score(M, S) Man of Steel 16600 Twilight 2550 views(M, S) views(M, S) Man of Steel 2000 Twilight 100
- What is the average rating of 'Man of Steel'? q('Man of Steel', A) t(M, W) t(M, W) Man of Steel f('Man of Steel', 16600) Man of Steel g('Man of Steel', 2000) sum(W, S) sum(W, S) f('Man of Steel', 16600) 16600 count(W, C) count(W, C)
- g('Man of Steel', 2000) 2000
- t('Man of Steel', W), sum('Man of Steel', S), count('Man of Steel', C), A = S / C q('Man of Steel', 8.3)