

How to summarize the universe: Dynamic maintenance of quantiles

2002, Proceedings of the 28th …

Abstract

Order statistics, i.e., quantiles, are frequently used in databases, both at the database server and at the application level. For example, they are useful in selectivity estimation during query optimization, in partitioning large relations, in estimating query result sizes when building user interfaces, and in characterizing the data distribution of evolving datasets in the process of data mining.
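
To make the terminology concrete, here is a minimal sketch of what a quantile is and how quantiles yield split points for partitioning a relation. It is an exact, sort-based baseline for a static list, not the dynamic maintenance method the paper develops. The rank convention (the φ-quantile is the element of rank ⌈φn⌉ in sorted order) and the function names `quantile` and `equi_depth_boundaries` are assumptions made for illustration.

```python
import math


def quantile(values, phi):
    """Return the element of 1-based rank ceil(phi * n) in sorted order.

    Assumed convention; other definitions interpolate between neighbors.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < phi <= 1:
        raise ValueError("phi must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(phi * len(ordered))
    return ordered[rank - 1]


def equi_depth_boundaries(values, k):
    """Return k - 1 split points cutting values into k roughly equal-sized buckets."""
    if not values:
        raise ValueError("values must be non-empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(values)
    n = len(ordered)
    # -(-a // b) is integer ceiling division, which avoids float rounding in i/k.
    return [ordered[-(-i * n // k) - 1] for i in range(1, k)]


if __name__ == "__main__":
    data = [8, 7, 6, 5, 4, 3, 2, 1]
    print(quantile(data, 0.5))             # median under the assumed convention
    print(equi_depth_boundaries(data, 4))  # three split points for four buckets
```

Sorting costs O(n log n) per query and needs the whole dataset in memory, which is the expense that motivates maintaining compact quantile summaries as the data evolves.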
