Data Stream Processing

2007, Learning from Data Streams

https://doi.org/10.1007/3-540-73679-4_3

Abstract

Rapid growth in information science and technology, and in particular the growing complexity and volume of data, has introduced new challenges for the research community. Many sources produce data continuously: sensor networks, wireless networks, radio frequency identification (RFID), customer click streams, telephone records, multimedia data, scientific data, and sets of retail chain transactions, among others. These sources are called data streams. A data stream is an ordered sequence of instances that can be read only once, or a small number of times, using limited computing and storage capabilities. Data streams are open-ended, flow at high speed, and are generated by non-stationary distributions in dynamic environments.
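To make the "read only once, with limited storage" constraint concrete, here is a minimal sketch of one classic single-pass technique, reservoir sampling (Algorithm R). It is an illustration and not a method taken from the abstract. The names `reservoir_sample` and `sensor_readings` are hypothetical, and the generator only simulates an open-ended source.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Return a uniform random sample of up to k items from a stream.

    The stream is consumed exactly once and only k items are ever stored,
    regardless of how long the stream turns out to be.
    """
    reservoir: List[T] = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k/n by drawing a slot
            # uniformly from [0, n) and replacing only if it falls in [0, k).
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


def sensor_readings(count: int) -> Iterator[int]:
    """Hypothetical stand-in for a stream: yields items one at a time."""
    for i in range(count):
        yield i


# Sample 5 readings from a simulated stream of 100,000 without storing it.
sample = reservoir_sample(sensor_readings(100_000), k=5, rng=random.Random(0))
print(sample)
```

Memory use depends only on `k`, not on the length of the stream, which is the property that matters when the source is open-ended. The sketch assumes a stationary stream. Under the non-stationary distributions the abstract mentions, a sample drawn over the whole history may under-represent recent behaviour.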
