Mining low dimensionality data streams of continuous attributes
2003
https://doi.org/10.1007/978-3-540-24580-3_33
Abstract
This paper presents an incremental, scalable learning algorithm for mining numeric, low-dimensionality, high-cardinality, time-changing data streams. Within the field of supervised learning, our approach, named SCALLOP, produces a set of decision rules whose size is very close to the number of concepts to be extracted. Experimental results on synthetic databases of varying complexity show good performance on data streams received at a rapid rate, whose label distribution may not be stationary over time.
Authors: F.J. Ferrer-Troyano, J.S. Aguilar-Ruiz, and J.C. Riquelme