Ensemble Classifier for Mining Data Streams
2014, Procedia Computer Science
https://doi.org/10.1016/J.PROCS.2014.08.120
Abstract
This paper addresses the problem of mining data streams with concept drift. It proposes and validates a new approach based on an ensemble classifier constructed from one-class base classifiers. The base classifiers of the proposed ensemble are assumed to be induced from incoming chunks of the data stream. Each chunk consists of prototypes and can be updated using an instance selection technique when new data arrive. When a new data chunk is formed, the ensemble model is also updated on the basis of the weights assigned to each one-class classifier. The proposed approach is validated experimentally.
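The chunk-based update described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the base learner, the weighting rule, or the instance selection step, so everything here is an assumption made for illustration. The names `OneClassNN` and `ChunkEnsemble`, the fixed acceptance `radius`, weighting members by their accuracy on the newest chunk, and pruning to `max_members` are all hypothetical choices, and instance selection is omitted (every instance in a chunk is kept as a prototype).

```python
import math
from collections import defaultdict


class OneClassNN:
    """Hypothetical one-class base classifier for a single label.

    It accepts a point if the point lies within `radius` of any stored prototype.
    """

    def __init__(self, label, prototypes, radius):
        self.label = label
        self.prototypes = list(prototypes)
        self.radius = radius

    def accepts(self, x):
        return any(math.dist(x, p) <= self.radius for p in self.prototypes)


class ChunkEnsemble:
    """Weighted ensemble of one-class classifiers, updated once per data chunk."""

    def __init__(self, max_members=6, radius=1.0):
        self.max_members = max_members
        self.radius = radius
        self.members = []  # list of [classifier, weight] pairs

    def predict(self, x):
        # Each accepting member votes for its own label with its weight.
        votes = defaultdict(float)
        for clf, weight in self.members:
            if clf.accepts(x):
                votes[clf.label] += weight
        return max(votes, key=votes.get) if votes else None

    def update(self, chunk):
        """chunk: list of (feature_tuple, label) pairs from the stream."""
        if not chunk:
            return
        # 1. Re-weight existing members by their accuracy on the new chunk
        #    (assumed rule: a member is correct when it accepts exactly the
        #    instances carrying its own label).
        for member in self.members:
            clf = member[0]
            correct = sum(clf.accepts(x) == (y == clf.label) for x, y in chunk)
            member[1] = correct / len(chunk)
        # 2. Induce one new one-class classifier per label seen in the chunk.
        by_label = defaultdict(list)
        for x, y in chunk:
            by_label[y].append(x)
        for label, points in by_label.items():
            self.members.append([OneClassNN(label, points, self.radius), 1.0])
        # 3. Keep only the highest-weighted members.
        self.members.sort(key=lambda m: m[1], reverse=True)
        del self.members[self.max_members:]
```

In this sketch, members induced before a concept drift lose weight as soon as they disagree with the newest chunk, so their votes stop dominating the prediction. A real implementation would replace the fixed radius and the keep-everything prototype policy with the instance selection technique the paper refers to.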