A Systematic Comparative Analysis of Clustering Techniques
2020, Riga Technical University
https://doi.org/10.2478/ACSS-2020-0011
Abstract
Clustering has become an important tool for managing data in many areas, such as pattern recognition, machine learning, and information retrieval. Databases grow day by day, so data must be maintained in such a way that useful information can be extracted easily and used accordingly. Clustering plays an important role in this process because it groups data on the basis of similarity. More than a hundred clustering methods and algorithms can be used for mining data, but not all of them provide models for their clusters, which makes them difficult to categorise. This paper describes the most commonly used clustering techniques and compares them on the basis of their merits, demerits, and time complexity.
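As an illustration of what grouping data "on the basis of similarity" can mean in practice, the following is a minimal sketch of k-means, one of the partitioning techniques a comparison of this kind typically covers. It is not taken from the paper: the function name `kmeans`, the fixed iteration count, the choice of the first k points as initial centroids, and the sample points are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def kmeans(points, k, iterations=10):
    """Lloyd-style k-means on a list of coordinate tuples (illustrative only)."""
    # Assumption: seed the centroids with the first k points so the run is deterministic.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its most similar (nearest) centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical sample data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In this sketch each iteration compares every one of the n points with every one of the k centroids, so for d-dimensional data a single iteration costs on the order of n·k·d distance operations. Per-iteration costs of this kind are what a comparison by time complexity weighs across methods.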