COMPARATIVE STUDY OF VARIOUS CLUSTERING TECHNIQUES

https://doi.org/10.1109/ICBDSC.2016.7460397

Abstract

Clustering is the process of dividing data into groups such that objects within a group are similar to one another and dissimilar to objects in other groups. Representing data by fewer clusters necessarily loses fine detail but achieves simplification: the data is modeled by its clusters. Clustering plays a significant part in data mining applications such as scientific data exploration, information retrieval, text mining, city planning, earthquake studies, marketing, spatial database applications, Web analysis, medical diagnostics, and computational biology. It is an area of active research in several fields, including statistics, pattern recognition, and machine learning. Data mining adds its own complications to clustering: very large datasets with many attributes of different types. These impose unique computational requirements on the relevant clustering algorithms. A variety of clustering algorithms have recently emerged that meet these requirements and have been successfully applied to many real-life data mining problems.

Key takeaways

  1. Clustering techniques simplify data representation by grouping similar objects, pivotal in data mining applications.
  2. The study reviews various clustering algorithms, including hierarchical, partitioning, and grid-based methods.
  3. K-means and K-medoids provide effective partitioning techniques by optimizing cluster centers or representative points.
  4. Feature selection using the proposed FAST algorithm enhances efficiency by removing irrelevant and redundant features.
  5. Kruskal's algorithm constructs minimum spanning trees to improve clustering performance on high-dimensional datasets.
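
As an illustration of the partitioning idea in takeaway 3, the following is a minimal sketch of the standard K-means (Lloyd) iteration in plain Python. It is not taken from the paper; the function names (`k_means`, `squared_distance`, `mean_point`), the random initialization from the data points, and the `max_iter` and `seed` parameters are assumptions chosen for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Illustrative K-means: returns (centers, labels).

    Assumption: initial centers are k distinct points sampled from the data.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean_point(members)
    return centers, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centers, labels = k_means(data, k=2)
    print(centers, labels)
```

On well-separated data such as the six points above, the two groups end up in different clusters. K-medoids follows the same assign-then-update loop but would restrict each center to an actual data point rather than a mean.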
