Data Mining Process Using Clustering: A Survey

Professor Mo Saraee


irpds.com

Abstract

Clustering is a basic and useful method for understanding and exploring a data set. It is the division of data into groups of similar objects: each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects in other groups. Interest in clustering has recently grown in application areas including data mining, bioinformatics, web mining, text mining, and image analysis. This survey focuses on clustering in data mining and aims to review the different clustering algorithms used there. It first presents a categorization of clustering algorithms, which the rest of the survey follows closely. The basics of hierarchical clustering are discussed first, including linkage metrics, hierarchical clusters of arbitrary shapes, and binary divisive partitioning. Next come the partitioning relocation clustering algorithms, including probabilistic clustering, k-medoids methods, and k-means methods. Further sections cover density-based partitioning, grid-based methods, and co-occurrence of categorical data. Comparisons of these algorithms are mostly based on specific applications and made under certain conditions, so the results may differ considerably if the conditions change.
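
As a small illustration of the definition above (groups whose members are similar to one another and dissimilar to members of other groups), the following is a minimal sketch of a k-means style partitioning, one of the method families the abstract names. It is not taken from the survey. The function name `k_means`, the toy points, the Euclidean distance, the fixed iteration count, and the choice to seed the centroids with the first `k` points are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=10):
    """Illustrative k-means sketch on a list of coordinate tuples.

    Assumption: centroids are seeded with the first k points so the
    result is deterministic; practical implementations usually seed
    randomly or with a smarter heuristic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids


# Hypothetical toy data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.8, 8.2), (0.9, 1.1), (8.1, 7.9)]
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

On this toy input the three points near (1, 1) receive one label and the three points near (8, 8) receive the other. As the abstract cautions, such behaviour depends on the conditions: a different seeding, distance measure, or value of `k` can produce a different grouping.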

