A review of clustering techniques and developments

2017, Neurocomputing

Abstract

This paper presents a comprehensive study of clustering: existing methods and the developments made in them over time. Clustering is defined as a form of unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. Clustering methods include hierarchical, partitional, grid-based, density-based and model-based approaches. The approaches used in these methods are discussed along with their respective state of the art and applicability. The measures of similarity and the evaluation criteria, which are central components of clustering, are also presented. Applications of clustering in fields such as image segmentation, object and character recognition, and data mining are highlighted.
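The abstract names two central components of clustering: similarity measures and evaluation criteria. The minimal pure-Python sketch below illustrates each. It covers two common measures, Euclidean distance for numeric vectors and the Jaccard coefficient for sets. It also implements one external evaluation criterion, the Rand index, which scores how well a clustering agrees with a reference partition. The function names are illustrative choices and are not taken from the paper.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| between two sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rand_index(labels_true, labels_pred):
    """Fraction of object pairs on which two partitions agree:
    both put the pair in the same cluster, or both keep it apart."""
    if len(labels_true) < 2:
        raise ValueError("need at least two objects to form a pair")
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

For example, `euclidean((0, 0), (3, 4))` gives 5.0. `rand_index([0, 0, 1, 1], [1, 1, 0, 0])` gives 1.0, because the index compares which pairs are grouped together, not the label values themselves.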

FAQs (AI-generated)

What are key challenges faced in clustering compared to supervised classification?

The paper indicates that clustering is inherently harder than supervised classification because there are no pre-assigned labels to guide the grouping. Further challenges include computational cost that grows with the dimensionality of the data, and sensitivity to noise and outliers.

How does the K-means algorithm perform in comparison to Fuzzy C-means?

The study finds that K-means is generally faster. Its time complexity is O(N) in the number of objects N, for a fixed number of clusters, dimensions and iterations. Fuzzy C-means handles noise and overlapping clusters better, since each object receives a degree of membership in every cluster instead of a single hard label, but it has a higher complexity. For applications requiring rapid processing, K-means is preferred despite its sensitivity to outliers.
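The contrast between hard and soft assignment can be sketched in pure Python. This minimal sketch assumes Lloyd's alternating iteration for K-means and the standard Fuzzy C-means membership formula with fuzzifier m = 2. The function and parameter names (`kmeans`, `fcm_memberships`, `k`, `iters`, `seed`, `m`) are hypothetical. Only the FCM membership step is shown; a complete FCM loop would alternate it with a membership-weighted centre update.

```python
import math
import random


def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(points, centers):
    """Hard assignment: index of the nearest centre for every point."""
    return [min(range(len(centers)), key=lambda c: _dist2(p, centers[c])) for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style K-means. Returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = _assign(points, centers)  # O(N*k*d) per iteration: linear in N
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Centre = mean of its members; keep the old centre if the cluster is empty.
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])
        if new_centers == centers:
            break  # assignments are stable, so labels match centers
        centers = new_centers
    else:
        labels = _assign(points, centers)  # iteration cap reached: relabel with final centres
    return labels, centers


def fcm_memberships(points, centers, m=2.0):
    """Fuzzy C-means membership step: u[i][c] = 1 / sum_j (d_ic / d_ij)^(2/(m-1)).
    Every row lies in [0, 1] and sums to 1 (soft assignment)."""
    exp = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.sqrt(_dist2(p, c)) for c in centers]
        if 0.0 in d:
            # Point sits exactly on a centre: give it full membership there.
            j = d.index(0.0)
            u.append([1.0 if c == j else 0.0 for c in range(len(centers))])
            continue
        u.append([1.0 / sum((d[c] / dj) ** exp for dj in d) for c in range(len(centers))])
    return u
```

Each K-means iteration visits every object-centre pair once, which is linear in N. The FCM membership step also visits every object-cluster pair, and its inner sum makes it quadratic in the number of clusters. This is one source of its extra cost.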

What is the computational complexity of hierarchical clustering methods?

Hierarchical clustering typically has a computational complexity of at least O(N²). Standard agglomerative methods compute and update distances between all pairs of objects, and this limits their use on large datasets. Enhanced methods such as BIRCH and CURE reduce this cost while remaining robust to outliers.
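The quadratic cost comes from the pairwise distance matrix. The naive single-linkage sketch below builds that N × N matrix explicitly. It assumes Euclidean distance and stops at a target number of clusters `k`; this is a hypothetical stopping rule, and a full implementation would record the whole dendrogram. Its merge loop is O(N³) overall. That is why practical implementations use more careful bookkeeping, and why BIRCH and CURE summarise or sample the data instead.

```python
import math
from itertools import combinations


def single_linkage(points, k):
    """Naive agglomerative clustering with single linkage, stopping at k clusters.

    Builds the full N x N distance matrix (O(N^2) time and memory), then
    repeatedly merges the two closest clusters; this naive loop is O(N^3).
    Returns a list of clusters, each a list of point indices.
    """
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with every object in its own cluster
    while len(clusters) > k:
        # Single linkage: cluster distance = smallest distance between their members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(dist[i][j] for i in clusters[ab[0]] for j in clusters[ab[1]]),
        )
        clusters[a].extend(clusters[b])  # a < b, so deleting b leaves a in place
        del clusters[b]
    return clusters
```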

How do density-based methods like DBSCAN differ from traditional K-means?

DBSCAN identifies clusters of arbitrary shape and is resilient to noise, unlike K-means, which presumes roughly spherical clusters and struggles with outliers. It also does not require the number of clusters to be specified in advance. It does, however, need a neighbourhood radius (eps) and a minimum number of points per dense region (MinPts).
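The following is a minimal DBSCAN sketch. It assumes Euclidean distance and brute-force neighbourhood queries, which are O(N²) overall; spatial indexes are normally used to speed these up. The parameters `eps` and `min_pts` correspond to the radius and minimum point count mentioned above, and the label -1 marks noise. Clusters grow only through core points, which lets them follow arbitrary shapes.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. Returns one label per point; -1 marks noise."""
    n = len(points)

    def neighbours(i):
        # The eps-neighbourhood includes the point itself.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    labels = [None] * n  # None = not yet visited
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue  # already labelled: do not expand again
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is also a core point: expand through it
                queue.extend(nbrs)
    return labels
```

For example, take the points (0,0), (0,1), (1,0), (1,1), (10,10), (10,11), (11,10), (11,11) and (50,50) with `eps=1.5` and `min_pts=3`. The result is the labels `[0, 0, 0, 0, 1, 1, 1, 1, -1]`: two clusters and one noise point, without specifying the number of clusters.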

What role does feature selection play in clustering algorithms?

Feature selection is essential because it directly influences clustering performance: inappropriate features can lead to poorly defined clusters and lower accuracy. The paper highlights that feature selection methods help counter the curse of dimensionality and improve the computational efficiency of clustering.
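One simple illustration is a basic unsupervised variance filter. It is swapped in here for illustration and is not one of the feature selection methods reviewed in the paper. The filter drops near-constant features before clustering, because such features contribute almost nothing to the distances between objects. The function names and the `threshold` parameter are hypothetical.

```python
from statistics import pvariance


def variance_filter(rows, threshold):
    """Return indices of feature columns whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


def select(rows, keep):
    """Project each row onto the kept feature indices."""
    return [tuple(row[i] for i in keep) for row in rows]
```

Variance is scale-dependent, so this filter only makes sense when the features are measured on comparable scales. If every feature is first standardised to unit variance, it can no longer discriminate between them.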
