Improved K-MEAN Clustering Approach for Web Usage Mining

2009

https://doi.org/10.1109/ICETET.2009.125

Abstract

In the k-means clustering algorithm, the right number of clusters (k) is initially unknown, and effective selection of the initial seeds is also difficult. This paper proposes and implements an efficient k-means algorithm that overcomes both the initial-seed problem and the unknown-number-of-clusters problem. The algorithm is applied to real BIST server log data and to a Gaussian dataset to test its accuracy and efficiency. At the application level, this algorithm may be used for efficient knowledge discovery from web repositories.

Key takeaways

  1. The proposed algorithm determines the optimal number of clusters (k) using a defined threshold (α).
  2. Initial points for clustering are selected based on a dissimilarity measure to ensure coverage of data.
  3. The algorithm merges clusters by comparing specific factors that measure intra-cluster similarity.
  4. Experiments validate the algorithm's effectiveness on real web log data and a Gaussian dataset of 2000 objects.
  5. The method enhances stability and consistency in clustering results compared to traditional random initialization.
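
The takeaways do not state which dissimilarity measure is used, how the threshold α enters the procedure, or what the merge criterion is, so the following is only a minimal sketch under explicit assumptions and not the paper's algorithm. It assumes Euclidean distance, farthest-point (maximin) seeding, and α interpreted as a distance cutoff: new seeds are added until no point lies farther than α from its nearest seed, and the number of seeds found is then used as k. The cluster-merging step is omitted because its factors are not given here. The names `select_seeds`, `kmeans`, and the toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_seeds(points, alpha):
    """Assumed seeding rule (maximin): repeatedly add the point that is
    farthest from its nearest seed, and stop once that distance is <= alpha.
    The number of seeds returned plays the role of k."""
    seeds = [points[0]]  # assumption: the first point is the starting seed
    while True:
        # Distance from every point to its nearest current seed.
        nearest = [min(euclidean(p, s) for s in seeds) for p in points]
        idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[idx] <= alpha:
            return seeds
        seeds.append(points[idx])


def kmeans(points, seeds, max_iter=100):
    """Standard Lloyd iterations started from the given seeds."""
    centers = [tuple(s) for s in seeds]
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center.
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            clusters[j].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    return centers, clusters


# Toy Gaussian data (hypothetical): three well-separated 2-D blobs.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [
    (rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
    for cx, cy in true_centers
    for _ in range(50)
]

seeds = select_seeds(points, alpha=5.0)  # alpha chosen by hand for this toy data
centers, clusters = kmeans(points, seeds)
print("k found:", len(seeds))
```

Because the seeds are chosen deterministically from the data rather than at random, repeated runs on the same input give the same partition, which is the kind of stability the fifth takeaway refers to. The value of α has to match the scale of the data; here it was picked by hand to sit between the within-blob and between-blob distances.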
