
An Optimised Density Based Clustering Algorithm

2010, International Journal of Computer Applications

https://doi.org/10.5120/1102-1445

Abstract

The DBSCAN [1] algorithm is popular in the data mining field because it can mine noise-free clusters of arbitrary shape in an elegant way. Because the original DBSCAN algorithm relies on distance computations between objects, it consumes considerable processing time, and its computational complexity is O(N^2). In this paper we propose a new algorithm to improve the performance of DBSCAN. Two existing algorithms, A Fast DBSCAN Algorithm [6] and Memory effect in DBSCAN algorithm [7], are combined in the new solution to speed up processing as well as improve the quality of the output. Because the RegionQuery operation takes a long time to process the objects, only a few objects are considered for expansion, and the remaining missed border objects are handled differently during cluster expansion. The performance analysis and the cluster output show that the proposed solution is better than the existing algorithms.
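To make the quadratic cost concrete, here is a minimal sketch of the baseline DBSCAN procedure from [1], not of the optimised algorithm proposed in the paper. It assumes 2-D points, Euclidean distance, and a brute-force neighbourhood scan; the names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative choices. Each `region_query` call scans all N points, and up to N calls are made, which gives the O(N^2) behaviour the abstract refers to.

```python
import math

NOISE = -1  # label for objects that belong to no cluster


def region_query(points, i, eps):
    """Brute-force scan: indices of all points within eps of points[i] (O(N))."""
    px, py = points[i]
    return [j for j, (qx, qy) in enumerate(points)
            if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Baseline DBSCAN; returns one label per point (cluster id or NOISE)."""
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border object
            continue
        labels[i] = cluster_id
        seeds = [j for j in neighbours if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border object reached from a core object
            if labels[j] is not None:
                continue  # already assigned; do not query it again
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                seeds.extend(j_neighbours)  # j is a core object: keep expanding
        cluster_id += 1
    return labels
```

In this sketch every object that joins a cluster triggers its own region query during expansion, which is the cost the abstract says the proposed solution reduces by expanding from only a few objects.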

FAQs


What improvements does the proposed algorithm make over the traditional DBSCAN?

The proposed algorithm reduces RegionQuery calls and includes four selected seed objects for cluster expansion, enhancing performance particularly in border areas.

How do LongRegionQuery and ShortRegionQuery functions differ in their operations?

LongRegionQuery retrieves objects within Eps and 2*Eps distances, while ShortRegionQuery focuses exclusively on fewer objects, dramatically speeding up processing times.
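As a rough illustration of the first half of that answer, the sketch below collects the Eps neighbourhood and the ring between Eps and 2*Eps in a single pass over the data. It is an assumption about what such a query could look like, not the paper's definition: the name `long_region_query`, the 2-D Euclidean setting, and the two-list return value are all hypothetical.

```python
import math


def long_region_query(points, i, eps):
    """Hypothetical sketch: one scan returning (within eps, between eps and 2*eps)."""
    px, py = points[i]
    inner, outer = [], []
    for j, (qx, qy) in enumerate(points):
        d = math.hypot(px - qx, py - qy)
        if d <= eps:
            inner.append(j)   # ordinary Eps neighbourhood (includes i itself)
        elif d <= 2 * eps:
            outer.append(j)   # objects in the ring (eps, 2*eps]
    return inner, outer
```

Returning both sets from one scan means the distances are computed once rather than in two separate queries; how the paper actually uses the wider neighbourhood is not stated on this page.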

What is the significance of Lemma 2 in optimizing RegionQuery calls?

Lemma 2 demonstrates that four circles can adequately cover all immediate neighbours of a core object, improving the accuracy of cluster expansions by minimizing missed objects.

How does the proposed clustering approach handle noise in data sets?

Non-core objects are designated as noise during processing, yet the new algorithm ensures comprehensive inclusion of border objects, reducing overlooked potential clusters.

What experimental results support the algorithm's claimed performance improvements?

Empirical data indicates a performance increase of approximately 20% in cluster coverage without sacrificing accuracy compared to the original DBSCAN.

References

  1. M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD'96), Portland, Oregon, 1996, pp. 226-231.
  2. J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann, 2006.
  3. G. Karypis, E. H. Han, and V. Kumar, "CHAMELEON: A hierarchical clustering algorithm using dynamic modeling," Computer, vol. 32, no. 8, pp. 68-75, 1999.
  4. M. Ankerst, M. Breunig, H. P. Kriegel, and J. Sander, "OPTICS: Ordering Objects to Identify the Clustering Structure," in Proc. ACM SIGMOD International Conference on Management of Data, 1999, pp. 49-60.
  5. A. Hinneburg and D. Keim, "An efficient approach to clustering in large multimedia data sets with noise," in 4th International Conference on Knowledge Discovery and Data Mining, 1998, pp. 58-65.
  6. SHOU Shui-geng, ZHOU Ao-ying, JIN Wen, FAN Ye, and QIAN Wei-ning, "A Fast DBSCAN Algorithm," Journal of Software, 2000, pp. 735-744.
  7. Li Jian, Yu Wei, and Yan Bao-Ping, "Memory effect in DBSCAN algorithm," in 4th International Conference on Computer Science & Education (ICCSE '09), 25-28 July 2009, pp. 31-36.