ADCN: An Anisotropic Density-Based Clustering Algorithm
https://doi.org/10.1145/2996913.2996940…
4 pages
1 file
Sign up for access to the world's latest research
Abstract
In this work we introduce an anisotropic density-based clustering algorithm. It outperforms DBSCAN and OPTICS for the detection of anisotropic spatial point patterns and performs equally well in cases that do not explicitly benefit from an anisotropic perspective. ADCN has the same time complexity as DBSCAN and OPTICS, namely O(n log n) when using a spatial index, O(n 2 ) otherwise.




Related papers
2006
Density-based clustering algorithms are attractive for the task of class identification in spatial database. However, in many cases, very different local-density clusters exist in different regions of data space, therefore, DBSCAN [Ester, M. et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (pp. 226-231). Portland, OR: AAAI.] using a global density parameter is not suitable. As an improvement, OPTICS [Ankerst, M. et al,(1999). OPTICS: Ordering Points To Identify the Clustering Structure. In A. Delis, C. Faloutsos, & S. Ghandeharizadeh (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 49-60). Philadelphia, PA: ACM.] creates an augmented ordering of the database representing its density-based clustering structure, but it only generates the clusters whose local-density exceeds some threshold instead of similar local-density clusters and doesn't produce a clustering of a data set explicitly. Furthermore the parameters required by almost all the well-known clustering algorithms are hard to determine but have a significant influence on the clustering result. In this paper, a new clustering algorithm LDBSCAN relying on a local-density-based notion of clusters is proposed to solve those problems and, what is more, it is very easy for us to pick the appropriate parameters and takes the advantage of the LOF [Breunig, M. M., et al.,(2000). LOF: Identifying Density-Based Local Outliers. In W. Chen, J. F. Naughton, & P. A. Bernstein (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 93-104). Dalles, TX: ACM.] to detect the noises comparing with other density-based clustering algorithms. The proposed algorithm has potential applications in business intelligence and enterprise information systems.
This thesis is concerned with efficient density-based clustering using algorithms such as DBSCAN and NBC as well as the application of indices and the property of triangle inequality in order to make these algorithms faster.
As a research branch of data mining, clustering, as an unsupervised learning scheme, focuses on assigning objects in the dataset into several groups, called clusters, without any prior knowledge. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most widely used clustering algorithms for spatial datasets, which can detect any shapes of clusters and can automatically identify noise points. However, there are several troublesome limitations of DBSCAN: (1) the performance of the algorithm depends on two specified parameters, ε and MinPts in which ε represents the maximum radius of a neighborhood from the observing point and MinPts means the minimum number of data points contained in such a neighborhood. (2) The time consumption for searching the nearest neighbors of each object is intolerable in the cluster expansion. (3) Selecting different starting points results in quite different consequences. (4) DBSCAN is unable to identify adjacent clusters of various densities. In addition to these restrictions about DBSCAN mentioned above, the identification of border points is often ignored. In our paper, we successfully solve the above problems. Firstly, we improve the traditional locality sensitive hashing method to implement fast query of nearest neighbors. Secondly, several definitions are redefined on the basis of the influence space of each object, which takes the nearest neighbors and the reverse nearest neighbors into account. The influence space is proved to be sensitive to local density changes to successfully reduce the amount of parameters and identify adjacent clusters of different densities. Moreover, this new relationship based on the influence space makes the insensitivity to the ordering of inputting points possible. Finally, a new concept—core density reachable based on the influence space is put forward which aims to distinguish between border objects and noisy objects. Several experiments are performed which demonstrate that the performance of our proposed algorithm is better than the traditional DBSCAN algorithm and the improved algorithm IS-DBSCAN.
International Journal of Computer Applications, 2010
The DBSCAN [1] algorithm is a popular algorithm in Data Mining field as it has the ability to mine the noiseless arbitrary shape Clusters in an elegant way. As the original DBSCAN algorithm uses the distance measures to compute the distance between objects, it consumes so much processing time and its computation complexity comes as O (N 2). In this paper we have proposed a new algorithm to improve the performance of DBSCAN algorithm. The existing algorithms A Fast DBSCAN Algorithm[6] and Memory effect in DBSCAN algorithm[7] has been combined in the new solution to speed up the performance as well as improve the quality of the output. As the RegionQuery operation takes long time to process the objects, only few objects are considered for the expansion and the remaining missed border objects are handled differently during the cluster expansion. Eventually the performance analysis and the cluster output show that the proposed solution is better to the existing algorithms.
2006 3rd International Symposium on Voronoi Diagrams in Science and Engineering, 2006
In this paper, we present a fast O(nlogn) clustering algorithm based on Delaunay Triangulation for identifying clusters of different shapes, not necessarily convex. The clustering result is similar to human perception of clusters. The novelty of our method is the growth model we follow in the cluster formation that resembles the natural growth of a crystal. Our algorithm is able to identify dense as well as sparse clusters and also clusters connected by bridges. We demonstrate clustering results on several synthetic datasets and provide a comparison with popular K-Means based clustering methods. The clustering is based purely on proximity analysis in the Delaunay Triangulation and avoids usage of global parameters. It is robust in the presence of noise. Finally, we demonstrate the capability of our clustering algorithm in handling very large datasets.
2013
Density based clustering algorithm is one of the primary methods for clustering in data mining. The clusters which are formed based on the density are easy to understand and it does not limit itself to the shapes of clusters. One of them is DBSCAN which is a well known DENSITY-based clustering algorithm used for mining of unsupervised data. The DBSCAN algorithm suffers from several deficiencies whenever the database size is large. Also, DBSCAN does not respond well to data sets with varying densities. For this reason its complexity in worst case becomes O(n 2). The PROPOSED novel algorithm NDCMD (A Unified Novel Density Based Clustering Using Multidimensional Spatial Data): this outperforms DBSCAN for varying density. This is motivated by the current state-of-the-art density clustering algorithm DBSCAN. Ultimately we show how to automatically and capably extract not only 'traditional' clustering information, such as representative points, but also the fundamental clustering structure. Extensive experiments on some synthetic datasets show the validity of the proposed algorithm.
Neural Computing and Applications, 2016
Organizing data into sensible groups is called as 'data clustering.' It is an open research problem in various scientific fields. Neither a universal solution nor an absolute strategy for its evaluation exists in the literature. In this context, through this paper, we make following three contributions: (1) A new method for finding 'natural groupings' or clusters in the data set is presented. For this, a new term 'vicinity' is coined. Vicinity captures the idea of density together with spatial distribution of data points in feature space. This new notion has a potential to separate various type of clusters. In summary, the approach presented here is non-convex admissive (i.e., convex hulls of the clusters found can intersect which is desirable for non-convex clusters), cluster proportion and omission admissive (i.e., duplicating a cluster arbitrary number of times or deleting a cluster does not alter other cluster's boundaries), scale covariant, consistent (shrinking within cluster distances and enlarging inter-cluster distances does not affect the clustering results) but not rich (does not generates exhaustive partitions of the data) and density invariant. (2) Strategy for automatic detection of various tunable parameters in the proposed 'Vicinity Based Cluster Detection' (VBCD) algorithm is presented. (3) New internal evaluation index called 'Space-Density Index' (SDI) for the clustered results (by any method) is also presented. Experimental results reveal that VBCD captures the idea of 'natural groupings' better than the existing approaches. Also, SDI evaluation scheme provides a better judgment as compared to earlier internal cluster validity indices.
2007
Density based clustering methods allow the identification of arbitrary, not necessarily convex regions of data points that are densely populated. The number of clusters does not need to be specified beforehand; a cluster is defined to be a connected region that exceeds a given density threshold. This paper introduces the notion of local scaling in density based clustering, which determines the density threshold based on the local statistics of the data. The local maxima of density are discovered using a k-nearest-neighbor density estimation and used as centers of potential clusters. Each cluster is grown until the density falls below a pre-specified ratio of the center point’s density. The resulting clustering technique is able to identify clusters of arbitrary shape on noisy backgrounds that contain significant density gradients. The focus of this paper is to automate the process of clustering by making use of the local density information for arbitrarily sized, shaped, located, and numbered clusters. The performance of the new algorithm is promising as it is demonstrated on a number of synthetic datasets and images for a wide range of its parameters.
2009 IEEE Symposium on Computational Intelligence and Data Mining, 2009
Clustering is the problem of finding relations in a data set in an supervised manner. These relations can be extracted using the density of a data set, where density of a data point is defined as the number of data points around it. To find the number of data points around another point, region queries are adopted. Region queries are the most expensive construct in density based algorithm, so it should be optimized to enhance the performance of density based clustering algorithms specially on large data sets. Finding the optimum set of region queries to cover all the data points has been proven to be NP-complete. This optimum set is called the skeletal points of a data set. In this paper, we proposed a generic algorithms which fires region queries at most 6 times the optimum number of region queries (has 6 as approximation factor). Also, we have extend this generic algorithm to create a DBSCAN (the most wellknown density based algorithm) derivative, named ADBSCAN. Presented experimental results show that ADBSCAN has a better approximation to DBCSAN than the DBRS (the most well-known randomized density based algorithm) in terms of performance and quality of clustering, specially for large data sets.
International Journal of Machine Learning and Computing, 2013
Clustering problem is an unsupervised learning problem. It is a procedure that partition data objects into matching clusters. The data objects in the same cluster are quite similar to each other and dissimilar in the other clusters. The traditional algorithms do not meet the latest multiple requirements simultaneously for objects. Density-based clustering algorithms find clusters based on density of data points in a region. DBSCAN algorithm is one of the density-based clustering algorithms. It can discover clusters with arbitrary shapes and only requires two input parameters.In this paper, we propose a new algorithm based on DBSCAN. We design a new method for automatic parameters generation that create clusters with different densities and generates arbitrary shaped clusters. The kd-tree is used for increasing the memory efficiency. The performance of proposed algorithm is compared with DBSCAN. Experimental results indicate the superiority of proposed algorithm.

Loading Preview
Sorry, preview is currently unavailable. You can download the paper by clicking the button above.
References (2)
- M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: ordering points to identify the clustering struc- ture. In ACM Sigmod Record, volume 28, pages 49-60. ACM, 1999.
- M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density- based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226- 231, 1996.