ADCN: An Anisotropic Density-Based Clustering Algorithm

Yingjie Hu

doi:10.1145/2996913.2996940

4 pages

Abstract

In this work we introduce an anisotropic density-based clustering algorithm. It outperforms DBSCAN and OPTICS for the detection of anisotropic spatial point patterns and performs equally well in cases that do not explicitly benefit from an anisotropic perspective. ADCN has the same time complexity as DBSCAN and OPTICS, namely O(n log n) when using a spatial index, O(n^2) otherwise.

Figures (4)

Algorithm 2: ellipseRegionQuery(p_i, D, MinPts, Eps)
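
Only the signature of Algorithm 2 survives in this listing, so its body is not reproduced here. As a rough illustration of what an ellipse-based region query could look like, the Python sketch below estimates a local direction from the ordinary Eps-circle around a point and then searches an ellipse of the same area aligned with that direction. It is not the paper's algorithm: the function names, the use of second moments about the query point, the equal-area scaling, and the `max_ratio` cap are all assumptions made for this example. MinPts is left out because the sketch only returns the neighbour set; a caller would compare its size against MinPts.

```python
import math

def circle_region_query(i, points, eps):
    """Indices of all points within distance eps of points[i] (isotropic)."""
    cx, cy = points[i]
    return [j for j, (x, y) in enumerate(points)
            if math.hypot(x - cx, y - cy) <= eps]

def ellipse_region_query(i, points, eps, max_ratio=5.0):
    """Indices of points inside an ellipse centred on points[i].

    Assumed construction (illustrative only): the ellipse follows the
    principal direction of the Eps-circle neighbours and has the same
    area as that circle. max_ratio is a hypothetical safety cap.
    """
    cx, cy = points[i]
    seed = circle_region_query(i, points, eps)  # always contains i itself
    n = len(seed)
    # Second moments of the neighbour offsets from the query point.
    sxx = sum((points[j][0] - cx) ** 2 for j in seed) / n
    syy = sum((points[j][1] - cy) ** 2 for j in seed) / n
    sxy = sum((points[j][0] - cx) * (points[j][1] - cy) for j in seed) / n
    # Eigenvalues and major-axis angle of the 2x2 moment matrix.
    mean = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    lam_max, lam_min = mean + spread, mean - spread
    if lam_max <= 0.0:
        return seed  # only the query point itself: no direction to estimate
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    # Axis ratio, capped so collinear neighbours do not give a zero-width ellipse.
    if lam_min <= 0.0:
        ratio = max_ratio
    else:
        ratio = min(math.sqrt(lam_max / lam_min), max_ratio)
    a = eps * math.sqrt(ratio)  # semi-major axis
    b = eps / math.sqrt(ratio)  # semi-minor axis; a * b equals eps ** 2 up to rounding
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    inside = []
    for j, (x, y) in enumerate(points):
        dx, dy = x - cx, y - cy
        u = dx * cos_t + dy * sin_t   # coordinate along the major axis
        v = -dx * sin_t + dy * cos_t  # coordinate along the minor axis
        if (u / a) ** 2 + (v / b) ** 2 <= 1.0:
            inside.append(j)
    return inside
```

For points strung out along a line, the ellipse reaches further along the line than the circle does and less far across it, which is the behaviour an anisotropic query is meant to provide. The linear scan stands in for a spatial index and costs O(n) per query.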

We generated 21 test cases with 3 different noise settings for each of them. Of these, we discuss 6 synthetic and 4 real-world use cases here, which results in a total of 30 study cases. To simulate a "ground truth" for the synthetic cases, we created polygons to indicate different clusters and randomly generated points within and outside these polygons. We took a similar approach for the four real-world cases; the only difference is that the polygons for the real-world cases were generated from buffer zones with a 3 m radius around the real-world features. To avoid cases in which it is unreasonable to expect algorithms and humans to differentiate between noise and pattern, we introduced a clipping buffer of 0 m, 5 m, and 10 m. All four of these algorithms take the same parameters (Eps, MinPts). As there are no established methods to determine the best overall parameter combination with respect to NMI and Rand Index, we tested parameter combinations stepwise.
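
The stepwise parameter test described above can be pictured as a plain grid search scored against the simulated ground truth. The Python sketch below shows one way to do this with the Rand Index only; NMI is omitted for brevity. `cluster_fn` is a hypothetical callable standing in for any of the compared algorithms, `grid_search` is a name chosen for this example, and treating noise as just another label is a simplifying assumption rather than something the text specifies.

```python
from itertools import combinations

def rand_index(truth, pred):
    """Fraction of point pairs on which two labelings agree.

    A pair agrees when both labelings put it in the same cluster or
    both put it in different clusters. Needs at least two points.
    """
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum((truth[i] == truth[j]) == (pred[i] == pred[j])
                for i, j in pairs)
    return agree / len(pairs)

def grid_search(cluster_fn, points, truth, eps_values, min_pts_values):
    """Try every (Eps, MinPts) pair; return (best_score, eps, min_pts).

    cluster_fn(points, eps, min_pts) is assumed to return one label per
    point. Ties keep the first combination encountered.
    """
    best = None
    for eps in eps_values:
        for min_pts in min_pts_values:
            score = rand_index(truth, cluster_fn(points, eps, min_pts))
            if best is None or score > best[0]:
                best = (score, eps, min_pts)
    return best
```

The pairwise loop is quadratic in the number of points, which is acceptable for small test cases; a contingency-table formulation would be the usual choice for larger ones.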

Deyi Xiong

2006

Density-based clustering algorithms are attractive for the task of class identification in spatial databases. However, in many cases clusters of very different local density exist in different regions of the data space, so DBSCAN [Ester, M. et al., A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In E. Simoudis, J. Han, & U. M. Fayyad (Eds.), Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining (pp. 226-231). Portland, OR: AAAI.], which uses a global density parameter, is not suitable. As an improvement, OPTICS [Ankerst, M. et al. (1999). OPTICS: Ordering Points To Identify the Clustering Structure. In A. Delis, C. Faloutsos, & S. Ghandeharizadeh (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 49-60). Philadelphia, PA: ACM.] creates an augmented ordering of the database representing its density-based clustering structure, but it only generates the clusters whose local density exceeds some threshold instead of clusters of similar local density, and it does not produce a clustering of a data set explicitly. Furthermore, the parameters required by almost all well-known clustering algorithms are hard to determine but have a significant influence on the clustering result. In this paper, a new clustering algorithm, LDBSCAN, relying on a local-density-based notion of clusters is proposed to solve those problems. Moreover, it is easy to pick appropriate parameters for it, and it takes advantage of the LOF [Breunig, M. M., et al. (2000). LOF: Identifying Density-Based Local Outliers. In W. Chen, J. F. Naughton, & P. A. Bernstein (Eds.), Proc. ACM SIGMOD Int. Conf. on Management of Data (pp. 93-104). Dallas, TX: ACM.] to detect noise, compared with other density-based clustering algorithms. The proposed algorithm has potential applications in business intelligence and enterprise information systems.

Efficient Density-Based Clustering

Kanna Velusamy

This thesis is concerned with efficient density-based clustering using algorithms such as DBSCAN and NBC as well as the application of indices and the property of triangle inequality in order to make these algorithms faster.

An efficient and scalable density-based clustering algorithm for datasets with complex structures

마트 롯데

As a research branch of data mining, clustering, as an unsupervised learning scheme, focuses on assigning objects in the dataset into several groups, called clusters, without any prior knowledge. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most widely used clustering algorithms for spatial datasets, which can detect any shapes of clusters and can automatically identify noise points. However, there are several troublesome limitations of DBSCAN: (1) the performance of the algorithm depends on two specified parameters, ε and MinPts in which ε represents the maximum radius of a neighborhood from the observing point and MinPts means the minimum number of data points contained in such a neighborhood. (2) The time consumption for searching the nearest neighbors of each object is intolerable in the cluster expansion. (3) Selecting different starting points results in quite different consequences. (4) DBSCAN is unable to identify adjacent clusters of various densities. In addition to these restrictions about DBSCAN mentioned above, the identification of border points is often ignored. In our paper, we successfully solve the above problems. Firstly, we improve the traditional locality sensitive hashing method to implement fast query of nearest neighbors. Secondly, several definitions are redefined on the basis of the influence space of each object, which takes the nearest neighbors and the reverse nearest neighbors into account. The influence space is proved to be sensitive to local density changes to successfully reduce the amount of parameters and identify adjacent clusters of different densities. Moreover, this new relationship based on the influence space makes the insensitivity to the ordering of inputting points possible. Finally, a new concept—core density reachable based on the influence space is put forward which aims to distinguish between border objects and noisy objects. 
Several experiments are performed which demonstrate that the performance of our proposed algorithm is better than the traditional DBSCAN algorithm and the improved algorithm IS-DBSCAN.

An Optimised Density Based Clustering Algorithm

Hencil Peter

International Journal of Computer Applications, 2010

The DBSCAN [1] algorithm is a popular algorithm in the data mining field, as it can mine noiseless clusters of arbitrary shape in an elegant way. Because the original DBSCAN algorithm uses distance measures to compute the distance between objects, it consumes considerable processing time, and its computational complexity is O(N^2). In this paper we propose a new algorithm to improve the performance of DBSCAN. The existing algorithms A Fast DBSCAN Algorithm [6] and Memory effect in DBSCAN algorithm [7] have been combined in the new solution to speed up the performance as well as improve the quality of the output. As the RegionQuery operation takes a long time to process the objects, only a few objects are considered for the expansion, and the remaining missed border objects are handled differently during the cluster expansion. Finally, the performance analysis and the cluster output show that the proposed solution is better than the existing algorithms.

CRYSTAL - A new density-based fast and efficient clustering algorithm

Marina Gavrilova

2006 3rd International Symposium on Voronoi Diagrams in Science and Engineering, 2006

In this paper, we present a fast O(nlogn) clustering algorithm based on Delaunay Triangulation for identifying clusters of different shapes, not necessarily convex. The clustering result is similar to human perception of clusters. The novelty of our method is the growth model we follow in the cluster formation that resembles the natural growth of a crystal. Our algorithm is able to identify dense as well as sparse clusters and also clusters connected by bridges. We demonstrate clustering results on several synthetic datasets and provide a comparison with popular K-Means based clustering methods. The clustering is based purely on proximity analysis in the Delaunay Triangulation and avoids usage of global parameters. It is robust in the presence of noise. Finally, we demonstrate the capability of our clustering algorithm in handling very large datasets.

NDCMD: A Novel Approach Towards Density Based Clustering Using Multidimensional Spatial Data

KHUSHALI MISTRY

2013

Density-based clustering is one of the primary methods for clustering in data mining. Clusters formed on the basis of density are easy to understand, and the method does not limit itself to particular cluster shapes. One such method is DBSCAN, a well-known density-based clustering algorithm used for mining unsupervised data. The DBSCAN algorithm suffers from several deficiencies whenever the database size is large. Also, DBSCAN does not respond well to data sets with varying densities. For this reason its complexity in the worst case becomes O(n^2). The proposed novel algorithm, NDCMD (A Unified Novel Density Based Clustering Using Multidimensional Spatial Data), outperforms DBSCAN for varying density. It is motivated by the current state-of-the-art density clustering algorithm DBSCAN. Ultimately we show how to automatically and capably extract not only 'traditional' clustering information, such as representative points, but also the fundamental clustering structure. Extensive experiments on some synthetic datasets show the validity of the proposed algorithm.

A density invariant approach to clustering

PROF MAHUA BHATTACHARYA

Neural Computing and Applications, 2016

Organizing data into sensible groups is called 'data clustering.' It is an open research problem in various scientific fields. Neither a universal solution nor an absolute strategy for its evaluation exists in the literature. In this context, through this paper, we make the following three contributions: (1) A new method for finding 'natural groupings' or clusters in the data set is presented. For this, a new term 'vicinity' is coined. Vicinity captures the idea of density together with the spatial distribution of data points in feature space. This new notion has the potential to separate various types of clusters. In summary, the approach presented here is non-convex admissive (i.e., convex hulls of the clusters found can intersect, which is desirable for non-convex clusters), cluster proportion and omission admissive (i.e., duplicating a cluster an arbitrary number of times or deleting a cluster does not alter other clusters' boundaries), scale covariant, consistent (shrinking within-cluster distances and enlarging inter-cluster distances does not affect the clustering results) but not rich (does not generate exhaustive partitions of the data) and density invariant. (2) A strategy for automatic detection of various tunable parameters in the proposed 'Vicinity Based Cluster Detection' (VBCD) algorithm is presented. (3) A new internal evaluation index called 'Space-Density Index' (SDI) for the clustered results (by any method) is also presented. Experimental results reveal that VBCD captures the idea of 'natural groupings' better than the existing approaches. Also, the SDI evaluation scheme provides a better judgment as compared to earlier internal cluster validity indices.

Locally Scaled Density Based Clustering

Deniz Yuret

2007

Density based clustering methods allow the identification of arbitrary, not necessarily convex regions of data points that are densely populated. The number of clusters does not need to be specified beforehand; a cluster is defined to be a connected region that exceeds a given density threshold. This paper introduces the notion of local scaling in density based clustering, which determines the density threshold based on the local statistics of the data. The local maxima of density are discovered using a k-nearest-neighbor density estimation and used as centers of potential clusters. Each cluster is grown until the density falls below a pre-specified ratio of the center point’s density. The resulting clustering technique is able to identify clusters of arbitrary shape on noisy backgrounds that contain significant density gradients. The focus of this paper is to automate the process of clustering by making use of the local density information for arbitrarily sized, shaped, located, and numbered clusters. The performance of the new algorithm is promising as it is demonstrated on a number of synthetic datasets and images for a wide range of its parameters.

An approximation algorithm for finding skeletal points for density based clustering approaches

mahdi tehrani

2009 IEEE Symposium on Computational Intelligence and Data Mining, 2009

Clustering is the problem of finding relations in a data set in an unsupervised manner. These relations can be extracted using the density of a data set, where the density of a data point is defined as the number of data points around it. To find the number of data points around another point, region queries are adopted. Region queries are the most expensive construct in density-based algorithms, so they should be optimized to enhance the performance of density-based clustering algorithms, especially on large data sets. Finding the optimum set of region queries to cover all the data points has been proven to be NP-complete. This optimum set is called the skeletal points of a data set. In this paper, we propose a generic algorithm which fires at most 6 times the optimum number of region queries (it has 6 as its approximation factor). We have also extended this generic algorithm to create a derivative of DBSCAN (the most well-known density-based algorithm), named ADBSCAN. The experimental results presented show that ADBSCAN is a better approximation to DBSCAN than DBRS (the most well-known randomized density-based algorithm) in terms of performance and quality of clustering, especially for large data sets.

Fast Density Based Clustering Algorithm

Priyanka Trikha

International Journal of Machine Learning and Computing, 2013

The clustering problem is an unsupervised learning problem. It is a procedure that partitions data objects into matching clusters. The data objects in the same cluster are quite similar to each other and dissimilar to those in other clusters. The traditional algorithms do not meet the latest multiple requirements simultaneously for objects. Density-based clustering algorithms find clusters based on the density of data points in a region. The DBSCAN algorithm is one of the density-based clustering algorithms. It can discover clusters with arbitrary shapes and only requires two input parameters. In this paper, we propose a new algorithm based on DBSCAN. We design a new method for automatic parameter generation that creates clusters with different densities and generates arbitrarily shaped clusters. The kd-tree is used for increasing the memory efficiency. The performance of the proposed algorithm is compared with DBSCAN. Experimental results indicate the superiority of the proposed algorithm.

References (2)

M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander. OPTICS: ordering points to identify the clustering structure. In ACM Sigmod Record, volume 28, pages 49-60. ACM, 1999.
M. Ester, H.-P. Kriegel, J. Sander, and X. Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, volume 96, pages 226-231, 1996.

Izabela A Wowczko

This paper presents two density-based algorithms: Density Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points to Identify the Clustering Structure (OPTICS). The notion of density, as well as its various estimators, is explained. We compare two methods of identifying similar objects based on their density, of which one produces clusters and the other outputs augmented ordering representing density-based structure of a database. The parameters and their optimisations are also discussed.

A Survey on Density-Based Clustering Algorithms

He HUIHAO

Density-based clustering forms the clusters of densely gathered objects separated by sparse regions. In this paper, we survey the previous and recent density-based clustering algorithms. DBSCAN [6], OPTICS [1], and DENCLUE [5, 6] are previous representative density-based clustering algorithms. Several recent algorithms such as PDBSCAN [8], CUDA-DClust [3], and GSCAN [7] have been proposed to improve the performance of DBSCAN. They make the most of multi-core CPUs and GPUs.

Density-Based Clustering in Spatial Databases: The Algorithm GDBSCAN and Its Applications

Simone Matheus

Data Mining and Knowledge Discovery, 1998

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we generalize this algorithm in two important directions. The generalized algorithm, called GDBSCAN, can cluster point objects as well as spatially extended objects according to both their spatial and their nonspatial attributes. In addition, four applications using 2D points (astronomy), 3D points (biology), 5D points (earth science) and 2D polygons (geography) are presented, demonstrating the applicability of GDBSCAN to real-world problems.

K-DBSCAN: Identifying Spatial Clusters with Differing Density Levels

Praveen Tripathi

2015 International Workshop on Data Mining with Industrial Applications (DMIA), 2015

Spatial clustering is a very important tool in the analysis of spatial data. In this paper, we propose a novel density based spatial clustering algorithm called K-DBSCAN with the main focus of identifying clusters of points with similar spatial density. This contrasts with many other approaches, whose main focus is spatial contiguity. The strength of K-DBSCAN lies in finding arbitrary shaped clusters in variable density regions. Moreover, it can also discover clusters with overlapping spatial regions, but differing density levels. The goal is to differentiate the most dense regions from lower density regions, with spatial contiguity as the secondary goal. The original DBSCAN fails to discover the clusters with variable density and overlapping regions. OPTICS and Shared Nearest Neighbour (SNN) algorithms have the capabilities of clustering variable density datasets but they have their own limitations. Both fail to detect overlapping clusters. Also, while handling varying density, both of the algorithms merge points from different density levels. K-DBSCAN has two phases: first, it divides all data objects into different density levels to identify the different natural densities present in the dataset; then it extracts the clusters using a modified version of DBSCAN. Experimental results on both synthetic data and a real-world spatial dataset demonstrate the effectiveness of our clustering algorithm.

An advancement in clustering via nonparametric density estimation

Giovanna Menardi

Statistics and Computing, 2013

Density-based clustering methods hinge on the idea of associating groups to the connected components of the level sets of the density underlying the data, to be estimated by a nonparametric method. These methods claim some desirable properties and generally good performance, but they involve a non-trivial computational effort, required for the identification of the connected regions. In a previous work, the use of spatial tessellation such as the Delaunay triangulation has been proposed, because it suitably generalizes the univariate procedure for detecting the connected components. However, its computational complexity grows exponentially with the dimensionality of data, thus making the triangulation unfeasible for high dimensions. Our aim is to overcome the limitations of Delaunay triangulation. We discuss the use of an alternative procedure for identifying the connected regions associated to the level sets of the density. By measuring the extent of possible valleys of the density along the segment connecting pairs of observations, the proposed procedure shifts the formulation from a space with arbitrary dimension to a univariate one, thus leading benefits both in computation and visualization.

An enhanced density based spatial clustering of applications with noise

Ahmed Fahim

2009

Cluster analysis is a primary method for data mining. Finding clusters with varying sizes, shapes and densities is a challenging job. DBSCAN can find clusters with varying shapes and sizes, but it has trouble finding clusters with varying densities, because it depends on a global value for its parameter Eps. This paper presents an enhanced DBSCAN which effectively clusters databases containing clusters with varying densities. The idea is to use varied values for Eps according to the local density of the starting point in each cluster. The clustering process starts from the point of highest local density and proceeds towards the point of lowest local density. For each value of Eps, DBSCAN is adopted to make sure that all density-reachable points with respect to the current Eps are clustered. In the next pass, the clustered points are ignored, to avoid merging denser clusters with sparser ones.

ADBSCAN: Adaptive Density-Based Spatial Clustering of Applications with Noise for Identifying Clusters with Varying Densities

Md. Abu Bakr Siddique

2018 4th International Conference on Electrical Engineering and Information & Communication Technology (iCEEiCT), IEEE, 2018

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm that performs well on datasets in which clusters have a constant density of data points. One of the significant attributes of this algorithm is noise cancellation. However, DBSCAN demonstrates reduced performance for clusters with different densities. Therefore, in this paper, an adaptive DBSCAN is proposed which can work significantly well for identifying clusters with varying densities.

An Enhanced Multi Density Based Clustering Technique Using Density Level Partition (EDBSCAN-DLP)

Saif U R Rehman

Density-based spatial clustering of applications with noise (DBSCAN) is a well-known clustering algorithm that can find clusters with arbitrary shape and handle noisy points effectively. However, DBSCAN is unable to find clusters with varying densities. DBSCAN requires the user to input the parameters Eps and Minpts to execute the algorithm, which are hard to determine and directly influence the clustering result. DBSCAN-DLP improved DBSCAN by providing a mechanism for calculating a suitable value of Eps automatically for each density level. DBSCAN-DLP also recognizes clusters of different densities. However, DBSCAN-DLP still requires the user to input Minpts. In this research, we propose an enhanced E-DBSCAN-DLP algorithm by extending DBSCAN-DLP so that it can automatically determine the most suitable value of Minpts by using the statistical characteristics of the dataset. Experimental results show that EDBSCAN-DLP estimates the value of Minpts accurately when providing different dat...

A Clustering Algorithm Incorporating Density and Direction

Gregory O'Hare

2008

This paper analyses the advantages and disadvantages of the K-means algorithm and the DENCLUE algorithm. In order to realise the automation of clustering analysis and eliminate human factors, both partitioning and density-based methods were adopted, resulting in a new algorithm, Clustering Algorithm based on object Density and Direction (CADD). This paper discusses the theory and algorithm design of the CADD algorithm.

Efficient Density Clustering Method for Spatial Data

William Perrizo, Dorothy ren, Baoying Wang

Lecture Notes in Computer Science, 2003

Data mining for spatial data has become increasingly important as more and more organizations are exposed to spatial data from sources such as remote sensing, geographical information systems, astronomy, computer cartography, environmental assessment and planning, etc. Recently, density-based clustering methods, such as DENCLUE, DBSCAN, and OPTICS, have been published and recognized as powerful clustering methods for data mining. These approaches have a run time complexity of O(n log n) when using spatial index techniques, R+ tree and grid cell. However, these methods are known to lack scalability with respect to dimensionality. In this paper, a unique approach to efficient neighborhood search and a new efficient density-based clustering algorithm using EIN-rings are developed. Our approach exploits compressed vertical data structures, Peano Trees (P-trees), and fast P-tree logical operations to accelerate the calculation of the density function within EIN-rings. This approach stands in contrast to the ubiquitous approach of vertically scanning horizontal data structures (records). The average run time complexity of our algorithm for spatial data in d dimensions is O(dn√n). Our proposed method has comparable cardinality scalability with other density methods for small and medium sizes of data, but superior speed and dimensional scalability.

Cited by

DBSTexC

Minh Nguyen

Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017

Density-based spatial clustering of applications with noise (DBSCAN) is the most commonly used density-based clustering algorithm, which can discover multiple clusters with arbitrary shapes. DBSCAN works properly when the input data type is homogeneous, but its approach may not be sufficient when the input dataset has textual heterogeneity (e.g., when we intend to find clusters from geo-tagged posts on social media relevant to a certain point-of-interest (POI)), thus leading to poor performance. In this paper, we present DBSTexC, a new density-based clustering algorithm using spatio-textual information on Twitter. We first define POI-relevant and POI-irrelevant tweets as the records that contain and do not contain a POI name or its coherent variations, respectively. By taking into account the fractions of POI-relevant and POI-irrelevant tweets, our DBSTexC algorithm shows a much higher clustering quality than DBSCAN in terms of the F1 score and its variants. DBSTexC can be thought of as a generalized version of DBSCAN, since it performs identically to DBSCAN when the inputs are homogeneous and far outperforms DBSCAN when heterogeneous input data is given.
