This study presents a new approach for partitioning data sets affected by outliers. The proposed scheme consists of two main stages. The first stage is a preprocessing technique that aims to detect data values likely to be outliers by introducing the notion of an object's proximity degree. The second stage is a new procedure based on the Fuzzy C-Means (FCM) algorithm and the concept of outlier clusters: clusters dedicated to outliers are introduced in addition to the regular clusters. The proposed algorithm initializes the centers of these outlier clusters with the detected possible outliers, and a final, accurate decision about these possible outliers is made during the clustering process. The performance of this approach is illustrated on real and artificial examples.
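The idea of seeding extra clusters at outlier candidates can be sketched with plain FCM. This is a minimal illustration, not the paper's algorithm: it assumes the candidates from the first stage are already available, uses standard Euclidean FCM, and applies a hypothetical decision rule (a point is declared an outlier when its highest membership falls in an outlier cluster). The function names `fcm` and `fcm_with_outlier_clusters` are illustrative.

```python
import numpy as np

def fcm(X, init_centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy c-means iterations started from the given centers."""
    X = np.asarray(X, dtype=float)
    V = np.asarray(init_centers, dtype=float).copy()
    for _ in range(max_iter):
        # Squared Euclidean distances D[k, i]; the small constant avoids division by zero.
        D = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2) + 1e-12
        # Membership update u_ki = 1 / sum_j (D_ki / D_kj)^(1/(m-1)) (D is already squared).
        U = 1.0 / ((D[:, :, None] / D[:, None, :]) ** (1.0 / (m - 1.0))).sum(axis=2)
        # Center update: mean of the points weighted by u^m.
        W = U ** m
        V_new = (W.T @ X) / W.sum(axis=0)[:, None]
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    return U, V

def fcm_with_outlier_clusters(X, regular_centers, outlier_candidates, m=2.0):
    """Hypothetical second stage: one extra cluster per outlier candidate."""
    n_regular = len(regular_centers)
    # Regular clusters first, then one outlier cluster seeded at each candidate.
    init = np.vstack([regular_centers, outlier_candidates])
    U, V = fcm(X, init, m=m)
    labels = U.argmax(axis=1)
    # Assumed decision rule: a point is an outlier if it belongs most to an outlier cluster.
    is_outlier = labels >= n_regular
    return labels, is_outlier, V

X = np.array([[0, 0], [0.5, 0], [0, 0.5],
              [10, 10], [10.5, 10], [10, 10.5],
              [50, -50]])
labels, is_outlier, centers = fcm_with_outlier_clusters(
    X, regular_centers=[[0, 0], [10, 10]], outlier_candidates=[[50, -50]])
print(is_outlier)
```

Because a far-away outlier receives almost all of its membership from its own dedicated cluster, it barely pulls on the regular centers, which is the motivation for giving outliers clusters of their own.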
This paper introduces a new dynamic method for unsupervised learning, aimed at discovering and representing structures of homogeneous clusters within unlabeled training data, where the number of clusters is estimated algorithmically with no assumption about the compactness and separation of clusters. Assuming that the training data originate from at least two different clusters, and that a minimum average degree of similarity exists between the objects of each cluster, the learning process is initiated by creating two clusters around the least similar objects according to a given measure of inter-point similarity. The remaining objects are explored sequentially by analyzing their similarities with the mean points, or centers, that represent the previously discovered clusters. For each of these objects, a new learning rule is used for (1) creating a new cluster around the object, (2) using the information carried by the object to update the representative points of existing clusters, or (3) deferring consideration of the object until either of the two previous decisions can be made with enough confidence. The method is dynamic in that the decision rule depends on the number of clusters, which varies during the learning process. The effectiveness of this method is assessed on six real benchmark datasets in comparison with four methods that require the number of clusters as an input, namely K-means, Iterative Self-Organizing Data Analysis Technique (ISODATA), Fuzzy c-means (FCM) and Possibilistic c-means (PCM), and with an unsupervised fuzzy learning method (UFL) that tries to determine the number of clusters automatically and of which the proposed method (IUFL) is an improved version.
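The three-way decision described above can be sketched as a sequential clustering loop. This is a simplified sketch under stated assumptions, not IUFL itself: the similarity function `1 / (1 + distance)` and the fixed thresholds `t_new` and `t_join` are hypothetical, whereas in the paper the decision rule depends on the current number of clusters.

```python
import numpy as np

def similarity(a, b):
    # Hypothetical similarity in (0, 1]: 1 for identical objects, decreasing with distance.
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def sequential_clustering(X, t_new=0.2, t_join=0.5, max_passes=10):
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Start with two clusters around the least similar pair of distinct objects.
    S = np.array([[similarity(X[i], X[j]) for j in range(n)] for i in range(n)])
    np.fill_diagonal(S, np.inf)
    i0, j0 = np.unravel_index(np.argmin(S), S.shape)
    centers, counts = [X[i0].copy(), X[j0].copy()], [1, 1]
    labels = np.full(n, -1)
    labels[i0], labels[j0] = 0, 1
    pending = [k for k in range(n) if labels[k] == -1]
    for _ in range(max_passes):
        deferred = []
        for k in pending:
            sims = [similarity(X[k], c) for c in centers]
            best = int(np.argmax(sims))
            if sims[best] < t_new:
                # (1) Dissimilar to every center: create a new cluster around this object.
                centers.append(X[k].copy())
                counts.append(1)
                labels[k] = len(centers) - 1
            elif sims[best] >= t_join:
                # (2) Similar enough: update the running mean of the closest cluster.
                counts[best] += 1
                centers[best] += (X[k] - centers[best]) / counts[best]
                labels[k] = best
            else:
                # (3) Undecided: defer the object to the next pass.
                deferred.append(k)
        if len(deferred) == len(pending):
            break  # no decision was made in this pass
        pending = deferred
    # Objects still undecided after the last pass join their most similar center.
    for k in pending:
        labels[k] = int(np.argmax([similarity(X[k], c) for c in centers]))
    return labels, np.array(centers)

X = [[0, 0], [0.5, 0], [0, 0.5], [20, 0], [20.5, 0], [20, 0.5], [10, 30], [10.5, 30]]
labels, centers = sequential_clustering(X)
print(labels, len(centers))
```

Deferred objects are revisited on later passes, when more clusters and better-estimated centers may make the decision clear; the final fallback to the nearest center is an added assumption to guarantee termination.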
The fuzzy c-means algorithm (FCM) is widely used for fuzzy clustering. Usually, FCM uses the Euclidean distance as the similarity measure between data points. However, this distance is strongly influenced by features with larger units of measure and favors circular cluster shapes. A wide variety of distance measures have been suggested to detect clusters of different shapes in data sets; a typical example is the Lp distance. In this paper, we show that values of the parameter p less than 1 can significantly improve the performance of FCM, especially when the data set contains outliers. The resulting measure is called the fractional metric. To this end, we carry out a comparative study of FCM with different values of p on six data sets. The results clearly show that the fractional metric allows FCM to produce good results in a wide variety of real-world applications.
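A minimal sketch of how the Lp distance plugs into the FCM membership step. It only swaps the distance in the standard update; the function names are illustrative, and the center update is deliberately left out because for p ≠ 2 the weighted mean is no longer the exact minimizer of the FCM objective.

```python
import numpy as np

def minkowski(x, v, p):
    # L_p distance (sum_j |x_j - v_j|^p)^(1/p). For 0 < p < 1 the triangle inequality
    # fails, so this "fractional metric" is a dissimilarity rather than a true metric.
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(v, dtype=float))
    return (diff ** p).sum(axis=-1) ** (1.0 / p)

def fcm_memberships(X, V, p=0.5, m=2.0, eps=1e-12):
    X = np.asarray(X, dtype=float)
    V = np.asarray(V, dtype=float)
    # D[k, i] = L_p distance between point k and center i (eps avoids division by zero).
    D = minkowski(X[:, None, :], V[None, :, :], p) + eps
    # Usual FCM membership update with the Euclidean distance replaced by D.
    ratio = (D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

print(minkowski([0, 0], [3, 4], 2))    # Euclidean case: 5.0
print(minkowski([0, 0], [1, 1], 0.5))  # fractional case: 4.0
U = fcm_memberships([[0, 0], [1, 0], [9, 9]], [[0, 0], [10, 10]], p=0.5)
print(U.round(3))
```

With p < 1, a difference spread over several coordinates counts for more than the same total concentrated in one coordinate, which changes which points look close to a center.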
This paper presents a new approach for detecting outliers by introducing the notion of an object's proximity. The main idea is that a normal point shares similar characteristics with several neighbors. Hence a point is not an outlier if it has a high degree of proximity and several neighbors. The performance of this approach is illustrated on real datasets.
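The rule "not an outlier if it has a high degree of proximity and several neighbors" can be illustrated with a simple stand-in. The proximity degree used here, `max(0, 1 - d / r)`, and the parameters `r`, `min_neighbors` and `min_degree` are hypothetical choices for illustration; the paper's own definition of proximity is not reproduced.

```python
import numpy as np

def detect_outliers(X, r=1.0, min_neighbors=2, min_degree=0.3):
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    # Hypothetical proximity degree in [0, 1]: 1 for identical points, 0 at distance >= r.
    P = np.clip(1.0 - D / r, 0.0, 1.0)
    np.fill_diagonal(P, 0.0)  # a point is not its own neighbor
    n_neighbors = (P > 0).sum(axis=1)
    # Mean proximity over actual neighbors (0 when there are none).
    mean_degree = np.where(n_neighbors > 0, P.sum(axis=1) / np.maximum(n_neighbors, 1), 0.0)
    # A point counts as normal only if it has several neighbors AND a high proximity degree.
    return (n_neighbors < min_neighbors) | (mean_degree < min_degree)

X = [[0, 0], [0.2, 0], [0, 0.2], [0.2, 0.2], [5, 5]]
print(detect_outliers(X))
```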
The distance measure is an important criterion in any clustering algorithm. This paper shows how fuzzy clustering results can be improved by introducing a weighting factor into inter-object distance measures. New weighted versions of four well-known distance measures are considered. These distances are tested with the fuzzy c-means algorithm on three datasets. Experimental results show that the introduced weighting factor leads to a significant improvement over the standard unweighted distances.
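One common way to add a weighting factor to a distance is a per-feature weight, sketched below for the Minkowski family (p = 1 gives Manhattan, p = 2 Euclidean). The inverse-variance weights are an assumed example of a weighting scheme, not the factor proposed in the paper, and both function names are illustrative.

```python
import numpy as np

def weighted_minkowski(x, y, w, p=2.0):
    # d_w(x, y) = (sum_j w_j |x_j - y_j|^p)^(1/p); all w_j = 1 gives the unweighted distance.
    x, y, w = (np.asarray(a, dtype=float) for a in (x, y, w))
    return float((w * np.abs(x - y) ** p).sum() ** (1.0 / p))

def inverse_variance_weights(X):
    # Hypothetical weighting: features with a larger spread get smaller weights,
    # so no single unit of measure dominates; the weights are normalised to sum to 1.
    var = np.asarray(X, dtype=float).var(axis=0)
    w = 1.0 / np.where(var > 0, var, 1.0)
    return w / w.sum()

X = [[0, 0], [2, 20], [4, 40]]
w = inverse_variance_weights(X)
print(w)
print(weighted_minkowski(X[0], X[1], w), weighted_minkowski(X[0], X[1], [1, 1]))
```

In the unweighted distance the second feature, measured on a scale ten times larger, dominates; with these weights both features contribute equally to the distance between the first two points.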
Tournament selection has been widely used and studied in evolutionary algorithms. The tournament size is a crucial parameter of this method: it influences the convergence of the algorithm, the diversity of the population and the quality of the solution. This paper presents a new technique to adjust this parameter dynamically using fuzzy unsupervised learning. The efficiency of the proposed technique is demonstrated on several multimodal benchmark test functions.
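For reference, a minimal sketch of tournament selection with the tournament size `k` passed in on every call, so that an outer controller can change it between generations. The fuzzy rule that the paper uses to adjust `k` is not reproduced here; this sketch samples competitors without replacement and assumes fitness is maximised.

```python
import random

def tournament_select(population, fitness, k, rng=random):
    # Draw k distinct competitors uniformly at random; the fittest one wins (maximisation).
    contenders = rng.sample(range(len(population)), k)
    winner = max(contenders, key=lambda i: fitness[i])
    return population[winner]

def select_parents(population, fitness, k, n, rng=random):
    # Larger k raises the selection pressure (faster convergence, less diversity);
    # k = 1 reduces to uniform random selection.
    return [tournament_select(population, fitness, k, rng) for _ in range(n)]

pop = ["a", "b", "c", "d"]
fit = [1.0, 5.0, 3.0, 2.0]
rng = random.Random(0)
print(select_parents(pop, fit, k=2, n=10, rng=rng))
```

With `k = 2` the least fit individual can never be selected, since it always meets a better competitor; with `k` equal to the population size the best individual always wins.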
Papers by Amina Dik