Academia.edu

K-means algorithm

2,356 papers
4,301 followers
About this topic
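The K-means algorithm is a popular unsupervised machine learning technique used for clustering data into K distinct groups based on feature similarity. It iteratively assigns each data point to the nearest cluster centroid and recomputes each centroid as the mean of its assigned points until the assignments stop changing. No step increases the within-cluster variance (the sum of squared distances to the centroids), so the algorithm converges, but only to a local minimum that depends on the initial centroids.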
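The loop described above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions: Euclidean distance, k randomly chosen data points as the initial centroids, and illustrative function and parameter names (`kmeans`, `squared_distance`, `max_iter`, `seed`) that do not come from any particular paper or library.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd-style K-means sketch; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize the centroids with k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, labels
```

Because the result depends on the initial centroids, practical implementations usually run several restarts and keep the partition with the lowest within-cluster sum of squares.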
Due to the dramatic increase in data volumes in different applications, it is becoming infeasible to keep these data in one centralized machine. It is becoming more and more natural to deal with distributed databases and networks. That is...
In this paper, we propose a cluster-based cumulative representation for cluster ensembles. Cluster labels are mapped to incrementally accumulated clusters, and a matching criterion based on maximum similarity is used. The ensemble method...
In this paper, a novel K-means clustering algorithm is proposed. In traditional K-means, the initial cluster centers are selected randomly, which affects both the time cost and the accuracy. To solve this problem, we utilize...
The importance of energy conservation presents a considerable challenge in wireless sensor networks (WSNs), where the sensor nodes (SNs) that constitute the network depend on battery power. Recharging the batteries of SNs in the field is...
The detection of overlapping patterns in unlabeled data sets, referred to as overlapping clustering, is an important issue in data mining. In real-life applications, an overlapping clustering algorithm should be able to detect clusters with...
Thanks to an important research effort over the last few years, inductive queries on local patterns (e.g., set patterns) and complete solvers which can evaluate them on large...
Clustering documents into classes is an important task in many Information Retrieval (IR) systems. The resulting grouping enables a description of the contents of the document collection in terms of the classes the documents fall into....
One of the fundamental clustering problems is to assign n points to k clusters based on the minimal sum-of-squares (MSSC), which is known to be NP-hard. In this paper, by using matrix arguments, we first model MSSC as a so-called 0-1...
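For reference, the minimum sum-of-squares objective mentioned above is usually written as follows, with notation chosen here for illustration (points $x_1,\dots,x_n \in \mathbb{R}^d$, clusters $C_1,\dots,C_k$ partitioning them, and centroids $c_j$). K-means is a local-search heuristic for this objective; the paper's 0-1 matrix model is not reproduced here.

$$
\min_{C_1,\dots,C_k} \; \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - c_j \rVert^2,
\qquad
c_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i .
$$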
The growing self-organizing map (GSOM) has been introduced as an improvement to the self-organizing map (SOM) algorithm for clustering and knowledge discovery. Unlike the traditional SOM, GSOM has a dynamic structure which allows nodes to grow...
In this paper we propose the application of the generalized median graph in a graph-based k-means clustering algorithm. In the graph-based k-means algorithm, the centers of the clusters have traditionally been represented using the set...
Clustering is an important research topic in wireless networks, because cluster structures can facilitate resource reuse and increase system capacity. Ad hoc networks consist of wireless hosts that communicate with each other in the...
With economic globalization and the continuous development of e-commerce, customer relationship management (CRM) has become an important factor in the growth of a company. CRM requires huge expenses. One way to profit from investment and drive...
Effective decisions are essential for any company to generate good revenue. These days competition is intense, and every company is moving forward with its own strategy. We should use data to make proper decisions. Every...
Efficiency in the labour market is usually accounted for in order to understand and assess the earnings gaps that prevail between males and females. Arguing that individuals' skills, productivity, and commitment to work ultimately determine...
This article presents a citation-based mapping exercise in the nanosciences field and a first sketch of citation transactions (a measure of cognitive dependences). Nanosciences are considered to be one of the "convergent" components...
In this paper we present a new approach to image segmentation. The fact that a single segmentation result does not generally allow a higher-level process to take into account all the elements included in the image has motivated...
The exponential growth of scientific data necessitates modern techniques to manage and extract valuable insights from massive data collections. Traditional database querying methods are often inadequate for handling the scale of big data....
Emotion detection is a new research area in health informatics and forensic technology. Despite some challenges, voice-based emotion recognition is gaining popularity, since in situations where a facial image is not available, the voice...
Efforts to retain customers represent a crucial customer relationship management (CRM) strategy in every business, offering the potential to enhance profits, particularly for small and medium enterprises (SMEs). In the context of this...
Clustering is one of the most important data mining techniques; it partitions data according to some similarity criterion. The problems of clustering categorical data have attracted much attention from the data mining research community...
In cloud computing environments, load balancing plays an important role in efficient operation, where a multitude of resources serve diverse workloads and fluctuating demands. In the rapidly evolving cloud...
Predicting student performance is essential for enhancing educational outcomes, enabling educators to identify students who may need additional support or intervention. Clustering algorithms, as unsupervised data mining techniques, are...
Recently, two extensions of neural gas have been proposed: a fast batch version of neural gas for data given in advance, and extensions of neural gas to learn a (possibly fuzzy) supervised classification. Here we propose a batch version...
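As background for the batch neural gas mentioned above, the standard unsupervised batch update can be sketched as follows: for every data point the prototypes are ranked by distance, and each prototype is then recomputed as a mean of all points weighted by $h_\lambda(k) = e^{-k/\lambda}$, where $k$ is that prototype's rank. The exponential annealing schedule for $\lambda$, the random initialization, and all names and defaults below are illustrative assumptions; the supervised and fuzzy extensions proposed in the paper are not shown.

```python
import math
import random


def batch_neural_gas(points, n_protos, epochs=30, lam_end=0.01, seed=0):
    """Unsupervised batch neural gas sketch; returns the list of prototypes."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Start from n_protos distinct data points.
    protos = [list(p) for p in rng.sample(points, n_protos)]
    lam_start = n_protos / 2.0
    for t in range(epochs):
        # Anneal the neighborhood range lambda exponentially from lam_start to lam_end.
        lam = lam_start * (lam_end / lam_start) ** (t / max(epochs - 1, 1))
        num = [[0.0] * dim for _ in range(n_protos)]
        den = [0.0] * n_protos
        for x in points:
            # Rank prototypes by squared distance to x (rank 0 = closest).
            order = sorted(
                range(n_protos),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, protos[j])),
            )
            for rank, j in enumerate(order):
                h = math.exp(-rank / lam)
                den[j] += h
                for d in range(dim):
                    num[j][d] += h * x[d]
        # Batch step: every prototype becomes its rank-weighted mean of the data
        # (kept unchanged if all of its weights underflowed to zero).
        protos = [
            [num[j][d] / den[j] for d in range(dim)] if den[j] > 0 else protos[j]
            for j in range(n_protos)
        ]
    return protos
```

As $\lambda$ shrinks, only rank 0 carries noticeable weight, so the final epochs behave like batch K-means; the large early $\lambda$ is what makes neural gas less sensitive to initialization.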
In the present work, the parameters a and b of the Angstrom-Page equation were determined for three distinct bioclimatic zones (tropical rainforest, tropical savanna, and humid temperate), located close both to one another and to the line...
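The Angstrom-Page equation relates the ratio of daily global to extraterrestrial solar radiation, $H/H_0$, to the relative sunshine duration $n/N$ (measured sunshine hours over the maximum possible day length): $H/H_0 = a + b\,(n/N)$. A common way to estimate $a$ and $b$ for a site is an ordinary least-squares fit; the sketch below assumes that approach (the abstract does not state the fitting method), and the function name and sample values are illustrative, not data from the study.

```python
def fit_angstrom_page(sunshine_ratio, radiation_ratio):
    """Ordinary least-squares fit of H/H0 = a + b * (n/N); returns (a, b)."""
    m = len(sunshine_ratio)
    if m != len(radiation_ratio) or m < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(sunshine_ratio) / m
    mean_y = sum(radiation_ratio) / m
    sxx = sum((x - mean_x) ** 2 for x in sunshine_ratio)
    if sxx == 0:
        raise ValueError("n/N values must not all be equal")
    sxy = sum(
        (x - mean_x) * (y - mean_y) for x, y in zip(sunshine_ratio, radiation_ratio)
    )
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Illustrative (made-up) monthly values of n/N and H/H0 for one site.
a, b = fit_angstrom_page([0.2, 0.4, 0.6, 0.8], [0.35, 0.45, 0.55, 0.65])
```

Fitting each bioclimatic zone separately yields one (a, b) pair per zone, which is what allows the zones to be compared.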
In many physical, statistical, biological and other investigations it is desirable to approximate a system of points by objects of lower dimension and/or complexity. For this purpose, Karl Pearson invented principal component analysis in...
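Pearson's construction can be illustrated with the line of closest fit: it passes through the mean of the points in the direction of the leading eigenvector of their covariance matrix (the first principal component). The sketch below finds that direction with power iteration in pure Python; the function name, iteration count, and all-ones starting vector are illustrative assumptions, and power iteration assumes a strictly dominant eigenvalue and fails if the start vector is orthogonal to the leading eigenvector.

```python
import math


def first_principal_component(points, iters=200):
    """Return (mean, unit direction) of the line of closest fit to the points."""
    n, dim = len(points), len(points[0])
    if n < 2:
        raise ValueError("need at least two points")
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - mean[d] for d in range(dim)] for p in points]
    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    v = [1.0] * dim
    for _ in range(iters):
        # Power iteration: multiply by the covariance matrix and renormalize.
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # degenerate data, or start vector in the null space
        v = [x / norm for x in w]
    return mean, v
```

Projecting the centered points onto `v` gives their one-dimensional coordinates along the fitted line; further components can be obtained by deflating the covariance matrix and repeating.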
This study examined the level of community satisfaction with the operational performance of the 1st Provincial Mobile Force Company (PMFC) in Basilan concerning its anti-criminality campaign. Using a descriptive research design, data were...
In the increasingly competitive and rapidly developing beauty industry, understanding customer behavior plays a very important role. Aesthetic or beauty clinics have become the main destination for individuals...
One of the most popular clustering techniques is the k-means clustering algorithm. However, the use of k-means is severely limited by its high computational complexity. In this study, we propose a new strategy to accelerate...
Automatic recognition of abnormal patterns in control charts is in increasing demand in manufacturing processes. This paper presents a novel hybrid intelligent method (HIM) for recognition of the common types of control chart...
Nations power their economic growth by capitalizing on their integrated workforce, whereby male and female workers join the productivity ranks to maximize financial returns, boost the productive sectors, and move the country to higher ranks...
The idea of evidence accumulation for the combination of multiple clusterings was recently proposed. Taking K-means as the basic algorithm for decomposing the data into a large number, k, of compact clusters, evidence on pattern...
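The evidence-accumulation idea can be sketched as follows: each K-means run (for example with a large k and different random initializations) votes on which pairs of samples belong together, and the votes are averaged into a co-association matrix C. The label vectors below are hypothetical, and extracting the combined partition by joining pairs with C above 0.5 (equivalent to cutting a single-link dendrogram of 1 - C just below 0.5) is an illustrative simplification; the abstract does not state the paper's own extraction criterion.

```python
def co_association(partitions):
    """C[i][j] = fraction of partitions in which samples i and j share a cluster."""
    m, n = len(partitions), len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_clusters(co, threshold=0.5):
    """Connected components of the graph joining pairs with C[i][j] > threshold."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-associated pairs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three hypothetical K-means label vectors for the same six samples.
runs = [[0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1], [2, 2, 0, 1, 1, 1]]
print(consensus_clusters(co_association(runs)))  # prints [0, 0, 1, 2, 2, 2]
```

Note that the cluster labels themselves never need to be matched across runs: only pairwise co-membership is counted, which is what makes this combination scheme label-invariant.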
Due to advances in digital technologies and social networking, image collections are growing exponentially. An important aim in content-based image retrieval (CBIR) is to reduce the semantic gap, which improves the...
An important consideration in clustering is the determination of the correct number of clusters and the appropriate partitioning of a given data set. In this paper, a newly developed point symmetry distance is used to propose a new...
A comprehensive understanding of electrical energy consumption patterns is essential for planning and monitoring the use of energy resources. Industrial and business electricity customers have energy consumption patterns that vary...
Clustering aims at partitioning unlabelled data samples into clusters so that the samples within a cluster are close to each other. One of the most challenging issues in cluster analysis is the determination of true clusters. This...
This paper discusses the use of an integrated HMM/NN classifier for speech recognition. The proposed classifier combines the time normalization property of the HMM classifier with the superior discriminative ability of the neural net (NN)...
We generalize the k-means algorithm presented by the authors and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness)....
The telecommunication industry plays a vital role in the modern fast-moving world. At the same time, the industry is highly competitive because multiple providers offer different solutions to their consumers. As a result, customers are...
In the context of unsupervised clustering, a new algorithm for the domain of graphs is introduced. In this paper, the key idea is to adapt mean-shift clustering and its variants, proposed for the domain of feature vectors, to graph...
Clustering is the division of data into groups of similar objects. K-means has been used in much clustering work because of the simplicity of the algorithm. Our main effort is to parallelize the k-means clustering algorithm. The parallel version...
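One common way to parallelize a K-means iteration (not necessarily the scheme used in the paper) is data parallelism: split the points into chunks, let each worker assign its chunk to the nearest centroids and return per-cluster sums and counts, then combine those partial results into the new means. The sketch below assumes that design and uses illustrative function names. It uses `concurrent.futures.ThreadPoolExecutor` so it runs anywhere; because of CPython's global interpreter lock, a `ProcessPoolExecutor` or a distributed framework would be needed for real CPU speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def _partial_stats(chunk, centroids):
    """Map step: assign each point in one chunk; return per-cluster sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def parallel_kmeans_step(points, centroids, n_workers=4):
    """One data-parallel K-means iteration; returns the updated centroids."""
    size = max(1, -(-len(points) // n_workers))  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(_partial_stats, chunks, [centroids] * len(chunks)))
    # Reduce step: combine partial sums and counts into new means.
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        total = sum(counts[j] for _, counts in results)
        if total == 0:
            new_centroids.append(list(centroids[j]))  # empty cluster: keep old centroid
        else:
            new_centroids.append(
                [sum(sums[j][d] for sums, _ in results) / total for d in range(dim)]
            )
    return new_centroids
```

Calling `parallel_kmeans_step` repeatedly until the centroids stop moving gives the full algorithm. Because sums and counts combine exactly, each step matches sequential K-means up to floating-point rounding, and only k centroids (not the data) need to be broadcast to workers.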
Clustering is one of the fundamental data mining tasks. Many different clustering paradigms have been developed over the years, including partitional, hierarchical, mixture-model-based, density-based, spectral, and subspace approaches....
Statistical software is commonly used in statistics courses at universities. Developments and enhancements in statistical software in recent years have considerably eased statistics education in these institutions. The purpose of...
The primary care clinics of the U.S. Department of Veterans Affairs (VA) have long suffered the adverse effects of missed-opportunity risks, which are mainly reflected in patient no-shows. This calls for quantitative tools that can identify...
Agriculture is a branch of Nigeria's economy that provides employment for over 70% of its population and contributes about 41% of its gross domestic product (GDP). Nigeria's wide range of climate variations allows it to produce a...
In this paper we propose a new type of distance-based classifier. Traditionally, these classifiers are instance-based: they classify a test instance by computing a similarity measure between that instance and the instances in the...
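The traditional instance-based scheme described above, comparing a test instance with the stored training instances, is essentially nearest-neighbour classification. A minimal k-NN sketch follows; squared Euclidean distance stands in for the (unspecified) similarity measure, and the function name and example data are illustrative. It shows the baseline the paper contrasts with, not the classifier it proposes.

```python
from collections import Counter


def knn_classify(train, labels, query, k=1):
    """Label `query` by majority vote among its k nearest training instances."""
    # Sort training indices by squared Euclidean distance to the query.
    order = sorted(
        range(len(train)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], query)),
    )
    votes = Counter(labels[i] for i in order[:k])
    # Ties in the vote go to the label encountered first among the neighbours.
    return votes.most_common(1)[0][0]


train = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
labels = ["a", "a", "b", "b"]
print(knn_classify(train, labels, (0.2, 0.4)))  # prints a
```

Every prediction scans the whole training set, which is the cost that prototype- or centroid-based alternatives try to avoid.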
The Olimpiade Sains Nasional (OSN, National Science Olympiad) is a science competition for students from all over Indonesia. Every year, participants from various provinces compete to win medals, but not all provinces obtain...