Academia.edu

KNN Classification

21 papers
0 followers
About this topic
KNN Classification, or k-Nearest Neighbors Classification, is a supervised machine learning algorithm used for classification tasks. It classifies data points based on the majority class among their k nearest neighbors in the feature space, utilizing distance metrics to determine proximity.
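The majority-vote procedure described above can be sketched in a few lines of pure Python. This is a toy illustration with made-up 2-D data and labels, not any particular library's implementation:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority class among the k closest neighbors.
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]

# Toy 2-D data: class "a" clusters near the origin, class "b" near (5, 5).
X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # → "a"
print(knn_predict(X, y, (5.5, 5.5), k=3))  # → "b"
```

Swapping the distance function or the value of k changes which neighbors vote, which is exactly what the research themes below investigate.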

Key research themes

1. How can adaptive selection of the number of neighbors (k) improve kNN classification performance?

This research area addresses the problem that the optimal number of neighbors (k) for kNN classification can vary across test instances due to local heterogeneity in the data distribution. A fixed k can cause either underfitting or overfitting, hurting classification accuracy and computational efficiency. Adaptive k selection aims to assign a tailored k to each test point based on data-driven criteria, improving predictive performance and reducing computational cost.

Key finding: Introduces Correlation Matrix kNN (CM-kNN) which learns different k values for each test data point by reconstructing test points from training data with sparsity and local structure preservation constraints. The model uses…
Key finding: Proposes an adaptive kNN framework where the number of neighbors is dynamically selected per test instance using early-break heuristics to balance classification accuracy and computational cost. The method reduces search…
Key finding: Derives a probabilistic formula to estimate reliability or confidence of the kNN classification decision for individual test points in the two-class case, accounting for unequal class sizes and neighborhood label composition.…
Key finding: Develops a kNN-based model that constructs a representative set from training data, automatically determining the optimal k for classification in different regions of the feature space. The approach reduces dependence on a…
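The early-break idea behind per-instance k selection can be illustrated with a toy heuristic: grow k and stop as soon as the neighborhood vote is confident enough. This is a hypothetical sketch for illustration, not the exact method of any of the papers above; the threshold and data are made up:

```python
from collections import Counter
import math

def adaptive_knn_predict(train_X, train_y, query, k_max=9, confidence=0.75):
    """Grow k until the neighborhood vote reaches a confidence threshold
    (a toy early-break heuristic, not a specific published algorithm)."""
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    for k in range(1, min(k_max, len(neighbors)) + 1, 2):  # odd k avoids ties
        votes = Counter(label for _, label in neighbors[:k])
        label, count = votes.most_common(1)[0]
        if count / k >= confidence:      # early break: confident enough
            return label, k
    return label, k                      # fall back to the largest k tried

X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
label, k_used = adaptive_knn_predict(X, y, (0.4, 0.4))
print(label, k_used)  # a clear-cut query stops at a small k
```

A query deep inside one cluster stops at k = 1, while a query near the decision boundary forces the loop to examine larger neighborhoods, which is the accuracy/cost trade-off these papers formalize.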

2. What is the impact of distance metric choice on kNN classification accuracy and robustness?

Since kNN classification critically depends on measuring similarity or distance in feature space, choosing an appropriate distance metric is crucial. Different metrics handle varying data characteristics, noise levels, and feature types differently, greatly influencing performance. This research theme explores comprehensive evaluation of metrics, their effects on accuracy, robustness to noise, and appropriateness for different data distributions.

Key finding: Systematically compares numerous distance and similarity measures (e.g., Euclidean, Manhattan, Mahalanobis, chi-square) on multiple real-world datasets with varying noise levels. Shows that classifier accuracy varies…
Key finding: Compares Euclidean and Manhattan distance metrics for kNN classification predicting student graduation timeliness using academic and demographic features. Results suggest Manhattan distance achieves comparable or slightly…
Key finding: Provides foundational exposition on the role of the distance metric (Euclidean by default) in kNN classification, illustrating how different distance computations (e.g., Manhattan) influence neighborhood formation. Emphasizes that…
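To make concrete how the metric choice reshapes the neighborhood, here is a minimal pure-Python sketch (the points and labels are made up for illustration): for the same query, the nearest neighbor under Euclidean distance differs from the nearest neighbor under Manhattan distance.

```python
import math

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_label(train_X, train_y, query, metric):
    """Return the label of the single nearest training point under `metric`."""
    return min(zip(train_X, train_y), key=lambda t: metric(t[0], query))[1]

X = [(3, 3), (0, 5)]          # a diagonal point and an axis-aligned point
y = ["diag", "axis"]
q = (0, 0)
print(nearest_label(X, y, q, euclidean))  # → "diag" (≈4.24 beats 5.0)
print(nearest_label(X, y, q, manhattan))  # → "axis" (5 beats 6)
```

Because Manhattan distance sums coordinate differences, diagonal displacement is penalized more heavily than under Euclidean distance, so the two metrics can rank neighbors differently, which is exactly why metric choice affects kNN accuracy.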

3. How can prototype selection and data reduction techniques improve the efficiency and accuracy of kNN classification on large datasets?

kNN suffers from high memory use and slow query times since it requires the entire training set during classification. Prototype Selection (PS) and Prototype Generation (PG) methods aim to reduce the training set size to improve efficiency while maintaining or enhancing accuracy. Research on these approaches investigates how best to select or generate representative samples, handle noise, and maintain decision boundary fidelity, especially in cases with structured or high-dimensional data.

Key finding: Proposes a two-step scheme combining Prototype Selection to reduce the training set with a class proposal ranking to filter prototypes, thus improving multi-label kNN classification performance. This approach filters out…
Key finding: Investigates the use of dissimilarity space methods to transform structural data into a feature representation, enabling Prototype Generation approaches otherwise difficult on structural data like strings or graphs.…
Key finding: Introduces a model-based data reduction method that constructs a set of representative prototypes through hyperrelation theory to reduce training data size. This reduces the memory and computational costs of kNN…
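As a concrete (if simplified) example of prototype selection, Hart's classic Condensed Nearest Neighbor rule keeps only the training points needed for 1-NN to classify the rest of the training set correctly. The sketch below uses made-up 2-D data and illustrates the general PS idea, not the specific methods of the papers above:

```python
from collections import Counter
import math

def knn_predict(X, y, query, k=1):
    """Plain kNN majority vote, used here as the 1-NN subroutine."""
    dists = sorted((math.dist(x, query), lab) for x, lab in zip(X, y))
    return Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]

def condense(train_X, train_y):
    """Hart's Condensed Nearest Neighbor: repeatedly add any training point
    that the current prototype set misclassifies with 1-NN, until stable."""
    proto_X, proto_y = [train_X[0]], [train_y[0]]
    changed = True
    while changed:
        changed = False
        for x, lab in zip(train_X, train_y):
            if knn_predict(proto_X, proto_y, x, k=1) != lab:
                proto_X.append(x)
                proto_y.append(lab)
                changed = True
    return proto_X, proto_y

X = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
y = ["a", "a", "a", "b", "b", "b"]
proto_X, proto_y = condense(X, y)
print(len(proto_X))  # two well-separated clusters condense to 2 prototypes
```

On these two tight clusters the six training points condense to one prototype per class, so queries are answered against a third of the original data, which is the memory/time saving PS and PG methods pursue at scale.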

All papers in KNN Classification

This paper aims to develop a methodology for storing a large mass of semi-structured data and later retrieving it, focusing on possible improvements and applications through text mining, uniting related articles from many…
This paper gives a comparison of two extracted features, namely pitch and formants, for emotion recognition from speech. The research shows that various features, namely prosodic and spectral, have been used for emotion recognition from…
Data Reduction techniques play a key role in instance-based classification to lower the amount of data to be processed. Among the different existing approaches, Prototype Selection (PS) and Prototype Generation (PG) are the most…
Recent years have witnessed an astronomical growth in the amount of textual information available both on the web and in institutional document repositories. As a result, text mining has become extremely prevalent, and the processing of…
Accurate wound assessment is a critical task for patient care and for reducing health costs in hospitals, and it is even more challenging in the context of clinical studies in the laboratory. This task, performed entirely by nurses, still relies on manual and…
The instrument COSIMA (COmetary Secondary Ion Mass Analyzer) on board the European Space Agency's Rosetta mission collected and analyzed dust particles in the neighborhood of comet 67P/Churyumov-Gerasimenko. The chemical composition of…
Repeated double cross-validation (rdCV) has recently been suggested as a careful and conservative strategy for optimizing and evaluating empirical multivariate calibration models. This evaluation strategy is adapted in this work for…
Classification based on k-nearest neighbors (kNN classification) is one of the most widely used classification methods. The number k of nearest neighbors used for achieving a high accuracy in classification is given in advance and is…
This paper reviews the English-language isolated-word databases which are collected from native and non-native speakers of English throughout the world. It also reports the various methods that are used in English speech…
Autism Spectrum Disorder (ASD) is a multifaceted neurodevelopmental condition. Atypical communication mostly occurs in tandem with ASD. We compared the voice pitch of 16 Marathi children and adolescents with ASD, aged 7 to 18, with 27…
by Mi Lu
Due to the high demand for computing, available resources always fall short. Approximate computing is a key technique for lowering hardware complexity and improving energy efficiency and performance. However, it is a challenge to…