Key research themes
1. How can adaptive selection of the number of neighbors (k) improve kNN classification performance?
This research area addresses the problem that the optimal number of neighbors (k) for kNN classification can vary across test instances because the local data distribution is heterogeneous. A single fixed k can cause either underfitting or overfitting, hurting classification accuracy and computational efficiency. Adaptive k selection aims to assign a tailored k to each test point based on data-driven criteria, improving predictive performance while reducing computational cost.
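One simple way to make this concrete is to choose k per query from a small candidate set using a local, data-driven criterion. The sketch below is an illustrative heuristic (not any specific published method, and the margin criterion is an assumption for demonstration): for each test point it picks the candidate k whose neighborhood vote has the largest majority-vote margin.

```python
import numpy as np

def adaptive_knn_predict(X_train, y_train, x_query, k_candidates=(1, 3, 5, 7)):
    """Predict the label of x_query, choosing k per query.

    Illustrative heuristic: among candidate values of k, keep the
    prediction whose k-neighborhood vote has the largest margin
    (fraction of neighbors voting for the majority class).
    """
    # Distances from the query to every training point (Euclidean here).
    dists = np.linalg.norm(X_train - x_query, axis=1)
    order = np.argsort(dists)

    best_label, best_margin = None, -1.0
    for k in k_candidates:
        labels, counts = np.unique(y_train[order[:k]], return_counts=True)
        margin = counts.max() / k  # vote share of the majority class
        if margin > best_margin:
            best_margin, best_label = margin, labels[counts.argmax()]
    return best_label
```

In practice the per-query criterion could instead be local cross-validation error or a density estimate; the point is only that k becomes a function of the test point rather than a global constant.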
2. What is the impact of distance metric choice on kNN classification accuracy and robustness?
Because kNN classification depends entirely on measuring similarity or distance in feature space, the choice of distance metric is critical. Different metrics handle varying data characteristics, noise levels, and feature types differently, and can substantially change performance. This research theme covers comprehensive evaluation of metrics, their effects on accuracy, their robustness to noise, and their suitability for different data distributions.
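To illustrate how the metric changes what "nearest" means, here is a minimal sketch of a kNN classifier parameterized by the distance function. The metrics shown (Euclidean, Manhattan, Chebyshev, cosine distance) are standard definitions; the `knn_predict` helper is an assumed name for demonstration.

```python
import numpy as np

def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))

def manhattan(a, b):
    return np.sum(np.abs(a - b))

def chebyshev(a, b):
    return np.max(np.abs(a - b))

def cosine_dist(a, b):
    # 1 - cosine similarity; insensitive to vector magnitude.
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def knn_predict(X, y, q, k=3, metric=euclidean):
    """Majority vote among the k training points nearest to q under `metric`."""
    d = np.array([metric(x, q) for x in X])
    idx = np.argsort(d)[:k]
    labels, counts = np.unique(y[idx], return_counts=True)
    return labels[counts.argmax()]
```

For the vectors a = (0, 3) and b = (4, 0), the four metrics already disagree on scale (Euclidean 5, Manhattan 7, Chebyshev 4, cosine distance 1), so on real data they can rank neighbors, and therefore classify, differently.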
3. How can prototype selection and data reduction techniques improve the efficiency and accuracy of kNN classification on large datasets?
kNN suffers from high memory use and slow query times because it must store and search the entire training set at classification time. Prototype Selection (PS) and Prototype Generation (PG) methods aim to shrink the training set to improve efficiency while maintaining or even enhancing accuracy. Research on these approaches investigates how best to select or generate representative samples, handle noise, and preserve decision-boundary fidelity, especially for structured or high-dimensional data.