Abstract
Accurate probability-based ranking of instances is crucial in many real-world data mining applications. KNN (k-nearest neighbor) [1] has been intensively studied as an effective classification model for decades, but its performance in ranking is largely unknown. In this paper, we conduct a systematic study of the ranking performance of KNN. First, we compare KNN and KNNDW (distance-weighted KNN) to decision trees and naive Bayes in ranking, measured by AUC (the area under the Receiver Operating Characteristics curve). Then, we propose to improve the ranking performance of KNN by combining KNN with naive Bayes (NB for short). The idea is that a naive Bayes is learned using the k nearest neighbors of the test instance as the training data and is then used to classify the test instance. A critical problem in combining KNN with naive Bayes is the lack of training data when k is small. We propose to deal with this problem by using cloning to expand the training data: each of the k nearest neighbors is "cloned" and the clones are added to the training data. We call our new model instance cloning local naive Bayes (ICLNB for short). We conduct extensive empirical comparisons of the related algorithms in two groups in terms of AUC, using the 36 UCI datasets recommended by Weka [2]. In the first group, we compare ICLNB with KNN, NB, NBTree [3], and C4.4 [4]. In the second group, we compare ICLNB with KNN, KNNDW, and LWNB [5]. Our experimental results show that ICLNB significantly outperforms all of these algorithms. From our study, we draw two conclusions. First, KNN-related algorithms, including KNN, KNNDW, and LWNB, perform well in ranking. Second, our new algorithm ICLNB performs best among the algorithms compared in this paper, and could be used in applications in which an accurate ranking is desired.
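To make the construction concrete, here is a minimal Python sketch of the ICLNB idea under stated assumptions: nominal attributes are compared by overlap similarity, each neighbor is cloned a number of times that grows with its similarity to the test instance (the exact cloning rule here is an assumption, not taken from this abstract), and a Laplace-smoothed naive Bayes is trained on the expanded local set to produce the class probabilities used for ranking.

```python
import numpy as np

def overlap_similarity(a, b):
    """Fraction of matching nominal attribute values (an assumed metric)."""
    return float(np.mean(a == b))

def iclnb_predict_proba(X_train, y_train, x_test, k=10):
    """Rough sketch of ICLNB for a single test instance.

    1. Find the k nearest neighbors of x_test.
    2. "Clone" each neighbor to expand the local training set; here the
       clone count grows with similarity to x_test (an assumption).
    3. Train Laplace-smoothed naive Bayes on the expanded set and return
       class probabilities, which are what AUC-based ranking uses.
    """
    sims = np.array([overlap_similarity(x, x_test) for x in X_train])
    neighbors = np.argsort(-sims)[:k]

    local_X, local_y = [], []
    n_attrs = x_test.size
    for i in neighbors:
        # One original copy plus up to n_attrs clones for a perfect match.
        n_copies = 1 + int(round(sims[i] * n_attrs))
        local_X.extend([X_train[i]] * n_copies)
        local_y.extend([y_train[i]] * n_copies)
    local_X, local_y = np.array(local_X), np.array(local_y)

    classes = np.unique(y_train)
    log_probs = np.empty(len(classes))
    for ci, c in enumerate(classes):
        Xc = local_X[local_y == c]
        # Laplace-smoothed class prior and per-attribute conditionals.
        logp = np.log((len(Xc) + 1) / (len(local_y) + len(classes)))
        for j, v in enumerate(x_test):
            n_values = len(np.unique(X_train[:, j]))
            matches = np.sum(Xc[:, j] == v)
            logp += np.log((matches + 1) / (len(Xc) + n_values))
        log_probs[ci] = logp
    probs = np.exp(log_probs - log_probs.max())
    return classes, probs / probs.sum()
```

The cloning step is what distinguishes this from plain local naive Bayes: with small k the local training set would otherwise be too sparse for reliable probability estimates, so replicating similar neighbors sharpens the estimates without fetching more distant, less relevant instances.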
References
- Aha, D. W., Kibler, D., Albert, M. K.: Instance-Based Learning Algorithms. Machine Learning 6 (1991) 37-66
- Kohavi, R.: Scaling Up the Accuracy of Naive-Bayes Classifiers: A Decision-Tree Hybrid. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press (1996) 202-207
- Provost, F. J., Domingos, P.: Tree Induction for Probability-Based Ranking. Machine Learning 52(3) (2003) 199-215
- Frank, E., Hall, M., Pfahringer, B.: Locally Weighted Naive Bayes. Proceedings of the Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann (2003) 249-256
- Provost, F., Fawcett, T.: Analysis and visualization of classifier performance: comparison under imprecise class and cost distribution. Proceedings of the Third International Conference on Knowledge Discovery and Data Mining. AAAI Press (1997) 43-48
- Hand, D. J., Till, R. J.: A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning 45 (2001) 171-186
- Ling, C. X., Yan, R. J.: Decision Tree with Better Ranking. Proceedings of the 20th International Conference on Machine Learning. Morgan Kaufmann (2003) 480-487
- Huang, J., Lu, J., Ling, C. X.: Comparing Naive Bayes, Decision Trees, and SVM with AUC and Accuracy. Proceedings of the Third IEEE International Conference on Data Mining. IEEE Computer Society Press (2003) 553-556
- Quinlan, J. R.: C4.5: Programs for Machine Learning. Morgan Kaufmann: San Mateo, CA (1993)
- Zhang, H., Su, J.: Naive Bayesian Classifiers for Ranking. Proceedings of ECML 2004. Springer (2004) 501-512
- Xie, Z., Hsu, W., Liu, Z., Lee, M.: SNNB: A Selective Neighborhood Based Naive Bayes for Lazy Learning. Proceedings of the Sixth Pacific-Asia Conference on KDD. Springer (2002) 104-114
- Witten, I. H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann (2000)