Learning Naive Bayes for Probability Estimation by Feature Selection
Naive Bayes is a well-known, effective, and efficient classification algorithm, but its probability estimation is poor. In many applications, however, accurate probability estimation is required in order to make optimal decisions. Probability estimation is usually measured by conditional log likelihood (CLL). Several learning algorithms have recently been proposed to extend naive Bayes for high CLL, such as ERL [8, 9] and BNC-2P [10]; unfortunately, their computational complexity is relatively high. Is there a simple but effective and efficient approach to improve the probability estimation of naive Bayes? In this paper, we propose to use feature selection for this purpose. More precisely, a search process is conducted to select a subset of attributes, and then a naive Bayes is deployed on the selected attribute set. In fact, feature selection has been successfully applied to naive Bayes and achieves a significant improvement in classification accuracy. Among the feature selection algorithms for naive Bayes, the selective Bayesian classifier (SBC) of Langley et al. [13] demonstrates good performance. We first study the performance of SBC in terms of probability estimation, and then propose an improved SBC algorithm, SBC-CLL, in which the CLL score is directly used for attribute selection instead of classification accuracy. Our experiments show that both SBC and SBC-CLL achieve a significant improvement over naive Bayes, and that SBC-CLL outperforms SBC substantially in probability estimation measured by CLL. Our work provides an efficient and surprisingly effective approach to improving the probability estimation of naive Bayes.
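To make the idea concrete, the following is a minimal sketch of CLL-guided forward attribute selection with a discrete naive Bayes. All function names are hypothetical, and the greedy search with validation-split scoring is an assumption for illustration; the paper's SBC-CLL procedure may differ in its details.

```python
import math

def train_nb(X, y, attrs):
    """Train a discrete naive Bayes (Laplace smoothing) on an attribute subset."""
    classes = sorted(set(y))
    prior = {c: (y.count(c) + 1) / (len(y) + len(classes)) for c in classes}
    cond = {}  # (attribute index, value, class) -> P(value | class)
    for a in attrs:
        values = sorted({xi[a] for xi in X})
        for c in classes:
            rows = [xi for xi, yi in zip(X, y) if yi == c]
            for v in values:
                count = sum(1 for xi in rows if xi[a] == v)
                cond[(a, v, c)] = (count + 1) / (len(rows) + len(values))
    return classes, prior, cond

def posterior(model, xi, attrs):
    """Normalized class posterior P(c | xi) under the naive Bayes model."""
    classes, prior, cond = model
    logp = {}
    for c in classes:
        lp = math.log(prior[c])
        for a in attrs:
            lp += math.log(cond.get((a, xi[a], c), 1e-6))  # small floor for unseen values
        logp[c] = lp
    m = max(logp.values())
    z = sum(math.exp(v - m) for v in logp.values())
    return {c: math.exp(v - m) / z for c, v in logp.items()}

def cll(model, X, y, attrs):
    """Conditional log likelihood: sum of log P(true class | instance)."""
    return sum(math.log(max(posterior(model, xi, attrs)[yi], 1e-12))
               for xi, yi in zip(X, y))

def select_attributes_cll(X_tr, y_tr, X_val, y_val, n_attrs):
    """Greedily add the attribute that most improves validation CLL; stop when none helps."""
    selected, best = [], float("-inf")
    while len(selected) < n_attrs:
        scored = []
        for a in range(n_attrs):
            if a in selected:
                continue
            attrs = selected + [a]
            model = train_nb(X_tr, y_tr, attrs)
            scored.append((cll(model, X_val, y_val, attrs), a))
        score, a = max(scored)
        if score <= best:
            break
        best, selected = score, selected + [a]
    return selected
```

A final naive Bayes would then be trained on the selected attributes only; replacing the CLL score with validation accuracy recovers an SBC-style selection.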
CirCUs is a SAT solver based on the DPLL procedure and conflict clause recording [7, 5, 2]. CirCUs includes most of the currently popular techniques, such as the two-watched-literals scheme for Boolean constraint propagation (BCP), activity-based decision heuristics, clause-deletion strategies, restart heuristics, and first-UIP clause learning. In this submission we focus on the search for a balance between the ability of a technique to detect implications (the deductive power of [3]) and its cost.
We propose a new switching criterion, namely the evenness or unevenness of the distribution of variable weights, and use this criterion to combine intensification and diversification in local search for SAT. We refer to the ways in which the state-of-the-art local search algorithms adaptG2WSATP and VW select a variable to flip as heuristic adaptG2WSATP and heuristic VW, respectively. To evaluate the effectiveness of this criterion, we apply it to heuristic adaptG2WSATP and heuristic VW, of which the former intensifies the search better than the latter, and the latter diversifies the search better than the former. The resulting local search algorithm, which switches between heuristic adaptG2WSATP and heuristic VW in every step according to this criterion, is called Hybrid. Our experimental results show that, on a broad range of SAT instances presented in this paper, Hybrid inherits the strengths of adaptG2WSATP and VW, and exhibits generally better performance than adaptG2WSATP and VW. In addition, Hybrid compares favorably with the state-of-the-art local search algorithm R+adaptNovelty+ on these instances. Furthermore, without any manually tuned parameters, Hybrid solves each of these instances in a reasonable time, while adaptG2WSATP, VW, and R+adaptNovelty+ have difficulty on some of them.
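The switching criterion can be illustrated with a small sketch. The unevenness test below (largest weight versus a multiple of the mean weight) and the threshold gamma are assumptions used only for illustration; the paper's exact definition of evenness and any parameter adaptation may differ.

```python
def weights_are_uneven(variable_weights, gamma=10.0):
    """Treat the distribution as uneven when the largest weight greatly exceeds the mean."""
    mean = sum(variable_weights) / len(variable_weights)
    return max(variable_weights) > gamma * mean

def choose_flip_variable(variable_weights, intensify, diversify):
    """Switch between two flip heuristics (passed as callables) at every search step."""
    if weights_are_uneven(variable_weights):
        return diversify()   # a VW-style, weight-aware choice to diversify the search
    return intensify()       # an adaptG2WSATP-style, score-driven choice to intensify it
```

Here intensify and diversify stand for the two underlying heuristics; in Hybrid they would be the actual variable-selection rules of adaptG2WSATP and VW.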
The instance-based k-nearest neighbor algorithm (KNN) [1] is an effective classification model. Its classification is simply based on a vote within the neighborhood, consisting of the k nearest neighbors of the test instance. Recently, researchers have been interested in deploying a more sophisticated local model, such as naive Bayes, within the neighborhood. It is expected that there are no strong dependences within the neighborhood of the test instance, thus alleviating the conditional independence assumption of naive Bayes. Generally, the smaller the neighborhood (the value of k), the less chance of encountering strong dependences. When k is small, however, the training data for the local naive Bayes is small and its classification would be inaccurate. In existing models, such as LWNB [3], a relatively large k is chosen; the consequence is that strong dependences seem unavoidable. In our opinion, a small k should be preferred in order to avoid strong dependences. We propose to deal with the resulting lack of local training data using sampling (cloning). Given a test instance, clones of each instance in the neighborhood are generated according to its similarity to the test instance and added to the local training data. Then, the local naive Bayes is trained from the expanded training data. Since a relatively small k is chosen, the chance of encountering strong dependences within the neighborhood is small, and thus the classification of the resulting local naive Bayes would be more accurate. We experimentally compare our new algorithm with KNN and its improved variants in terms of classification accuracy, using the 36 UCI datasets recommended by Weka [8], and the experimental results show that our algorithm outperforms all those algorithms significantly and consistently at various values of k.
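A minimal sketch of the cloning step is given below, assuming a simple attribute-agreement similarity and a proportional rounding rule; both are illustrative choices, not necessarily the paper's exact scheme.

```python
def similarity(a, b):
    """Fraction of attributes on which two discrete instances agree."""
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def expand_by_cloning(neighbors, labels, test_instance, max_clones=10):
    """Copy each neighbor 1 + round(max_clones * similarity) times to build local training data."""
    expanded_X, expanded_y = [], []
    for xi, yi in zip(neighbors, labels):
        n_copies = 1 + round(max_clones * similarity(xi, test_instance))
        expanded_X.extend([xi] * n_copies)
        expanded_y.extend([yi] * n_copies)
    return expanded_X, expanded_y
```

A local naive Bayes (for instance, the train_nb sketch shown earlier) is then trained on the expanded set and used to classify the single test instance.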
Accurate probability-based ranking of instances is crucial in many real-world data mining applications. KNN (k-nearest neighbor) [1] has been intensively studied as an effective classification model for decades. However, its performance in ranking is unknown. In this paper, we conduct a systematic study of the ranking performance of KNN. First, we compare KNN and KNNDW (distance-weighted KNN) to decision trees and naive Bayes in ranking, measured by AUC (the area under the Receiver Operating Characteristics curve). Then, we propose to improve the ranking performance of KNN by combining KNN with naive Bayes (simply NB). The idea is that a naive Bayes is learned using the k nearest neighbors of the test instance as the training data and is then used to classify the test instance. A critical problem in combining KNN with naive Bayes is the lack of training data when k is small. We propose to deal with it using cloning to expand the training data: each of the k nearest neighbors is "cloned" and the clones are added to the training data. We call our new model instance cloning local naive Bayes (simply ICLNB). We conduct an extensive empirical comparison of the related algorithms in two groups in terms of AUC, using the 36 UCI datasets recommended by Weka [2]. In the first group, we compare ICLNB with KNN, NB, NBTree [3], and C4.4 [4]. In the second group, we compare ICLNB with KNN, KNNDW, and LWNB [5]. Our experimental results show that ICLNB outperforms all those algorithms significantly. From our study, we draw two conclusions. First, KNN-related algorithms perform well in ranking. Second, our new algorithm ICLNB performs best among the algorithms compared in this paper and could be used in applications in which an accurate ranking is desired.
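Since AUC is the evaluation measure throughout these experiments, here is a short, standard sketch of computing it from predicted class probabilities via the rank-sum (Mann-Whitney) statistic for a two-class problem; it is included only to make the measure concrete and is not taken from the paper.

```python
def auc(scores, labels):
    """scores: predicted probability of the positive class; labels: 0/1 ground truth."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):                      # assign 1-based ranks, averaging ties
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, l in zip(ranks, labels) if l == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```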
Accurate probability estimation generated by learning models is desirable in some practical applications, such as medical diagnosis. In this paper, we empirically study traditional decision-tree learning models and their variants in terms of probability estimation, measured by conditional log likelihood (CLL). Furthermore, we also compare decision-tree learning with other representative learners: naive Bayes, naive Bayes tree (NBTree), Bayesian networks, k-nearest neighbors, and support vector machines, with respect to probability estimation. From our experiments, we have several interesting observations. First, among the various decision-tree learning models, C4.4 is the best at yielding precise probability estimates measured by CLL, although its performance is not good in terms of other evaluation criteria, such as accuracy and ranking. We provide an explanation for this and reveal the nature of CLL. Second, compared with other popular models, C4.4 achieves the best CLL. Finally, CLL does not dominate another well-established relevant measure, AUC (the area under the Receiver Operating Characteristics curve), which suggests that different decision-tree learning models should be used for different objectives. Our experiments are conducted on 36 UCI data sets that cover a wide range of domains and data characteristics. We run all the models within the machine learning platform Weka.
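For reference, CLL is commonly defined as the sum, over the test instances, of the log of the probability the model assigns to the true class; in a standard notation (which may differ slightly from the paper's),

```latex
\mathrm{CLL}(M \mid D) \;=\; \sum_{i=1}^{|D|} \log P_{M}\!\bigl(c_i \mid a_{i1}, a_{i2}, \ldots, a_{in}\bigr),
```

where D is the test set, c_i is the true class label of the i-th instance, a_{i1}, ..., a_{in} are its attribute values, and M is the learned model.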
Accurate ranking, measured by AUC (the area under the ROC curve), is crucial in many real-world applications. Most traditional learning algorithms, however, aim only at high classification accuracy. It has been observed that traditional decision trees produce good classification accuracy but poor probability estimates. Since the ranking generated by a decision tree is based on the class probabilities, a probability estimation tree (PET) with accurate probability estimates is desired in order to yield a high AUC. Some researchers ascribe the poor probability estimates of decision trees to the decision-tree learning algorithms. In our observation, however, the representation also plays an important role. In this paper, we propose to extend decision trees to represent a joint distribution and conditional independence, called conditional independence trees (CITrees), which is a more suitable model for yielding a high AUC. We propose a novel AUC-based algorithm for learning CITrees, and our experiments show that the CITree algorithm outperforms the state-of-the-art decision-tree learning algorithm C4.4 (a variant of C4.5), naive Bayes, and NBTree in AUC. Our work provides an effective model and algorithm for applications in which an accurate ranking is required.
Naive Bayes has been widely used in data mining as a simple and effective classification algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, among which tree-augmented naive Bayes (TAN) [3] achieves a significant improvement in terms of classification accuracy while maintaining efficiency and model simplicity. In many real-world data mining applications, however, an accurate ranking is more desirable than a classification. Thus it is interesting to ask whether TAN also achieves a significant improvement in terms of ranking, measured by AUC (the area under the Receiver Operating Characteristics curve) [8, 1]. Unfortunately, our experiments show that TAN performs even worse than naive Bayes in ranking. Responding to this fact, we present a novel learning algorithm, called forest-augmented naive Bayes (FAN), by modifying the traditional TAN learning algorithm. We experimentally test our algorithm on all 36 data sets recommended by Weka [12], and compare it to naive Bayes, SBC [6], TAN [3], and C4.4 [10], in terms of AUC. The experimental results show that our algorithm outperforms all the other algorithms significantly in yielding accurate rankings. Our work provides an effective and efficient data mining algorithm for applications in which an accurate ranking is required.
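For background, TAN weights candidate attribute-to-attribute edges by conditional mutual information given the class; the standard definition from the TAN literature (not restated in this abstract) is

```latex
I_{P}(A_i; A_j \mid C) \;=\; \sum_{a_i,\, a_j,\, c} P(a_i, a_j, c)\,
\log \frac{P(a_i, a_j \mid c)}{P(a_i \mid c)\, P(a_j \mid c)} .
```

TAN builds a maximum spanning tree over these edge weights, while FAN, as its name suggests, augments naive Bayes with a forest instead; the precise modification to the tree-construction step is the paper's contribution and is not captured by the formula alone.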
Numeric Mapping and Learnability of Naive Bayes. Harry Zhang, Faculty of Computer Science, University of New Brunswick, Fredericton, New Brunswick, Canada; Charles X. Ling, Department of Computer ...
An accurate ranking of instances based on their class probabilities, which is measured by AUC (the area under the Receiver Operating Characteristics curve), is desired in many applications. In a traditional decision tree, two obstacles prevent it from yielding accurate rankings: one is that the sample size on a leaf is small, and the other is that all instances falling into the same leaf are assigned the same class probability. In this paper, we propose two techniques to address these two issues. First, we use the statistical technique of shrinkage, which estimates the class probability of a test instance by a linear interpolation of the local class probabilities on each node along the path from leaf to root. An efficient algorithm is also presented to learn the interpolation weights. Second, we introduce an instance-based method, weighted probability estimation (WPE), to generate distinct local probability estimates for the test instances falling into the same leaf. The key idea is to assign different weights to training instances based on their similarities to the test instance in probability estimation. Furthermore, we combine shrinkage and WPE to compensate for the defects of each. Our experiments show that both shrinkage and WPE improve the ranking performance of decision trees, and that their combination works even better. The experiments also indicate that various decision-tree algorithms with the combination of shrinkage and WPE significantly outperform the original ones and other state-of-the-art techniques proposed to enhance the ranking performance of decision trees.
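The shrinkage step can be sketched as follows. The uniform interpolation weights are a placeholder (the paper learns the weights from data), and the per-node Laplace correction is an illustrative choice.

```python
def shrinkage_estimate(path_class_counts, num_classes, weights=None):
    """path_class_counts: per-node class-count dicts along the leaf-to-root path."""
    if weights is None:
        # Placeholder: uniform weights; the actual interpolation weights are learned.
        weights = [1.0 / len(path_class_counts)] * len(path_class_counts)
    probs = {}
    for c in range(num_classes):
        probs[c] = 0.0
        for w, counts in zip(weights, path_class_counts):
            total = sum(counts.values())
            local = (counts.get(c, 0) + 1) / (total + num_classes)  # Laplace-corrected node estimate
            probs[c] += w * local
    return probs
```

WPE would further replace the raw class counts at the leaf with similarity-weighted counts specific to the test instance, so that instances in the same leaf receive distinct estimates.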
Journal of Experimental and Theoretical Artificial Intelligence, 2008
It is well known that naive Bayes performs surprisingly well in classification, but its probability estimation is poor. AUC (the area under the receiver operating characteristics curve) is a measure different from classification accuracy and probability estimation, and is often used to measure the quality of rankings. Indeed, an accurate ranking of examples is often more desirable than a mere classification. What is the general performance of naive Bayes in yielding optimal rankings, measured by AUC? In this paper, we study this question systematically through both empirical experiments and theoretical analysis. In our experiments, we compare naive Bayes with a state-of-the-art decision-tree learning algorithm for ranking, C4.4, and with some popular extensions of naive Bayes that achieve a significant improvement over naive Bayes in classification, such as the selective Bayesian classifier (SBC) and tree-augmented naive Bayes (TAN). Our experimental results show that naive Bayes performs significantly better than C4.4 and comparably with TAN. This provides empirical evidence that naive Bayes performs well in ranking. We then analyse theoretically the optimality of naive Bayes in ranking. We study two example problems, conjunctive concepts and m-of-n concepts, which have been used in analysing the performance of naive Bayes in classification. Surprisingly, naive Bayes performs optimally on them in ranking, even though it does not in classification. We present and prove a sufficient condition for the optimality of naive Bayes in ranking. From both empirical and theoretical studies, we believe that naive Bayes is a competitive model for ranking. A preliminary version of this paper appeared in ECML 2004.
Recent natural language processing (NLP) research shows that identifying and extracting subjective information from texts can benefit many NLP applications. In this paper, we address a semi-supervised learning approach, self-training, for sentence subjectivity classification. In self-training, the confidence degree, which depends on the ranking of class membership probabilities, is commonly used as the selection metric that ranks and selects the unlabeled instances for the next round of training of the underlying classifier. Naive Bayes (NB) is often used as the underlying classifier because its class membership probability estimates have good ranking performance. The first contribution of this paper is to study the performance of self-training using decision-tree models, such as C4.5, C4.4, and naive Bayes tree (NBTree), as the underlying classifiers. The second contribution is that we propose an adapted Value Difference Metric (VDM) as the selection metric in self-training, which does not depend on class membership probabilities. Based on the Multi-Perspective Question Answering (MPQA) corpus, a set of experiments has been designed to compare the performance of self-training with different underlying classifiers and different selection metrics under various conditions. The experimental results show that the performance of self-training is improved by using VDM instead of the confidence degree, and that self-training with NBTree and VDM outperforms self-training with the other combinations of underlying classifiers and selection metrics. The results also show that the self-training approach can achieve performance comparable to that of supervised learning models.
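For reference, the textbook form of VDM between two discrete instances is sketched below; the adapted VDM proposed in the paper may differ from this form, and the probability table here is assumed to be estimated from the labeled data.

```python
def vdm(x, y, value_class_probs, classes, q=2):
    """value_class_probs[(attr, value, cls)] = P(cls | attr = value), estimated from labeled data."""
    distance = 0.0
    for a, (va, vb) in enumerate(zip(x, y)):
        for c in classes:
            pa = value_class_probs.get((a, va, c), 0.0)
            pb = value_class_probs.get((a, vb, c), 0.0)
            # Attribute values are close when they induce similar class distributions.
            distance += abs(pa - pb) ** q
    return distance
```

In self-training, such a distance could rank unlabeled sentences by their closeness to the labeled data, providing a selection metric that does not rely on the classifier's probability estimates.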
Dynamic K-Nearest-Neighbor Naive Bayes with Attribute Weighted
K-Nearest-Neighbor (KNN) has been widely used in classification problems. However, according to our observation, three main problems confront KNN: 1) KNN's accuracy is degraded by a simple vote; 2) KNN's accuracy is typically sensitive to the value of K; 3) KNN's accuracy may be dominated by some irrelevant attributes. In this paper, we present an improved algorithm called Dynamic K-Nearest-Neighbor Naive Bayes with Attribute Weighted (DKNAW). We experimentally tested its accuracy, using all 36 UCI data sets selected by Weka [1], and compared it to NB, KNN, KNNDW, and LWNB [2]. The experimental results show that DKNAW significantly outperforms NB, KNN, and KNNDW and slightly outperforms LWNB.
There is growing interest in scaling up the widely used decision-tree learning algorithms to very large data sets. Although numerous diverse techniques have been proposed, a fast tree-growing algorithm without a substantial decrease in accuracy or a substantial increase in space complexity is essential. In this paper, we present a novel, fast decision-tree learning algorithm that is based on a conditional independence assumption. The new algorithm has a time complexity of O(m · n), where m is the size of the training data and n is the number of attributes. This is a significant asymptotic improvement over the time complexity O(m · n²) of the standard decision-tree learning algorithm C4.5, with an additional space increase of only O(n). Experiments show that our algorithm performs competitively with C4.5 in accuracy on a large number of UCI benchmark data sets, and performs even better and significantly faster than C4.5 on a large number of text classification data sets. The time complexity of our algorithm is as low as that of naive Bayes; indeed, it is as fast as naive Bayes but outperforms naive Bayes in accuracy according to our experiments. Our algorithm is a core tree-growing algorithm that can be combined with other scaling-up techniques to achieve further speedup.
In real-world data mining applications, an accurate ranking is as important as an accurate classification. Naive Bayes has been widely used in data mining as a simple and effective classification and ranking algorithm. Since its conditional independence assumption is rarely true, numerous algorithms have been proposed to improve naive Bayes, for example, SBC [1] and TAN [2]. Indeed, experimental results show that SBC and TAN achieve a significant improvement in terms of classification accuracy. Unfortunately, however, our experiments also show that SBC and TAN perform even worse than naive Bayes in ranking measured by AUC [3, 4] (the area under the Receiver Operating Characteristics curve). This fact raises the question: can we improve naive Bayes to achieve both accurate classification and accurate ranking? In this paper, responding to this question, we present a new learning algorithm called One Dependence Augmented Naive Bayes (ODANB). Our motivation is to develop a new algorithm that improves naive Bayes' performance not only on classification, measured by accuracy, but also on ranking, measured by AUC. We experimentally tested our algorithm, using all 36 UCI datasets recommended by Weka [5], and compared it to naive Bayes, SBC, and TAN. The experimental results show that our algorithm outperforms all the other algorithms significantly in yielding accurate rankings, and at the same time outperforms them slightly in terms of classification accuracy.