A novel distance-based classifier built on pattern ranking

2009

https://doi.org/10.1145/1529282.1529602

Abstract

Instance-based classifiers that compute similarity between instances suffer from noise in the training set and from over-fitting. In this paper we propose a new type of distance-based classifier that, instead of computing distances between instances, computes the distance between each test instance and the classes. Both the test instances and the classes are represented by patterns in the space of the frequent itemsets. We rank the itemsets by metrics of itemset significance.
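
As a rough illustration of the idea of comparing a test instance with whole classes rather than with individual training instances, the following minimal sketch may help. It is not the paper's algorithm: the abstract does not specify the distance or the ranking metrics, so the Euclidean distance, the per-class relative-support profile, the hand-picked itemsets, and all function names here are assumptions made for illustration only. Itemset mining and ranking are omitted.

```python
from math import sqrt

def contains(transaction, itemset):
    """True if every item of the itemset occurs in the transaction."""
    return itemset <= transaction

def class_profile(transactions, itemsets):
    """Represent a class by the relative support of each itemset
    among that class's training transactions (an assumed representation)."""
    n = len(transactions)
    return [sum(contains(t, s) for t in transactions) / n for s in itemsets]

def instance_profile(transaction, itemsets):
    """Represent one instance by a 0/1 indicator per itemset."""
    return [1.0 if contains(transaction, s) else 0.0 for s in itemsets]

def euclidean(u, v):
    """Euclidean distance; chosen here only as a placeholder metric."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(transaction, profiles, itemsets):
    """Assign the label of the class whose profile is closest to the instance."""
    x = instance_profile(transaction, itemsets)
    return min(profiles, key=lambda label: euclidean(x, profiles[label]))

# Hypothetical toy data: itemsets are given by hand instead of being mined.
itemsets = [frozenset({"a"}), frozenset({"b"}), frozenset({"a", "c"})]
training = {
    "pos": [frozenset({"a", "c"}), frozenset({"a", "c", "d"}), frozenset({"a"})],
    "neg": [frozenset({"b"}), frozenset({"b", "d"}), frozenset({"a", "b"})],
}
profiles = {label: class_profile(rows, itemsets) for label, rows in training.items()}

print(classify(frozenset({"a", "c"}), profiles, itemsets))
print(classify(frozenset({"b", "d"}), profiles, itemsets))
```

Because each class is summarized by a single profile, one noisy training instance shifts that profile only slightly, which is the intuition behind comparing instances with classes instead of with their nearest neighbors.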

Figure 7: Misclassification at different levels of noise.