Approach to Hypertext Categorization

2008

Abstract

Hypertext and text domains are characterized by tens or hundreds of thousands of features. This poses a challenge for supervised learning algorithms, which must learn accurate classifiers from a small set of available training examples. This paper proposes a fuzzy semi-supervised support vector machines (FSS-SVM) algorithm that aims to overcome the need for a large labelled training set by using both labelled and unlabelled data for training, and by modulating the effect of the unlabelled data in the learning process. Empirical evaluations on two real-world hypertext datasets showed that, by additionally using unlabelled data, FSS-SVM requires less labelled training data than its supervised version, support vector machines, to achieve the same level of classification performance. In addition, incorporating the fuzzy membership values of the unlabelled training patterns in the learning process positively influenced classification performance in compari...
