Linear discriminant analysis in document classification
2001
Abstract
Document representation using the bag-of-words approach may require reducing the dimensionality of the representation before statistical classification methods can be applied effectively. Latent Semantic Indexing (LSI) is one such method, based on eigendecomposition of the covariance of the document-term matrix. Another common approach is to select a small number of the most important features from the whole set according to some relevant criterion. This paper points out that LSI ignores discrimination while concentrating on representation. Furthermore, selection methods fail to produce a feature set that jointly optimizes class discrimination. As a remedy, we suggest supervised linear discriminative transforms, and report good classification results from applying these to the Reuters-21578 database.
Key takeaways
- Linear Discriminant Analysis (LDA) significantly reduces classification error rates compared to Latent Semantic Indexing (LSI).
- LDA achieves a classification error rate of 7.8% with a 64-dimensional document representation.
- The study evaluates LDA on the Reuters-21578 database, using 6535 training and 2570 test documents.
- Feature selection methods often fail to jointly optimize class discrimination, which LDA addresses.
- By projecting documents into a low-dimensional space, LDA lowers the computational cost of classifying high-dimensional document representations.
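The contrast drawn above between an unsupervised projection (LSI) and a supervised one (LDA) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the ridge term `reg`, the nearest-centroid classifier, and the synthetic Poisson "term counts" are all assumptions made for illustration. The LDA step solves the standard generalized eigenproblem between the between-class and within-class scatter matrices, which yields at most C-1 useful directions for C classes.

```python
import numpy as np

def lsi_transform(X, n_components):
    """Unsupervised baseline: top right singular vectors of the centered
    document-term matrix (the top eigenvectors of its covariance).
    Note: many LSI implementations skip the centering step."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components].T

def lda_transform(X, y, n_components, reg=1e-3):
    """Supervised projection: directions maximizing between-class scatter
    relative to within-class scatter. `reg` is an assumed ridge term that
    keeps the within-class scatter invertible."""
    n_features = X.shape[1]
    mean_all = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        centered = Xc - mean_c
        S_w += centered.T @ centered
        diff = (mean_c - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
    S_w += reg * np.eye(n_features)
    # Solve S_b w = lambda S_w w by whitening with the Cholesky factor of S_w.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    eigvals, eigvecs = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    order = np.argsort(eigvals)[::-1][:n_components]
    return L_inv.T @ eigvecs[:, order]

def nearest_centroid_accuracy(Z, y):
    """Fraction of rows whose closest class centroid matches their label."""
    classes = np.unique(y)
    centroids = np.stack([Z[y == c].mean(axis=0) for c in classes])
    dists = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[dists.argmin(axis=1)] == y).mean())

# Synthetic stand-in for a document-term matrix (NOT Reuters-21578):
# each class has five "topic" terms with a higher expected count.
rng = np.random.default_rng(0)
n_classes, docs_per_class, n_terms = 3, 40, 20
rates = np.ones((n_classes, n_terms))
for c in range(n_classes):
    rates[c, 5 * c: 5 * c + 5] = 5.0
y = np.repeat(np.arange(n_classes), docs_per_class)
X = rng.poisson(rates[y]).astype(float)

W_lda = lda_transform(X, y, n_components=n_classes - 1)
W_lsi = lsi_transform(X, n_components=n_classes - 1)
acc_lda = nearest_centroid_accuracy(X @ W_lda, y)
acc_lsi = nearest_centroid_accuracy(X @ W_lsi, y)
print(f"LDA accuracy: {acc_lda:.3f}  LSI accuracy: {acc_lsi:.3f}")
```

The toy data is deliberately easy and is scored on the same documents used to fit the projections, so it should not be read as reproducing the reported error rates. It only shows where class labels enter the LDA computation and where they are absent from the LSI one.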