A soft approach to hybrid models for document clustering

Abstract

In document clustering, the usability of classical approaches is limited by several shortcomings. This work proposes a new model for document clustering based on a conceptual representation of documents and on a hybrid model for the clustering process. We use the FIS-CRM model to generate the conceptual representation of documents. The clustering procedure is implemented by two connected, tailored algorithms that together build a fuzzy hierarchical structure: a fuzzy hierarchical clustering algorithm determines an initial clustering, and an improved soft clustering algorithm completes the process. Experiments show that clustering with this model tends to perform better than the classical approach.
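The two-stage idea described above (an initial hierarchical clustering followed by a soft refinement) can be illustrated with a minimal sketch. This is not the authors' algorithm: the FIS-CRM conceptual representation is not reproduced here, so plain term-weight vectors stand in for it; a crisp centroid-linkage agglomerative step stands in for the fuzzy hierarchical algorithm; and a similarity-proportional membership update stands in for the improved soft clustering algorithm. The function names `initial_clusters` and `soft_refine`, the `threshold` parameter, and the squared-membership weighting are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def initial_clusters(docs, threshold):
    """Stage 1 (assumed stand-in): crisp agglomerative clustering.

    Repeatedly merges the two clusters whose centroids are most similar,
    stopping when the best similarity falls below `threshold`.
    Returns a list of clusters, each a list of document indices.
    """
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroid([docs[i] for i in clusters[a]]),
                             centroid([docs[i] for i in clusters[b]]))
                if sim > best:
                    best, pair = sim, (a, b)
        if best < threshold:
            break
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def soft_refine(docs, clusters, iterations=5):
    """Stage 2 (assumed stand-in): soft memberships seeded by stage 1.

    Each document receives a membership degree in every cluster,
    proportional to its cosine similarity to the cluster centroid.
    Centroids are then recomputed as means weighted by squared membership.
    Returns one list of membership degrees per document.
    """
    cents = [centroid([docs[i] for i in c]) for c in clusters]

    def degrees(current):
        rows = []
        for d in docs:
            sims = [cosine(d, c) for c in current]
            total = sum(sims)
            rows.append([s / total for s in sims] if total
                        else [1.0 / len(current)] * len(current))
        return rows

    u = degrees(cents)
    for _ in range(iterations):
        updated = []
        for k, old in enumerate(cents):
            weights = [row[k] ** 2 for row in u]
            total = sum(weights)
            if total == 0:
                updated.append(old)  # keep a centroid that attracts no document
                continue
            updated.append([sum(w * d[j] for w, d in zip(weights, docs)) / total
                            for j in range(len(old))])
        cents = updated
        u = degrees(cents)
    return u
```

Under these assumptions, a document that shares terms with more than one group keeps a non-zero membership degree in each of them rather than being forced into a single cluster, which is the property a soft second stage is meant to provide.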
