CONCEPT-BASED INDEXING IN TEXT INFORMATION RETRIEVAL
https://doi.org/10.5121/IJCSIT.2013.5110
Abstract
Traditional information retrieval (IR) systems rely on keywords to index documents and queries. In such systems, documents are retrieved based on the number of keywords they share with the query. This lexically focused retrieval yields inaccurate and incomplete results when documents and queries are described with different keywords. Semantically focused retrieval approaches attempt to overcome this problem by relying on concepts rather than keywords for indexing and retrieval. The goal is to retrieve documents that are semantically relevant to a given user query. This paper addresses this issue by proposing a solution at the indexing level. More precisely, we propose a novel approach to semantic indexing based on concepts identified from a linguistic resource. In particular, our approach relies on the joint use of the WordNet and WordNetDomains lexical databases for concept identification. Furthermore, we propose a semantic-based concept weighting scheme that relies on a novel definition of concept centrality. The resulting system is evaluated on the TIME test collection. Experimental results show the effectiveness of our proposal compared with traditional IR approaches.
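The keyword mismatch problem described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the mapping `TOY_CONCEPTS` and the functions `keyword_overlap`, `to_concepts`, and `concept_overlap` are hypothetical names introduced here for illustration, and the hand-made mapping merely stands in for a real lexical resource such as WordNet. No concept weighting or centrality is modeled.

```python
# Toy illustration only: TOY_CONCEPTS is a hypothetical hand-made map
# standing in for a real lexical resource such as WordNet.
TOY_CONCEPTS = {
    "car": "c_motor_vehicle",
    "automobile": "c_motor_vehicle",
    "physician": "c_doctor",
    "doctor": "c_doctor",
}


def keyword_overlap(query, document):
    """Count distinct lowercase tokens shared by the query and the document."""
    return len(set(query.lower().split()) & set(document.lower().split()))


def to_concepts(text):
    """Map each token to its toy concept id; unmapped tokens are kept as-is."""
    return {TOY_CONCEPTS.get(token, token) for token in text.lower().split()}


def concept_overlap(query, document):
    """Count distinct concept ids shared by the query and the document."""
    return len(to_concepts(query) & to_concepts(document))


if __name__ == "__main__":
    query = "automobile repair"
    document = "car repair manual"
    # Keyword matching only sees the shared token "repair".
    print(keyword_overlap(query, document))
    # Concept matching also links "automobile" and "car".
    print(concept_overlap(query, document))
```

Under these assumptions, the synonym pair "automobile" and "car" contributes to the match only once both are mapped to the same concept identifier, which is the intuition behind indexing by concepts rather than keywords.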