Abstract
This paper presents and evaluates the "Weighted Overlapping" disambiguation method. The method extends Lesk's approach to disambiguate a specific word appearing in a context (usually a sentence). The sense definitions of the target word, "Synset" definitions, the "Hypernymy" relation, and the definitions of the context features (words in the same sentence) are retrieved from the WordNet database and used as input to our disambiguation algorithm. More precisely, for each sense of the word, a sense bag is formed from the sense's WordNet definition and the definitions of all the "Hypernyms" associated with the nouns and verbs in that definition. Similarly, a context bag is formed from all the context words and the definitions of the "Hypernyms" associated with the context nouns and verbs. Weights are then assigned to the words: the weight of each word is inversely proportional to the depth, in the WordNet taxonomy, of its associated "synset". Finally, the word is disambiguated by computing the similarity between the words of each sense bag and those of the context bag. The proposed method is evaluated by disambiguating all the nouns in all the sentences of the Brown files.
References
- [Agirre & Rigau 96] Agirre E. and Rigau G. (1996). Word Sense Disambiguation Using Conceptual Density. 16th International Conference on COLING. Copenhagen.
- [Brill 92] Brill E. (1992). A simple rule based part of speech tagger. In Proceedings of the Third Conference on Applied Natural Language Processing, ACL, 1992.
- [Budanitsky & Graeme 01] Budanitsky A., Graeme H. (2001). Semantic distance in WordNet: An experimental, application-oriented evaluation of five measures. In Workshop on WordNet and Other Lexical Resources, The North American Chapter of the Association for Computational Linguistics (NAACL-2000), Pittsburgh, PA, June 2001.
- [Cowie et al. 92] Cowie J., Guthrie J. and Guthrie L. (1992). Lexical disambiguation using simulated annealing. In Proc. DARPA Workshop on Speech and Natural Language, pp. 238-242, New York.
- [Fellbaum 98] Fellbaum C. (1998). WordNet, an Electronic Lexical Database. MIT Press, Cambridge MA, 1998.
- [Landes et al. 98] Landes S., Leacock C. and Tengi R. (1998). Building semantic concordances. In WordNet, an Electronic Lexical Database, pp. 199-216. MIT Press, Cambridge MA, 1998.
- [Leacock & Chodorow 98] Leacock C., Chodorow M. (1998). Combining Local Context and WordNet Similarity for Word Sense Disambiguation. In WordNet, an Electronic Lexical Database, pp. 265-283. MIT Press, Cambridge MA, 1998.
- [Lee et al. 93] Lee J. H., Kim H. and Lee Y. J. (1993). Information retrieval based on conceptual distance in IS-A hierarchies. In Journal of Documentation, 49(2): 188-207.
- [Lesk 86] Lesk M. (1986). Automatic sense disambiguation: How to tell a pine cone from an ice cream cone. In Proceedings of the 1986 SIGDOC Conference, pp. 24-26, New York. Association of Computing Machinery.
- [Mihalcea & Moldovan 99] Mihalcea R. and Moldovan D. (1999). Automatic Acquisition of Sense tagged Corpora. In American Association for Artificial Intelligence.
- [Montoyo & Palomar 01] Montoyo A. and Palomar M. (2001). Specification Marks for Word Sense Disambiguation: New Development. In A. Gelbukh (Ed.): CICLing 2001, LNCS 2004, pp. 182-191, 2001.
- [Porter 80] Porter M. F. (1980). An algorithm for suffix stripping. In Sparck Jones, Karen, and Peter Willett (1997), Readings in Information Retrieval, San Francisco: Morgan Kaufmann, ISBN 1-55860-454-4.
- [Sussna 93] Sussna M. (1993). Word sense disambiguation for free-text indexing using a massive semantic network. In Proceedings of the 2nd International Conference on Information and Knowledge Management, Arlington, Virginia, USA.
- [Voorhees 93] Voorhees E. (1993). Using WordNet to Disambiguate Word Senses for Text Retrieval. In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 171-180, Pittsburgh, PA.