Resources for Nepali Word Sense Disambiguation

2008

Abstract

Word sense disambiguation (WSD) is the process of identifying the intended meaning of a word that has multiple meanings. It is regarded as one of the most challenging problems in natural language processing (NLP). Nepali, like other languages, has words with multiple meanings, so the WSD problem arises in it as well. In this paper, we investigate the impact of NLP resources such as a morphological analyzer (MA) and a machine-readable dictionary (MRD) on ambiguity resolution. Our results show that WSD accuracy improves when such resources are available. We use the Lesk algorithm with a sample Nepali WordNet containing a few sets of Nepali nouns, so the system can disambiguate only these nouns. The system was tested on a small data set with a limited number of nouns. Accuracy ranged from 50% to 70%, depending on the sample data provided. When the same data was tested with manual morphological analysis, accuracy was considerably higher (80%).
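To make the gloss-overlap idea behind the Lesk algorithm concrete, here is a minimal sketch of the simplified Lesk variant. It is not the paper's implementation: the function name `simplified_lesk`, the toy English sense inventory `TOY_SENSES`, and the stopword set `STOP` are all assumptions made for illustration, and a real system would draw its glosses from the Nepali WordNet and normalize word forms with a morphological analyzer first.

```python
def simplified_lesk(target, context_words, sense_inventory, stopwords=frozenset()):
    """Pick the sense whose gloss shares the most words with the context.

    sense_inventory maps a word to {sense_id: gloss}; this layout is a
    hypothetical stand-in for a WordNet or machine-readable dictionary.
    """
    # Context is the set of surrounding content words, excluding the target itself.
    context = {w.lower() for w in context_words} - stopwords - {target.lower()}
    best_sense, best_overlap = None, -1
    for sense_id, gloss in sense_inventory[target].items():
        gloss_words = set(gloss.lower().split()) - stopwords
        overlap = len(context & gloss_words)
        # Ties keep the first sense encountered.
        if overlap > best_overlap:
            best_sense, best_overlap = sense_id, overlap
    return best_sense


# Toy data, invented for this example only.
TOY_SENSES = {
    "bank": {
        "bank#finance": "institution that accepts deposits and lends money",
        "bank#river": "sloping land beside a river or stream of water",
    }
}
STOP = frozenset({"a", "an", "the", "of", "and", "or", "that", "to", "on", "in", "by"})

sentence = "he sat on the bank of the river and watched the water".split()
print(simplified_lesk("bank", sentence, TOY_SENSES, STOP))  # bank#river
```

Because the overlap is computed on exact word forms, an inflected context word will not match the base form used in a gloss. This is one plausible reason a morphological analyzer would raise accuracy, as the abstract reports.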

References (17)

  1. Ide, N. and J. Veronis, Word Sense Disambiguation: The state of the art. Computational Linguistics, 1998: p. 1-41.
  2. Fellbaum, C. WordNet: An Electronic Lexical Database. 1998. Cambridge: MIT Press.
  3. Lesk, M.E. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. in Proceedings of the SIGDOC Conference. 1986. Toronto, Ontario, Canada.
  4. Stevenson, M. and Y. Wilks, The interaction of knowledge sources in word sense disambiguation. Computational Linguistics, 2001: p. 321-349.
  5. Pedersen, T. and S. Banerjee. An adapted Lesk algorithm for word sense disambiguation using WordNet. in Proceedings of the Third International Conference on Intelligent Text Processing and Computational Linguistics. 2002. Gelbukh.
  6. Manish, S., et al., Hindi Word Sense Disambiguation. 2004: Department of Computer Science & Engineering, IIIT, Mumbai, India.
  7. Bombay, I. Hindi Wordnet from Center for Indian Language Technology Solutions. 2006 [cited 2006; http://www.cfilt.iitb.ac.in/wordnet/webhwn/].
  8. Walker, D.E. Knowledge resource tools for accessing large text files. in Sergei Nirenburg (ed.), Machine Translation: Theoretical and methodological issues. 1987: Cambridge University Press.
  9. Weiss, S., Learning to disambiguate. Information Storage and Retrieval, 1973. 9: p. 33-41.
  10. Kelly, E.F. and P.J. Stone, Computer Recognition of English Word Senses. 1975: North-Holland, Amsterdam.
  11. Black, E., An Experiment in Computational Discrimination of English Word Senses. IBM Journal of Research and Development, 1988. 32(2): p. 185-194.
  12. Brown, P.F., et al. Word sense disambiguation using statistical methods. in Proceedings of the 29th Annual Meeting of Association for Computational Linguistics. 1991. Berkeley, California.
  13. Gale, W.A., K.W. Church, and D. Yarowsky. Using bilingual materials to develop word sense disambiguation methods. in Proceedings of the International Conference on Theoretical and Methodological Issues in Machine Translation. 1992.
  14. Dagan, I., A. Itai, and U. Schwall. Two languages are more informative than one. in Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics. 1991. Berkeley, California.
  15. Dagan, I. and A. Itai, Word sense disambiguation using a second language monolingual corpus. Computational Linguistics, 1994. 20: p. 563-596.
  16. Luk, A. Statistical sense disambiguation with relatively small corpora using dictionary definitions. in Proceedings of the 33rd Meeting of the Association for Computational Linguistics ACL-95. 1995. Cambridge.
  17. Gaustad, T., Linguistic Knowledge and Word Sense Disambiguation. 2004.