A new word sense disambiguation system based on deduction
Abstract
Word sense ambiguity resolution is one of the major issues in machine translation. Statistical and example-based methods are usually applied for this purpose. In statistical methods, ambiguity is mostly resolved using statistics extracted from previously translated documents or from bilingual corpora of the source and target languages. In this paper, we look at the problem from a different viewpoint and treat disambiguation as deduction. The proposed system consists of two main parts. The first part is a data mining algorithm that runs offline and extracts knowledge about word co-occurrences. The second part is an expert system whose knowledge base consists of the association rules generated by the first part. For the inference engine of the expert system, we propose an efficient algorithm based on forward chaining that deduces the correct senses of the words. The performance of the system in terms of applicability and precision is analyzed and discussed through a set of experiments.
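The two-part architecture described above can be illustrated with a small Python sketch. It is not the authors' algorithm: the abstract does not specify the mining procedure, so the offline part here simply counts co-occurring itemsets by brute force and keeps rules that pass support and confidence thresholds. The toy corpus `CORPUS`, the `word#sense` tagging convention, the functions `mine_rules` and `forward_chain`, the threshold values, and the choice to resolve conflicts by firing the highest-confidence applicable rule first are all hypothetical assumptions made for illustration. The point is the chaining: once one word's sense is deduced, it becomes a new fact that can trigger rules for other ambiguous words.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy sense-tagged corpus (invented for illustration).
# "word#sense" marks an ambiguous word whose sense is known; plain
# items are ordinary context words.
CORPUS = [
    ["deposit", "money", "bank#finance"],
    ["loan", "money", "bank#finance"],
    ["deposit", "interest#money", "bank#finance"],
    ["river", "fish", "bank#river"],
    ["river", "water", "bank#river"],
    ["bank#finance", "rate", "interest#money"],
    ["bank#finance", "loan", "interest#money"],
    ["hobby", "music", "interest#curiosity"],
    ["hobby", "art", "interest#curiosity"],
]


def mine_rules(sentences, min_support=2, min_conf=0.5, max_len=3):
    """Offline part (assumed brute-force miner): count co-occurring
    itemsets and keep rules antecedent -> "word#sense" that meet both
    the absolute support and the confidence thresholds."""
    counts = Counter()
    for sent in sentences:
        items = sorted(set(sent))
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))
    rules = []
    for itemset, support in counts.items():
        if len(itemset) < 2 or support < min_support:
            continue
        for consequent in itemset:
            if "#" not in consequent:
                continue  # only a sense assignment can be concluded
            antecedent = tuple(i for i in itemset if i != consequent)
            conf = support / counts[antecedent]
            if conf >= min_conf:
                rules.append((frozenset(antecedent), consequent, conf))
    return rules


def forward_chain(tokens, rules):
    """Online part: repeatedly fire the highest-confidence applicable
    rule until no undecided ambiguous word can be resolved."""
    words = set(tokens)
    facts = set(tokens)   # working memory: context words + deduced senses
    decided = {}          # ambiguous word -> chosen "word#sense"
    agenda = sorted(rules, key=lambda r: (-r[2], r[1], sorted(r[0])))
    fired = True
    while fired:
        fired = False
        for antecedent, consequent, _conf in agenda:
            word = consequent.split("#", 1)[0]
            if word in words and word not in decided and antecedent <= facts:
                decided[word] = consequent
                facts.add(consequent)  # the new fact may enable other rules
                fired = True
                break                  # rescan from the strongest rule
    return decided


if __name__ == "__main__":
    rules = mine_rules(CORPUS)
    print(forward_chain(["deposit", "bank", "interest"], rules))
    # {'bank': 'bank#finance', 'interest': 'interest#money'}
```

In this toy run, "interest" cannot be resolved from the input words alone; it is resolved only after the rule `{deposit} -> bank#finance` fires and adds `bank#finance` to working memory, which then satisfies `{bank#finance} -> interest#money`.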
Proceedings of the World Congress on Engineering 2011 Vol II, WCE 2011, July 6-8, 2011, London, U.K.