Language-independent hybrid MT with PRESEMT

Abstract

This article reviews the work carried out on developing PRESEMT, a hybrid, language-independent machine translation (MT) methodology. The methodology is designed to facilitate the rapid creation of MT systems for unconstrained language pairs while keeping the requirements for specialised resources and tools as low as possible. Because resources are scarce for many languages, only a very small bilingual corpus is required, and language modelling is performed by sampling a large target language (TL) monolingual corpus. The article summarises the implementation decisions, using the Greek-English language pair as a test case. Evaluation results are reported for both objective and subjective metrics. Finally, the main error sources are identified and directions for improving this hybrid MT methodology are described.