Predicting MT Quality as a Function of the Source Language
2006
Abstract
This paper describes one phase of a large-scale machine translation (MT) quality assurance project. We explore a novel approach to discriminating MT-unsuitable source sentences by predicting the expected quality of the output. The resources required include a set of source/MT sentence pairs, human judgments on the output, a source parser, and an MT system. We extract a number of syntactic, semantic, and lexical features from the source sentences only and train a classifier that we call the "