Domain adaptation of MT systems through automatic post-editing
2007
Abstract
It is generally acknowledged that the performance of rule-based machine translation (RBMT) systems can be greatly improved through domain-specific system adaptation. To that end, RBMT users often choose to invest significant resources into the development of ad hoc MT dictionaries. In this paper, we demonstrate that comparable customization effects can be achieved automatically. One effective way to do that is to post-edit the translations produced by a vanilla RBMT system using a specially trained statistical machine translation (SMT) system. Our experiments indicate that this method is just as effective as manual customization of system dictionaries in reducing the need for manual post-editing.