Statistical post-editing for a statistical MT system

2011

Abstract

Statistical post-editing (SPE) techniques have been successfully applied to the output of rule-based MT (RBMT) systems. In this paper we investigate the impact of SPE on a standard phrase-based statistical machine translation (PB-SMT) system, using PB-SMT for both the first-stage MT system and the second-stage SPE system. Our results show that, while a naive approach to using SPE in a PB-SMT pipeline produces either no improvement or only modest improvements, a novel combination of source context modelling and thresholding can produce statistically significant improvements of 2 BLEU points over the baseline on French-to-English technical translation data.
