LIUM's SMT Machine Translation Systems for WMT 2012
Abstract
This paper describes the development of French–English and English–French statistical machine translation systems for the WMT 2012 shared task evaluation. We developed phrase-based systems built on the Moses decoder and trained only on the provided data. New features this year include improved language model and translation model adaptation, with cross-entropy scores used for corpus selection.
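As an illustration of corpus selection by cross-entropy score, the sketch below ranks the sentences of a general corpus by the difference between their per-word cross-entropy under an in-domain model and under a general-domain model, then keeps the best-scoring fraction. This is one common formulation and is an assumption here: the abstract does not state which variant was used. The add-one-smoothed unigram models, the function names, and the `keep_fraction` parameter are hypothetical simplifications; a real system would normally use higher-order n-gram models from a language modeling toolkit.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count whitespace-separated tokens; returns (counts, total_tokens)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(tokens, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    log_prob = sum(
        math.log2((counts[t] + 1) / (total + vocab_size)) for t in tokens
    )
    return -log_prob / len(tokens)


def select_by_cross_entropy_difference(in_domain, general, keep_fraction=0.5):
    """Keep the general-corpus sentences that look most in-domain.

    Score = H_in(s) - H_general(s); lower means closer to the in-domain data.
    """
    in_model = train_unigram(in_domain)
    gen_model = train_unigram(general)
    # Shared vocabulary size, plus one slot for unseen words (an assumption).
    vocab_size = len(set(in_model[0]) | set(gen_model[0])) + 1

    def score(sentence):
        tokens = sentence.split()
        if not tokens:
            return float("inf")  # empty lines are ranked last
        return (cross_entropy(tokens, in_model, vocab_size)
                - cross_entropy(tokens, gen_model, vocab_size))

    ranked = sorted(general, key=score)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    in_domain = [
        "the parliament adopted the resolution",
        "the commission proposed a directive",
    ]
    general = [
        "the parliament voted on the directive",
        "buy cheap watches online now",
        "the commission adopted the proposal",
        "click here to win a prize",
    ]
    for sentence in select_by_cross_entropy_difference(in_domain, general):
        print(sentence)
```

In practice the fraction to keep is usually chosen by measuring the perplexity of a held-out in-domain set under models trained on progressively larger selections.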