Large Scale Translation Quality Estimation

2015

Abstract

This study explores methods for building a large-scale Quality Estimation (QE) framework for Machine Translation. We extend existing QE resources to related languages using three transfer learning methods: Transductive SVM, Label Propagation, and Self-taught Learning. Starting from the available labelled datasets, e.g. en-es, we apply these methods to produce a range of QE models for Romance languages, while also adapting to subtitling as a new domain. Of the three methods, Self-taught Learning shows the most promising results.
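
To give a sense of how one of the named methods could carry labels from a labelled dataset to unlabelled data, the following is a minimal sketch of label propagation over a similarity graph. It is not the study's implementation: the function name label_propagation, the RBF affinity, the sigma and iterations parameters, the binary good/bad labels, and the one-dimensional toy features are all assumptions made for illustration.

```python
import math


def label_propagation(features, labels, sigma=1.0, iterations=100):
    """Spread labels from labelled points to unlabelled ones.

    features: list of feature vectors (lists of floats), hypothetical QE features.
    labels:   class index for labelled points, None for unlabelled points.
    Returns one predicted class index per point.
    """
    n = len(features)
    classes = sorted({y for y in labels if y is not None})

    # RBF affinity between two feature vectors (an assumed similarity measure).
    def affinity(a, b):
        dist_sq = sum((x - y) ** 2 for x, y in zip(a, b))
        return math.exp(-dist_sq / (2 * sigma ** 2))

    weights = [[affinity(features[i], features[j]) for j in range(n)]
               for i in range(n)]
    # Row-normalise the affinities into transition probabilities.
    transition = [[w / sum(row) for w in row] for row in weights]

    # One-hot distribution for labelled points, uniform for unlabelled ones.
    def initial(y):
        if y is None:
            return [1.0 / len(classes)] * len(classes)
        return [1.0 if c == y else 0.0 for c in classes]

    dist = [initial(y) for y in labels]

    for _ in range(iterations):
        # Each point takes the weighted average of its neighbours' distributions.
        propagated = [
            [sum(transition[i][j] * dist[j][k] for j in range(n))
             for k in range(len(classes))]
            for i in range(n)
        ]
        # Clamp the labelled points back to their known labels.
        dist = [initial(y) if y is not None else propagated[i]
                for i, y in enumerate(labels)]

    # Pick the most probable class for every point.
    return [classes[max(range(len(classes)), key=lambda k: d[k])] for d in dist]


# Toy data: two labelled sentences (0 = bad, 1 = good) and four unlabelled ones.
toy_features = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
toy_labels = [0, None, None, 1, None, None]
predicted = label_propagation(toy_features, toy_labels)
```

In this toy setting the unlabelled points inherit the label of the nearby labelled point. In a cross-lingual setting the labelled points would come from the source language pair and the unlabelled points from the target pair, which presupposes that both are described in a shared feature space.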
