Statistical Approaches to Computer-Assisted Translation
2009, Computational Linguistics
Universitat Jaume I
Abstract
Current machine translation (MT) systems are still not perfect. In practice, the output from these systems needs to be edited to correct errors. One way of increasing the productivity of the whole translation process (MT plus human work) is to incorporate the human correction activities within the translation process itself, thereby shifting the MT paradigm to that of computer-assisted translation. This model entails an iterative process in which the human translator's activity is included in the loop: in each iteration, a prefix of the translation is validated (accepted or amended) by the human, and the system computes its best (or n-best) translation suffix hypothesis to complete this prefix. A successful framework for MT is the so-called statistical (or pattern recognition) framework. Interestingly, within this framework, the adaptation of MT systems to the interactive scenario mainly affects the search process, which allows extensive reuse of successful techniques and models. In this article, alignment templates, phrase-based models, and stochastic finite-state transducers are used to develop computer-assisted translation systems. These systems were assessed in a European project (TransType2) on two real tasks: the translation of printer manuals and the translation of the Bulletin of the European Union. In each task, the following three pairs of languages were involved (in both translation directions):
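The prefix-validation loop described in the abstract can be illustrated with a small sketch. This is a hypothetical toy under stated assumptions, not the authors' system. The `NBEST` dictionary of scored complete translations is an assumption that stands in for the search space that alignment templates, phrase-based models, or stochastic finite-state transducers would explore. `best_suffix` returns the suffix s that maximizes Pr(p s | x) for a fixed source sentence x and validated prefix p (notation ours), restricted to that toy list. `simulate_session` plays a hypothetical translator who accepts the longest correct prefix and then types the next correct word. Both names are illustrative only.

```python
from typing import Dict, List, Tuple

END = "</s>"  # end-of-sentence marker, so "accept the whole sentence" is a step too

# Hypothetical toy model: a few complete target sentences for one fixed source
# sentence, each with a log-probability score. It only stands in for the
# search space a real alignment-template, phrase-based, or SFST model explores.
NBEST: Dict[Tuple[str, ...], float] = {
    ("the", "printer", "is", "ready", END): -1.0,
    ("the", "printer", "is", "prepared", END): -1.5,
    ("the", "print", "is", "ready", END): -2.0,
    ("a", "printer", "is", "ready", END): -2.5,
}


def best_suffix(prefix: List[str], nbest: Dict[Tuple[str, ...], float]) -> List[str]:
    """Suffix of the highest-scoring hypothesis that starts with the validated prefix.

    Returns [] when no hypothesis is compatible with the prefix (a full system
    would need a more tolerant search here; this toy does not model one).
    """
    n = len(prefix)
    best: List[str] = []
    best_score = float("-inf")
    for hyp, score in nbest.items():
        if list(hyp[:n]) == prefix and score > best_score:
            best, best_score = list(hyp[n:]), score
    return best


def simulate_session(reference: List[str], nbest: Dict[Tuple[str, ...], float]) -> int:
    """Simulated translator: accept the longest correct prefix, then type one word.

    Returns how many words (including the end marker) the user had to type.
    """
    ref = reference + [END]
    prefix: List[str] = []
    typed = 0
    while True:
        # System step: complete the validated prefix with the best suffix.
        hyp = prefix + best_suffix(prefix, nbest)
        # User step: find the longest prefix of the hypothesis that is correct.
        k = 0
        while k < min(len(hyp), len(ref)) and hyp[k] == ref[k]:
            k += 1
        if k == len(ref):
            return typed  # the whole hypothesis is accepted
        # Accept hyp[:k] and amend the first wrong (or missing) word.
        prefix = ref[: k + 1]
        typed += 1


if __name__ == "__main__":
    print(simulate_session(["the", "printer", "is", "prepared"], NBEST))
    print(simulate_session(["a", "printer", "is", "prepared"], NBEST))
```

Run as a script, the two calls print 1 and 3. In the first case a single amended word ("prepared") lets the toy model supply the rest. In the second, the user types "a" and "prepared", and because no hypothesis in the toy list continues "a printer is prepared", the simulated user must also confirm the end of the sentence by hand. The point of the sketch is that only the search step changes: the same scored model is queried, but constrained to hypotheses compatible with the validated prefix.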