METIS-II: low resource machine translation

2008, Machine Translation

Abstract

METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use 'basic' linguistic tools and representations and to link them with patterns and statistics from a monolingual target-language corpus. The METIS-II project had four partners, translating from their 'home' languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experience obtained, we believe that the approach is promising and offers potential for development in various directions.
