METIS-II: Low-Resource MT for German to English
Journal for Language Technology and Computational Linguistics
https://doi.org/10.21248/JLCL.24.2009.122

Abstract
METIS-II was an EU-FET MT project that ran from October 2004 to September 2007 and aimed to translate free text input without resorting to parallel corpora. The idea was to use 'basic' linguistic tools and representations and to link them with patterns and statistics from a monolingual target-language corpus. The METIS-II project had four partners, each translating from its 'home' language (Greek, Dutch, German, or Spanish) into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained, with an emphasis on the German implementation.

2 Introduction

Starting in October 2004, METIS-II was the continuation of METIS-I (IST-2001-32775; Dologlou et al., 2003). Like METIS-I, METIS-II aimed to translate free text input by combining statistical, pattern-matching, and rule-based methods. The METIS-II project had four partners, each translating from its 'home' language (Greek, Dutch, German, or Spanish) into English. The following goals and premises were defined for the project:

1. use 'basic' NLP tools and resources,
2. use bilingual handmade dictionaries,
3. use a monolingual target-language corpus,
4. use translation units within the sentence boundary,
5. allow different tag sets for the source language (SL) and the target language (TL).

Crucially, parallel corpora are not required, and their use was excluded within METIS-II. The rationale was to develop prototype MT systems suitable for translating 'small languages', i.e. language pairs for which parallel texts are difficult to come by. Some NLP tools are nonetheless required for these languages, albeit very basic ones. The availability of the monolingual target language corpus,
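The central idea described above, choosing among bilingual dictionary translations by means of statistics from a monolingual target-language corpus, can be illustrated with a minimal sketch. It is not the METIS-II implementation. The toy dictionary, the four-sentence "corpus", and the function names `train_bigrams`, `log_prob`, and `translate` are hypothetical, and an add-one smoothed bigram model stands in for whatever corpus statistics the actual system used. Each German word is looked up in the dictionary, every combination of candidate English words is generated in source word order, and the candidate that the target-language model scores highest is returned.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy bilingual dictionary: source word -> candidate English translations.
DICTIONARY = {
    "das": ["the", "that"],
    "haus": ["house", "home"],
    "ist": ["is"],
    "groß": ["big", "tall"],
}

# Hypothetical toy monolingual English corpus (stand-in for a large target-language corpus).
CORPUS = [
    "the house is big",
    "the house is very big",
    "the man is tall",
    "that home is old",
]


def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab_size = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(padded, padded[1:])
    )


def translate(source, unigrams, bigrams):
    """Expand all dictionary combinations (source word order kept) and return the best-scoring one.

    Words missing from the dictionary are passed through unchanged.
    """
    options = [DICTIONARY.get(word, [word]) for word in source.lower().split()]
    candidates = [list(combination) for combination in product(*options)]
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))
```

With this toy data, `translate("Das Haus ist groß", *train_bigrams(CORPUS))` returns `['the', 'house', 'is', 'big']`. Exhaustive enumeration grows exponentially with sentence length and never reorders words; a realistic system would need reordering and a pruned search, for example a beam search of the kind used in Koehn (2004).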
References
- Anastasiou, D. and Culo, O. (2007). Using Topological Information for detecting idiomatic verb phrases in German. In Proceedings of the Conference on Practical Applications in Language and Computers (PALC), pages 49-58, Lodz, Poland.
- Brown, R. and Frederking, R. (1995). Applying statistical English language modelling to symbolic machine translation. In Proceedings of the 6th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI), pages 221-239, Leuven, Belgium.
- Carl, M. (2007). METIS-II: The German to English MT System. In Proceedings of the 11th Machine Translation Summit, Copenhagen, Denmark.
- Carl, M. and Rascu, E. (2006). A dictionary lookup strategy for translating discontinuous phrases. In Proceedings of the European Association for Machine Translation, pages 49-58, Oslo, Norway.
- Carl, M., Schmidt, P., and Schütz, J. (2005). Reversible Template-based Shake & Bake Generation. In Proceedings of the Example-Based Machine Translation Workshop held in conjunction with the 10th Machine Translation Summit, pages 17-26, Phuket, Thailand.
- Carpuat, M. and Wu, D. (2007). How Phrase Sense Disambiguation outperforms Word Sense Disambiguation for Statistical Machine Translation. In Proceedings of the 11th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-07), pages 43-52, Skövde, Sweden.
- Doddington, G. (2002). Automatic Evaluation of Machine Translation Quality using N-gram Co-occurrence Statistics. In Proceedings of the second Human Language Technologies Conference (HLT-02), pages 128-132, San Diego.
- Dologlou, I., Markantonatou, S., Tambouratzis, G., Yannoutsou, O., Fourla, A., and Ioannou, N. (2003). Using monolingual corpora for statistical machine translation. In Proceedings of EAMT/CLAW 2003, pages 61-68, Dublin, Ireland.
- Germann, U., Jahr, M., Knight, K., Marcu, D., and Yamada, K. (2001). Fast Decoding and Optimal Decoding for Machine Translation. In Proceedings of the 39th ACL and 10th Conference of the European Chapter, pages 228-235, Toulouse, France.
- Habash, N. (2004). The use of a structural n-gram language model in generation-heavy hybrid machine translation. In Proceedings of the 3rd International Conference on Natural Language Generation (INLG '04), volume 3123 of LNAI, Springer, pages 61-69.
- Koehn, P. (2004). Pharaoh: a Beam Search Decoder for Phrase-Based Statistical Machine Translation Models. In Proceedings of AMTA, the Association for Machine Translation in the Americas, pages 115-124, Washington, DC, USA.
- Langkilde, I. and Knight, K. (1998). The Practical Value of n-grams in generation. In Proceedings of the 9th International Natural Language Generation Workshop (INLG '98), Niagara-on-the-Lake, Ontario.
- Maas, H.-D. (1996). MPRO - Ein System zur Analyse und Synthese deutscher Wörter. In Hausser, R., editor, Linguistische Verifikation, Sprache und Information. Max Niemeyer Verlag, Tübingen.
- METIS-II (2006). Validation/Evaluation framework. Public Report, D5.1, European Commission, FP6-IST-003768, Brussels. http://www.ilsp.gr/metis2/files/Metis2_D5.1.pdf [25.Aug.2008].
- METIS-II (2007). Validation & Fine-Tuning Results for the first Prototype. Public Report, D5.2, European Commission, FP6-IST-003768, Brussels. http://www.ilsp.gr/metis2/files/Metis2_D5.2.pdf [25.Aug.2008].
- Müller, F. H. (2004). Stylebook for the Tübingen Partially Parsed Corpus of Written German (TÜPP-D/Z). http://www.sfb441.uni-tuebingen.de/a1/pub.html [25.Aug.2008].
- Och, F. J. and Ney, H. (2002). Discriminative Training and Maximum Entropy Models for Statistical Machine Translation. In Proceedings of the 40th annual ACL Conference, pages 295-302, Philadelphia, PA.
- Papineni, K., Roukos, S., Ward, T., and Zhu, W. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th ACL, pages 311-318.
- Snover, M., Dorr, B., Schwartz, R., Micciulla, L., and Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of Association for Machine Translation in the Americas (AMTA 2006), pages 223-231.
- Vandeghinste, V. (2008). A Hybrid Modular Machine Translation System. PhD thesis, Netherlands Graduate School of Linguistics.
- Vandeghinste, V., Dirix, P., Schuurman, I., Markantonatou, S., Sofianopoulos, S., Vassiliou, M., Yannoutsou, O., Badia, T., Melero, M., Boleda, G., Carl, M., and Schmidt, P. (2008). Evaluation of a Machine Translation System for Low Resource Languages: METIS-II. In Proceedings of the Sixth International Language Resources and Evaluation (LREC), page 96, Marrakech, Morocco.
- Whitelock, P. (1991). Shake-and-Bake Translation. Unpublished Draft.
- Whitelock, P. (1992). Shake-and-Bake Translation. In Proceedings of COLING-92.