Pattern-based English-Latvian Toponym Translation

Inguna Skadina

Abstract

Because of their linguistic and extra-linguistic nature, toponyms deserve special treatment in translation. This paper addresses the automated translation of toponyms from English into Latvian. The translation process handles not only toponyms found in a dictionary but also out-of-vocabulary (OOV) toponyms. Translation of OOV toponyms is divided into three steps: source string normalization, translation, and target string normalization. The translation step applies translation strategies and linguistic toponym translation patterns (LTTPs). A development set of 10,000 UK-related toponyms from Geonames was used. The methods were evaluated on a test set: translation accuracy is 67% for the whole test set, 58% for one-word toponymic units, and 81% for multi-word toponyms.
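The flow described in the abstract (dictionary lookup first, then a three-step path for out-of-vocabulary toponyms) can be sketched roughly as follows. This is only an illustration: the function names, the whitespace-based source normalization, and the capitalisation-only target normalization are assumptions made for the example, not the rules used in the paper.

```python
from typing import Callable, Dict


def normalize_source(toponym: str) -> str:
    """Step 1 (assumed behaviour): trim and collapse whitespace."""
    return " ".join(toponym.split())


def normalize_target(toponym: str) -> str:
    """Step 3 (placeholder): capitalise the first letter only.

    The paper's target normalization applies Latvian grammar and
    orthography rules, which are not reproduced here.
    """
    return toponym[:1].upper() + toponym[1:]


def translate_toponym(
    toponym: str,
    dictionary: Dict[str, str],
    translate_oov: Callable[[str], str],
) -> str:
    """Look the toponym up in the dictionary; otherwise run the OOV path."""
    source = normalize_source(toponym)
    if source in dictionary:
        # In-vocabulary toponym: use the stored equivalent as-is.
        return dictionary[source]
    # Out-of-vocabulary toponym: step 2 (translation), then step 3.
    return normalize_target(translate_oov(source))
```

Here `translate_oov` stands in for the strategy- and pattern-based translation step, so any callable that maps a normalized English string to a Latvian candidate can be plugged in.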

Figures (2)

Multi-word LTTPs involve three translation strategies. The first translation strategy, S1, is based on transliteration rules. Translation strategy S2 combines strategy S1 with the insertion of a nomenclature word, e.g., Bebington (as a railroad station) → Bebingtonas stacija. If a nomenclature word is included in the source toponymic unit, as it is in pattern S3, it is either translated (Newton Point → Ņūtona zemesrags, Gog Magog Hills → Gogmagogu kalni) or transliterated (Green Isle → Grīnaila, North East Coast → Nortīstkosta) in the target language.

Target string normalization modifies a toponymic unit according to Latvian grammar and orthography rules, e.g., all populated places are of feminine gender (see P2): Newcastle → Ņūkāsla, as indicated by the ending -a (feminine, singular nominative).
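A minimal sketch of how the three strategies might be told apart is given below, with the first, second, and third strategy labelled S1, S2, and S3 in the code. Everything in it is hypothetical: the tiny nomenclature lexicon, the `feature_type` argument, and the identity `transliterate` stand-in are assumptions for illustration. It does not inflect the head word (so it yields "Bebington stacija" rather than "Bebingtonas stacija") and it covers only the translated variant of the third strategy, not the transliterated one.

```python
from typing import Optional

# Hypothetical toy lexicon of nomenclature words (English -> Latvian),
# taken from the examples in the text.
NOMENCLATURE = {"point": "zemesrags", "hills": "kalni", "station": "stacija"}


def transliterate(word: str) -> str:
    """Stand-in for rule-based transliteration; returns the word unchanged."""
    return word


def choose_strategy(toponym: str, feature_type: Optional[str] = None) -> str:
    """Pick S1, S2, or S3 for a non-empty toponym.

    S3: the source already ends in a known nomenclature word.
    S2: no nomenclature word in the source, but a known feature type
        (e.g. "station") is supplied and should be inserted.
    S1: plain transliteration.
    """
    last_word = toponym.split()[-1].lower()
    if last_word in NOMENCLATURE:
        return "S3"
    if feature_type is not None and feature_type in NOMENCLATURE:
        return "S2"
    return "S1"


def translate(toponym: str, feature_type: Optional[str] = None) -> str:
    """Apply the chosen strategy; no Latvian inflection is performed."""
    words = toponym.split()
    strategy = choose_strategy(toponym, feature_type)
    if strategy == "S3":
        # Transliterate the head, translate the trailing nomenclature word.
        parts = [transliterate(w) for w in words[:-1]]
        parts.append(NOMENCLATURE[words[-1].lower()])
    elif strategy == "S2":
        # Transliterate everything, then insert the nomenclature word.
        parts = [transliterate(w) for w in words]
        parts.append(NOMENCLATURE[feature_type])
    else:
        parts = [transliterate(w) for w in words]
    return " ".join(parts)
```

Keeping strategy selection separate from its application makes it easier to add the transliterated variant of S3 or a case-inflection step later without touching the dispatch logic.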

tionaries. 330 English toponymic units of different types with Latvian translation equivalents were manually extracted from dictionaries and processed with our OOV toponym translation module. We set the following evaluation scores:

