English-Latvian Toponym Translation

Abstract

Toponyms, the subject of toponymy, are names of places and comprise the following types: hydronyms (names of bodies of water: bays, [...] or other building); cosmonyms or astronyms (names of stars, constellations or other heavenly bodies). The paper investigates a difficult task in machine translation (MT) and cross-language information retrieval (CLIR): the automated translation of toponyms. Most toponym translation approaches are data-driven (see, e.g., Meng et al., 2001; Al-Onaizan and Knight, 2002; Sproat et al., 2006; Alegria et al., 2006; Wentland et al., 2008), since they deal with widely used languages that have enough linguistic resources for development. Because Latvian is an under-resourced language with few available corpus resources, especially parallel bilingual corpora, a rule-based approach is proposed for English-Latvian toponym translation.

There are several commonly used translation strategies for toponyms (Babych and Hartley, 2004): the transference strategy (i.e., do-not-translate), the transliteration strategy (i.e., phonetic or spelling rendering), the translation strategy (i.e., translation proper) and the combined strategy. The transference strategy, implemented with a do-not-translate list, is often used for toponyms that need no rendering at all and are usually left untranslated, e.g. organization names (Babych and Hartley, 2003) or names of hotels in our system.

The most common transliteration techniques are phoneme-based and grapheme-based (Zhang et al., 2004). The phoneme-based approach (Knight and Graehl, 1998; Meng et al., 2001; Oh and Choi, 2002; Lee and Chang, 2003) converts a source-language word into a target-language word via its phonemic representation, i.e., grapheme-phoneme-grapheme conversion.
The grapheme-based technique converts a source-language word into a target-language word without any phonemic representation, i.e., grapheme-grapheme conversion (Stalls and Knight, 1998; Li et al., 2004).

Although Geoffrey Leech (1981) grants toponyms a special status as proper names without conceptual meaning, since no componential analysis can be performed on them, many toponyms are at least etymologically meaningful, e.g. Cambridge, the bridge over the river Cam (Leidner, 2007).

Toponyms are also ambiguous. Leidner (2007) describes three types of toponymical ambiguity: morpho-syntactic ambiguity, where a word may or may not be a toponym, e.g. Liepa as a populated place in Latvia versus liepa (lime-tree) as a common noun; referential ambiguity, where a toponym may refer to more than one place of the same type, e.g. Riga as a populated place and the capital of Latvia versus Riga as a populated place in the state of Michigan, USA; and feature type ambiguity, where a toponym may refer to places of different types, e.g. Ogre as both a populated place and a river in Latvia. Another type of toponymical ambiguity is eponymical ambiguity, which arises when places are named after people or deities, e.g. Vancouver after George Vancouver.

Sometimes the same place is known by different names: endonyms (names used by the inhabitants, i.e. self-assigned names) and exonyms (names used by other groups, not locals), e.g. Firenze for its inhabitants and Florence for English speakers.
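
The three ambiguity types attributed to Leidner (2007) above can be illustrated with a small lookup over a toy gazetteer. This is only a minimal sketch and not the system described in the paper: the `Place` record, the `GAZETTEER` and `COMMON_NOUNS` tables and the `ambiguities` function are hypothetical names introduced for illustration, and the entries are limited to the examples given in the text.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Place:
    """Hypothetical gazetteer record: a name, a feature type and a location."""
    name: str
    feature_type: str
    location: str


# Toy gazetteer containing only the examples mentioned in the text.
GAZETTEER = [
    Place("Liepa", "populated place", "Latvia"),
    Place("Riga", "populated place", "Latvia"),
    Place("Riga", "populated place", "Michigan, USA"),
    Place("Ogre", "populated place", "Latvia"),
    Place("Ogre", "river", "Latvia"),
]

# Assumed list of common nouns that coincide with toponyms
# ("liepa" is the Latvian common noun for lime-tree).
COMMON_NOUNS = {"liepa"}


def ambiguities(name):
    """Return the set of ambiguity types that apply to a candidate toponym."""
    key = name.lower()
    entries = [p for p in GAZETTEER if p.name.lower() == key]
    found = set()

    # Morpho-syntactic: the word is a toponym but also a common noun.
    if entries and key in COMMON_NOUNS:
        found.add("morpho-syntactic")

    # Count how many gazetteer entries exist per feature type.
    per_type = Counter(p.feature_type for p in entries)

    # Referential: more than one place of the same feature type.
    if any(count > 1 for count in per_type.values()):
        found.add("referential")

    # Feature type: the name covers places of different feature types.
    if len(per_type) > 1:
        found.add("feature type")

    return found


if __name__ == "__main__":
    for candidate in ("Liepa", "Riga", "Ogre"):
        print(candidate, sorted(ambiguities(candidate)))
```

A real system would need a full gazetteer and a lexicon in place of the two hand-written tables; the sketch only shows how the three categories differ in what they compare (word class, entries of the same type, and entries across types).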