Variants and Homographs: Eternal Problem of Dictionary Makers

2008

Abstract

We discuss two types of asymmetry between wordforms and their (morphological) characteristics, namely (morphological) variants and homographs. We introduce the concept of a multiple lemma, which allows unique identification of wordform variants as well as ‘morphologically based’ identification of homographic lexemes. A deeper insight into these concepts allows further refinement of morphological dictionaries and, in turn, better performance in NLP tasks. We demonstrate our approach on the morphological dictionary of Czech.
