The Lemmatisation Task at the EVALITA 2011 Evaluation Campaign
2013, Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-642-35828-9_25
Abstract
This paper reports on the EVALITA 2011 Lemmatisation task, an initiative for the evaluation of automatic lemmatisation tools specifically developed for the Italian language. Although lemmatisation is often considered a by-product of PoS tagging that does not cause any particular problem, there are many specific cases, certainly in Italian and in some other highly inflected languages, in which a word form remains ambiguous between lemmas even once its lexical class is known. A considerable number of scholars and teams participated, testing their systems on the data provided by the task organisers. The results are very interesting, and the overall performance of the participating systems was very high, exceeding 99% lemmatisation accuracy in interesting cases.