Web-based lemmatisation of Named Entities
2008, Text, Speech and …
https://doi.org/10.1007/978-3-540-87391-4_9
Abstract
Identifying the lemma of a Named Entity is important for many Natural Language Processing applications, such as Information Retrieval. Here we introduce a novel approach to Named Entity lemmatisation that uses the occurrence frequencies of each possible lemma. We constructed four corpora in English and Hungarian and used them to train machine learning methods, obtaining simple decision rules based on the web frequencies of the lemmas. In experiments, our web-based heuristic achieved an average accuracy of nearly 91%.
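As a rough illustration of what a frequency-based decision rule of this kind might look like, the sketch below picks a lemma from a set of candidates using their web hit counts. It is not the rule set learned in the paper: the function name `pick_lemma`, the `ratio` threshold, and the hit counts in the usage note are all hypothetical, and the counts are supplied by the caller rather than fetched from a search engine.

```python
from typing import Mapping


def pick_lemma(surface: str, freqs: Mapping[str, int], ratio: float = 1.0) -> str:
    """Choose a lemma for `surface` from candidates with known web frequencies.

    Assumption (not from the paper): the surface form is kept unless some
    candidate is both more frequent than the best choice so far and at least
    `ratio` times as frequent as the surface form itself.
    """
    base = freqs.get(surface, 0)  # frequency of the form as it appears in text
    best, best_count = surface, base
    for candidate, count in freqs.items():
        # Accept a candidate only if it beats the current best and clears
        # the (hypothetical) ratio threshold relative to the surface form.
        if count > best_count and count >= ratio * base:
            best, best_count = candidate, count
    return best
```

For example, with invented counts in which the Hungarian inflected form "Budapesten" is far rarer than "Budapest", `pick_lemma("Budapesten", {"Budapesten": 1200, "Budapest": 95000})` selects "Budapest". Raising `ratio` makes the rule more conservative about stripping a suffix that may belong to the name itself.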