Abstract
This paper describes the design and development of an ontology of linguistic annotations, primarily word classes and morphosyntactic features. The ontology is based on existing standardization approaches (e.g. EAGLES), a set of annotation schemes (e.g. STTS and morphological annotations for German), and existing terminological resources (e.g. GOLD). It is intended as a platform for terminological integration, integrated representation, and ontology-based search across existing linguistic resources with terminologically heterogeneous annotations. It can further be applied to augment the semantic analysis of a given text with an ontological interpretation of its morphosyntactic analysis.
References
- Brants, S. and Hansen, S. (2002). Developments in the TIGER annotation scheme and their realization in the corpus. In Proc. 3rd Conference on Language Resources and Evaluation (LREC-02), Las Palmas de Gran Canaria, Spain.
- Broschart, J. (1997). Why Tongan does it differently: Categorial distinctions in a language without nouns and verbs. Linguistic Typology, 1-2:123-166.
- Chiarcos, C. (2006). An ontology for heterogeneous data collections. In Proc. Corpus Linguistics 2006, pages 373-380, St.-Petersburg. St.-Petersburg University Press.
- Cimiano, P. and Reyle, U. (2003). Ontology-based semantic construction, underspecification and disambiguation. In Proc. Prospects and Advances in the Syntax-Semantic Interface Workshop.
- de Cea, G. A., Gómez-Pérez, A., Álvarez de Mon, I., and Pareja-Lora, A. (2004). OntoTag's linguistic ontologies. In Proc. Int'l Conference on Information Technology, Coding and Computing (ITCC'04), pages 124-128, Las Vegas, Nevada.
- Farrar, S. and Langendoen, D. T. (2003). A linguistic ontology for the semantic web. GLOT International, 7(3):97-100.
- Hughes, J., Souter, C., and Atwell, E. (1995). Automatic extraction of tag set mappings from parallel annotated corpora. In From Text to Tags: Issues in Multilingual Language Analysis, Proc. ACL-SIGDAT Workshop, pages 10-17.
- ICOM (2006). ICOM code of ethics for museums. In Hoffman, B. T., editor, Art and Cultural Heritage. Law, Policy and Practice. Cambridge University Press.
- Ide, N., Romary, L., and de la Clergerie, E. (2005). International standard for a linguistic annotation framework. In Proc. HLT-NAACL'03 Workshop Software Engineering and Architecture of Language Technology.
- König, E., Bakker, D., Dahl, Ö., Haspelmath, M., Koptjevskaja-Tamm, M., Lehmann, C., and Siewierska, A. (1993). EUROTYP Guidelines. Technical report, European Science Foundation Programme in Language Typology.
- Leech, G. and Wilson, A. (1996). EAGLES recommendations for the morphosyntactic annotation of corpora. Technical report, Expert Advisory Group on Language Engineering Standards.
- Lezius, W., Rapp, R., and Wettler, M. (1998). A freely available morphological analyzer, disambiguator, and context sensitive lemmatizer for German. In Proc. COLING-ACL 1998, pages 743-747.
- Monachini, M., Soria, C., and Ulivieri, M. (2005). Evaluation of existing standards for NLP lexica. Draft 1.1. Technical report, LIRICS (Linguistic Infrastructure for Interoperable Resource and Systems).
- Rehm, G., Eckart, R., and Chiarcos, C. (2007). An OWL- and XQuery-based mechanism for the retrieval of linguistic patterns from XML-corpora. In Proc. RANLP 2007: Recent Advances in Natural Language Processing. Borovets, Bulgaria.
- Sampson, G. (1995). English for the Computer. Clarendon Press, Oxford.
- Schiller, A., Teufel, S., and Thielen, C. (1995). Guidelines für das Tagging deutscher Textkorpora mit STTS. Technical report, University of Stuttgart and University of Tübingen.
- Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In International Conference on New Methods in Language Processing, pages 44-49, Manchester, UK.
- Schneider, R. (2007). A database-driven ontology for German grammar. In Rehm, G., Witt, A., and Lemnitzer, L., editors, Data Structures for Linguistic Resources and Applications, pages 305-314, Tübingen. Narr.
- Stede, M. (2004). The Potsdam Commentary Corpus. In Proc. ACL-04 Workshop on Discourse Annotation, pages 96-102, Barcelona.