Multilingual language resources and interoperability
2009
Abstract
This article introduces the topic of “Multilingual language resources and interoperability”. We start with a taxonomy and a set of parameters for classifying language resources. We then present examples of interoperability issues and the resource architectures designed to address them. Finally, we discuss aspects of linguistic formalisms and interoperability.
References
- Baader, F., D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider (eds.): 2003, The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press.
- Bird, S. and M. Liberman: 2001, 'A Formal Framework for Linguistic Annotation'. Speech Communication 33(1,2), 23-60.
- Boitet, C., M. Mangeot, and G. Sérasset: 2002, 'The PAPILLON project: cooperatively building a multilingual lexical data-base to derive open source dictionaries & lexicons'. In: NLPXML '02: Proceedings of the 2nd workshop on NLP and XML. Morristown, NJ, USA, Association for Computational Linguistics.
- Burnard, L. and S. Bauman (eds.): 2007, TEI P5: Guidelines for Electronic Text Encoding and Interchange. Text Encoding Initiative Consortium.
- Calzolari, N., A. Zampolli, and A. Lenci: 2002, 'Towards a Standard for a Multilingual Lexical Entry: The EAGLES/ISLE Initiative'. In: CICLing '02: Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing. London, UK, pp. 264-279, Springer-Verlag.
- Carletta, J., S. Evert, U. Heid, J. Kilgour, J. Robertson, and H. Voormann: 2003, 'The NITE XML Toolkit: Flexible annotation for multi-modal language data'. Behavior Research Methods, Instruments, and Computers 35(3), 353-363.
- Carpenter, B.: 1992, The Logic of Typed Feature Structures: With Applications to Unification Grammars, Logic Programs and Constraint Resolution, No. 24 in Cambridge Tracts in Theoretical Computer Science. Cambridge University Press.
- Cunningham, H.: 2002, 'GATE, a General Architecture for Text Engineering'. Computers and the Humanities 36, 223-254.
- Emerson, T. and J. O'Neil: 2006, 'Large Corpus Construction for Chinese Lexicon Development'. In: Proceedings of the 29th Unicode Conference. San Francisco, USA.
- Evans, R. and G. Gazdar: 1996, 'DATR: A language for lexical knowledge representation'. Computational Linguistics 22(2), 167-216.
- Farrar, S. and D. T. Langendoen: 2003, 'A linguistic ontology for the Semantic Web'. GLOT International 7(3), 97-100.
- Ferrucci, D. and A. Lally: 2004, 'UIMA: an architectural approach to unstructured information processing in the corporate research environment'. Natural Language Engineering 10(3-4), 327-348.
- Francopoulo, G., N. Bel, M. George, N. Calzolari, M. Monachini, M. Pet, and C. Soria: 2006a, 'Lexical Markup Framework (LMF) for NLP Multilingual Resources'. In: Proceedings of the Workshop on Multilingual Language Resources and Interoperability. Sydney, Australia, pp. 1-8, Association for Computational Linguistics.
- Francopoulo, G., M. George, N. Calzolari, M. Monachini, N. Bel, M. Pet, and C. Soria: 2006b, 'LMF for multilingual, specialized lexicons'. In: E. Hinrichs, N. Ide, M. Palmer, and J. Pustejovsky (eds.): Proceedings of the LREC 2006 Satellite Workshop on Merging and Layering Linguistic Information. Genoa, Italy.
- Görz, G.: in prep., 'Representing Computational Dictionaries in AI-Oriented Knowledge Representation Formalisms'. In: Dictionaries. An International Handbook of Lexicography - Supplementary volume: New developments in lexicography, with a special focus on computational lexicography, HSK - Handbücher zur Sprach- und Kommunikationswissenschaft. Berlin: W. de Gruyter, pp. 10-19. To appear in 2009.
- Helbig, H.: 2001, Die semantische Struktur natürlicher Sprache: Wissensrepräsentation mit MultiNet. Berlin: Springer.
- Ide, N. and K. Suderman: 2007, 'GrAF: A Graph-based Format for Linguistic Annotations'. In: Proceedings of the ACL Workshop on Linguistic Annotation. Prague, Czech Republic, pp. 1-8.
- ISO 24610-1:2006: 2006, 'Language Resource Management - Feature Structures - Part 1: Feature Structure Representation'. Technical report, International Organization for Standardization.
- Kaplan, R. M. and J. Bresnan: 1982, 'Lexical-Functional Grammar: A Formal System for Grammatical Representation'. In: J. Bresnan (ed.): The Mental Representation of Grammatical Relations. Cambridge, Massachusetts: MIT Press, pp. 173-281.
- Leech, G. and A. Wilson: 1996, 'EAGLES. Recommendations for the Morphosyntactic Annotation of Corpora'. Technical report, Expert Advisory Group on Language Engineering Standards. EAGLES Document EAG-TCWG-MAC/R.
- Lieske, C. and F. Sasaki: 2007, 'Internationalization Tag Set (ITS) 1.0. W3C Recommendation'. Technical report, World Wide Web Consortium.
- Lyding, V., E. Chiocchetti, G. Sérasset, and F. Brunet-Manquat: 2006, 'The LexALP Information System: Term Bank and Corpus for Multilingual Legal Terminology Consolidated'. In: A. Witt, G. Sérasset, S. Armstrong, J. Breen, U. Heid, and F. Sasaki (eds.): Proceedings of the Workshop on Multilingual Language Resources and Interoperability. Sydney, Australia, pp. 25-31, Association for Computational Linguistics.
- Kemps-Snijders, M., M. Windhouwer, P. Wittenburg, and S. E. Wright: 2008, 'ISOcat: Corralling Data Categories in the Wild'. In: European Language Resources Association (ELRA) (ed.): Proceedings of the Sixth International Language Resources and Evaluation (LREC'08). Marrakech, Morocco.
- McGuinness, D. L. and F. van Harmelen: 2003, 'OWL Web Ontology Language Overview'. Technical report, W3C. http://www.w3.org/TR/owl-features/.
- Pollard, C. and I. A. Sag: 1994, Head-Driven Phrase Structure Grammar. Chicago, Illinois: The University of Chicago Press.
- Resnik, P., M. B. Olsen, and M. Diab: 1999, 'The Bible as a Parallel Corpus: Annotating the 'Book of 2000 Tongues''. Computers and the Humanities 33(1-2), 129-153.
- Richardson, L. and S. Ruby: 2007, RESTful Web Services. O'Reilly.
- Sasaki, F., A. Witt, and D. Metzing: 2003, 'Declarations of Relations, Differences and Transformations between Theory-specific Treebanks: A New Methodology'. In: J. Nivre (ed.): The Second Workshop on Treebanks and Linguistic Theories (TLT 2003). Växjö University, Sweden.
- Schäfer, U.: 2006, 'Middleware for Creating and Combining Multi-dimensional NLP markup'. In: Proceedings of the EACL-2006 Workshop on Multi-dimensional Markup in Natural Language Processing. Trento, Italy.
- Tognini-Bonelli, E.: 2001, Corpus Linguistics at Work, Vol. 6 of Studies in Corpus Linguistics. Amsterdam: Benjamins.
- Trippel, T.: 2006, The Lexicon Graph Model: A Generic Model for Multimodal Lexicon Development. Saarbrücken, Germany: AQ-Verlag.
- Váradi, T., S. Krauwer, P. Wittenburg, M. Wynne, and K. Koskenniemi: 2008, 'CLARIN: Common Language Resources and Technology Infrastructure'. In: European Language Resources Association (ELRA) (ed.): Proceedings of the Sixth International Language Resources and Evaluation (LREC'08).
- Witt, A., G. Rehm, E. Hinrichs, T. Lehmberg, and J. Stegmann: 2009, 'SusTEInability of Linguistic Resources through Feature Structures'. Literary and Linguistic Computing. In print.
- Witt, A., G. Sérasset, S. Armstrong, J. Breen, U. Heid, and F. Sasaki (eds.): 2006, Proceedings of the Workshop on Multilingual Language Resources and Interoperability. Sydney, Australia: Association for Computational Linguistics.
- Wörner, K., A. Witt, G. Rehm, and S. Dipper: 2006, 'Modelling Linguistic Data Structures'. In: B. T. Usdin (ed.): Proceedings of Extreme Markup Languages 2006. Montréal, Canada.