In this paper, we report on the encoding of the Portuguese Academy Dictionary using TEI Lex-0. We demonstrate how we applied this new baseline format for lexical data to mark up ‘special entries’ in the dictionary: part-of-speech homonyms (capital1, capital2, capital3), etymological homonyms (cota1, cota2), homographs (lobo1 /ó/, lobo2 /ô/), spelling variants (ouro, oiro), trademarks (donut), entries that have a different meaning in the plural (antepassados), and lexical variants (missanga, miçanga). Even though TEI Lex-0 reduces the number of TEI elements that can be used to describe entry-like objects from five (<entryFree>, <entry>, <superEntry>, <hom> and <re>) to only one (<entry>), our work shows that TEI Lex-0 is fully capable of representing the complexities of the entry structure of the Portuguese
Academy Dictionary. Furthermore, we argue that this simplified array of elements can lead to more coherent and more legible encoding without sacrificing semantic expressivity. In addition to justifying our concrete encoding choices, we describe the process of converting our data from TEI to TEI Lex-0 and document the differences between our original TEI encoding and the TEI Lex-0 version. As of this writing, TEI Lex-0 is still a work in progress. This paper is therefore intended as both a contribution to and a commentary on the efforts of the TEI Lex-0 group.
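
To make the reduction from five entry-like elements to one more concrete, the following is a minimal sketch of what a purely mechanical first pass of such a conversion might look like. It is not the conversion procedure used for the dictionary: the function name `unify_entry_like` and the sample fragment are hypothetical, and a real conversion would also have to decide, case by case, how nesting, identifiers and entry typing should be expressed. The only things taken from the text are the element names and the fact that `<entry>` is the single remaining container.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# The four entry-like elements that, per the abstract, give way to <entry>.
ENTRY_LIKE = {"entryFree", "superEntry", "hom", "re"}


def unify_entry_like(root):
    """Rename legacy entry-like elements to <entry>, in place.

    Hypothetical helper for illustration only. Returns the number of
    elements that were renamed. Attributes and children are left untouched.
    """
    renamed = 0
    for el in root.iter():
        # Tags look like "{namespace}local" or just "local".
        ns, _, local = el.tag.rpartition("}")
        if local in ENTRY_LIKE:
            el.tag = f"{ns}}}entry" if ns else "entry"
            renamed += 1
    return renamed


# Invented sample data: two etymological homonyms grouped in a <superEntry>.
SAMPLE = """
<superEntry xmlns="http://www.tei-c.org/ns/1.0">
  <entry n="1"><form><orth>cota</orth></form></entry>
  <entry n="2"><form><orth>cota</orth></form></entry>
</superEntry>
""".strip()

root = ET.fromstring(SAMPLE)
count = unify_entry_like(root)
nested = root.findall(f"{{{TEI_NS}}}entry")
```

After the call, the outer `<superEntry>` has become an `<entry>` that contains the two original entries as nested `<entry>` children, which is the kind of structure the single-element model relies on. The sketch deliberately adds no attributes, since any typing scheme would be an assumption beyond what is stated here.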