Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects (LT4VarDial'15). 2nd Discriminating between Similar Languages Shared Task (DSL'15), Sep 9, 2015
This paper describes the submission made by the MMS team to the Discriminating between Similar Languages (DSL) shared task 2015. We participated in the closed submission track, using only the dataset provided by the shared task organisers, which contained short texts from 13 similar languages and language varieties. We submitted three runs using different systems and compared their performance. Our best system achieved 95.24% accuracy on test set A (containing original texts) and 92.78% accuracy on test set B (containing texts without named entities).
Books by Hernani Costa
Papers by Hernani Costa
Several variables and external criteria are usually followed when building a corpus, but little has been said about textual distributional similarity in this context and the quality that it brings to research. In an attempt to fill this gap, this paper presents a simple but efficient methodology capable of measuring a corpus's internal degree of relatedness. To do so, the methodology takes advantage of both available natural language processing technology and statistical methods to assess the degree of relatedness between documents. Our findings show that using a list of common entities and a set of distributional similarity measures is enough not only to describe and assess the degree of relatedness between the documents in a comparable corpus, but also to rank them according to their degree of relatedness within the corpus.
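The idea of ranking documents by the entities they share can be illustrated with a minimal sketch. The abstract does not name the similarity measures used, so Jaccard similarity is an assumption chosen purely for illustration, and the function names (`jaccard`, `rank_by_relatedness`) and the toy entity sets are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_relatedness(doc_entities):
    """Rank documents by their mean Jaccard similarity to all other documents.

    doc_entities maps a document name to the set of entities found in it.
    Jaccard is an illustrative choice, not the measure used in the paper.
    """
    totals = {name: 0.0 for name in doc_entities}
    # Score every unordered pair once and credit both documents.
    for (n1, e1), (n2, e2) in combinations(doc_entities.items(), 2):
        score = jaccard(e1, e2)
        totals[n1] += score
        totals[n2] += score
    others = max(len(doc_entities) - 1, 1)  # avoid division by zero
    means = {name: total / others for name, total in totals.items()}
    # Highest mean similarity first.
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


# Toy input: entity sets that would normally come from an NLP pipeline.
docs = {
    "doc_a": {"lisbon", "porto", "euro"},
    "doc_b": {"lisbon", "euro", "bank"},
    "doc_c": {"tiger", "jungle"},
}
ranking = rank_by_relatedness(docs)
```

In this toy corpus, the two documents that share entities rank above the one that shares none, which is the kind of ordering that could be used to spot weakly related documents in a comparable corpus.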
In this paper, we describe an ongoing Recommender System application, implemented as a Multiagent System, whose purpose is to gather heterogeneous information from different sources and selectively deliver it based on the user's preferences, the community's trends, and the emotions that it elicits in the user.