Papers by Federico Martelli
L'informazione economica dei giornali
L'elaborato finale intende analizzare il linguaggio specialistico economico nei giornali, pon... more L'elaborato finale intende analizzare il linguaggio specialistico economico nei giornali, ponendo a confronto il linguaggio utilizzato dal quotidiano generalista la Repubblica e quello utilizzato dal quotidiano specializzato Il Sole 24 Ore
ID10M: Idiom Identification in 10 Languages
Findings of the Association for Computational Linguistics: NAACL 2022
Multilingual and Cross-lingual Word-in-Context Disambiguation (MCL-WiC)
In this presentation, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-... more In this presentation, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-in-Context disambiguation (MCL-WiC). This task allows the largely under-investigated inherent ability of systems to discriminate between word senses within and across languages to be evaluated, dropping the requirement of a fixed sense inventory.

DiBiMT: A Novel Benchmark for Measuring Word Sense Disambiguation Biases in Machine Translation
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over ... more Lexical ambiguity poses one of the greatest challenges in the field of Machine Translation. Over the last few decades, multiple efforts have been undertaken to investigate incorrect translations caused by the polysemous nature of words. Within this body of research, some studies have posited that models pick up semantic biases existing in the training data, thus producing translation errors. In this paper, we present DiBiMT, the first entirely manually-curated evaluation benchmark which enables an extensive study of semantic biases in Machine Translation of nominal and verbal words in five different language combinations, namely, English and one or other of the following languages: Chinese, German, Italian, Russian and Spanish. Furthermore, we test state-of-the-art Machine Translation systems, both commercial and non-commercial ones, against our new test bed and provide a thorough statistical and linguistic analysis of the results. We release DiBiMT at https://nlp.uniroma1.it/dibimt as a closed benchmark with a public leaderboard.

Over the course of the last few years, lexicography has witnessed the burgeoning of increasingly ... more Over the course of the last few years, lexicography has witnessed the burgeoning of increasingly reliable automatic approaches supporting the creation of lexicographic resources such as dictionaries, lexical knowledge bases and annotated datasets. In fact, recent achievements in the field of Natural Language Processing and particularly in Word Sense Disambiguation have widely demonstrated their effectiveness not only for the creation of lexicographic resources, but also for enabling a deeper analysis of lexical-semantic data both within and across languages. Nevertheless, we argue that the potential derived from the connections between the two fields is far from exhausted. In this work, we address a serious limitation affecting both lexicography and Word Sense Disambiguation, i.e. the lack of high-quality sense-annotated data and describe our efforts aimed at constructing a novel entirely manually annotated parallel dataset in 10 European languages. For the purposes of the present p...

In this paper, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-in-Cont... more In this paper, we introduce the first SemEval task on Multilingual and Cross-Lingual Word-in-Context disambiguation (MCL-WiC). This task allows the largely under-investigated inherent ability of systems to discriminate between word senses within and across languages to be evaluated, dropping the requirement of a fixed sense inventory. Framed as a binary classification, our task is divided into two parts. In the multilingual sub-task, participating systems are required to determine whether two target words, each occurring in a different context within the same language, express the same meaning or not. Instead, in the cross-lingual part, systems are asked to perform the task in a cross-lingual scenario, in which the two target words and their corresponding contexts are provided in two different languages. We illustrate our task, as well as the construction of our manually-created dataset including five languages, namely Arabic, Chinese, English, French and Russian, and the results of...

Word Sense Disambiguation (WSD), i.e., the task of assigning senses to words in context, has seen... more Word Sense Disambiguation (WSD), i.e., the task of assigning senses to words in context, has seen a surge of interest with the advent of neural models and a considerable increase in performance up to 80% F1 in English. However, when considering other languages, the availability of training data is limited, which hampers scaling WSD to many languages. To address this issue, we put forward MULTIMIRROR, a sense projection approach for multilingual WSD based on a novel neural discriminative model for word alignment: given as input a pair of parallel sentences, our model – trained with a low number of instances – is capable of jointly aligning, at the same time, all source and target tokens with each other, surpassing its competitors across several language combinations. We demonstrate that projecting senses from English by leveraging the alignments produced by our model leads a simple mBERT-powered classifier to achieve a new state of the art on established WSD datasets in French, Germa...
Natural Language Engineering
Over the last two decades, determining the similarity between words as well as between their mean... more Over the last two decades, determining the similarity between words as well as between their meanings, that is, word senses, has been proven to be of vital importance in the field of Natural Language Processing. This paper provides the reader with an introduction to the tasks of computing word and sense similarity. These consist in computing the degree of semantic likeness between words and senses, respectively. First, we distinguish between two major approaches: the knowledge-based approaches and the distributional approaches. Second, we detail the representations and measures employed for computing similarity. We then illustrate the evaluation settings available in the literature and, finally, discuss suggestions for future research.
SyntagNet: Challenging Supervised Word Sense Disambiguation with Lexical-Semantic Combinations
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
Uploads
Papers by Federico Martelli