Papers by Tommaso Caselli

Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017
Different events and their reception in different reader communities may give rise to controversy. We propose a distantly supervised, entropy-based model that uses Facebook reactions as proxies for predicting news controversy. We demonstrate the validity of this approach by running within- and across-source experiments, where different news sources are taken to correspond approximately to different reader communities. We also present and share an automatically generated corpus for controversy prediction in Italian.

HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task
EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020
The Hate Speech Detection (HaSpeeDe 2) task is the second edition of a shared task on the detection of hateful content in Italian Twitter messages. HaSpeeDe 2 is composed of a Main task (hate speech detection) and two Pilot tasks (stereotype and nominal utterance detection). Systems were challenged along two dimensions: (i) time, with test data coming from a different time period than the training data, and (ii) domain, with test data coming from the news domain (i.e., news headlines). Overall, 14 teams participated in the Main task; the best systems achieved a macro F1-score of 0.8088 and 0.7744 on the in-domain and out-of-domain test sets, respectively. Six teams submitted results for Pilot task 1 (stereotype detection); the best systems achieved a macro F1-score of 0.7719 and 0.7203 on the in-domain and out-of-domain test sets, respectively. We did not receive any submission for Pilot task 2.
EVALITA Evaluation of NLP and Speech Tools for Italian
Fighting the COVID-19 Infodemic with a Holistic BERT Ensemble
Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda
LREC, 2008
This paper describes the creation of a bilingual corpus of inter-linked events for Italian and English. Linkage is accomplished through the Inter-Lingual Index (ILI) that links ItalWordNet with WordNet. The availability of this resource, on the one hand, enables contrastive analysis of the linguistic phenomena surrounding events in both languages, and on the other hand, can be used to perform multilingual temporal analysis of texts. In addition to describing the methodology for construction of the inter-linked corpus and the analysis of the data collected, we demonstrate that the ILI could potentially be used to bootstrap the creation of comparable corpora by exporting layers of annotation for words that have the same sense.
Enriching the "Senso Comune" Platform with Automatically Acquired Data
Using a generative lexicon resource to compute bridging anaphora in Italian
Procesamiento Del Lenguaje Natural, 2009
This article reports on preliminary work on the use of a Generative Lexicon-based lexical resource to resolve bridging anaphors in Italian. The results obtained, though not very satisfying, seem to support the use of such a resource with respect to WordNet-like ...
Data-Driven Approach Using Semantics for Recognizing and Classifying TimeML Events in Italian
We present a data-driven approach for recognizing and classifying TimeML events in Italian. A high-performance state-of-the-art approach, TIPSem, is adopted and extended with Italian-specific semantic features from a lexical resource. The resulting approach has ...
FBK-TR: SVM for Semantic Relatedness and Corpus Patterns for RTE
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014
This paper describes our system, FBK-TR, and reports its scores in SemEval 2014 Task 1, "Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Entailment". The system consists of two parts: one for computing semantic relatedness, based on SVM, and the other for identifying the entailment values on the basis of both semantic relatedness scores and entailment patterns based on verb-specific semantic frames. The system ranked 11th on both tasks with competitive results.
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014
Recently, the task of measuring semantic similarity between given texts has drawn much attention from the Natural Language Processing community. The task becomes especially interesting when it comes to measuring the semantic similarity between different-sized texts, e.g., paragraph-sentence, sentence-phrase, phrase-word, etc. In this paper, we, the FBK-TR team, describe our system participating in Task 3, "Cross-Level Semantic Similarity", at SemEval 2014. We also report the results obtained by our system, compared to the baseline and other participating systems in this task.

Detecting Attribution Relations in Speech: a Corpus Study
In this work we present a methodology for the annotation of Attribution Relations (ARs) in speech, which we apply to create a pilot corpus of spoken informal dialogues. This represents the first step towards the creation of a resource for the analysis of ARs in speech and the development of automatic extraction systems. Despite its relevance for speech recognition systems and spoken language understanding, the relation holding between quotations and opinions and their source has been studied and extracted only in written corpora, characterized by a formal register (news, literature, scientific articles). The shift to the informal register and to a spoken corpus widens our view of this relation and poses new challenges. Our hypothesis is that the decreased reliability of the linguistic cues found for written corpora in the fragmented structure of speech could be overcome by including prosodic clues in the system. The analysis of SARC confirms the hypothesis, showing the crucial role played by the acoustic level in providing the missing lexical clues.

Words in context: a reference perspective on the lexicon
In this paper, we present a rich contextual perspective on the lexicon and background knowledge for the purpose of deep semantic parsing. In the project Understanding Language By machine, we address various aspects of semantics in relation to (i) reference to entities and event instances, and (ii) modeling of author and reader perspectives. Lexical resources and even resources with world-knowledge such as Wikipedia do not provide the episodic knowledge that is needed to determine reference and eventually meaning. Most resources, and also the Natural Language Processing that uses these resources, focus too much on semantic knowledge and local context. We argue that we need richer and more complex context models that integrate episodic knowledge, discourse structure and reader/writer perspectives to be able to correctly process text. We outline the directions of research that our project follows and the different aspects that we will study.
Customizable SCF Acquisition in Italian
Technologies and Tools for Lexical Acquisition
SemEval-2010 task 13: TempEval-2
Proceedings of the 5th …, 2010
TempEval-2 comprises evaluation tasks for time expressions, events and temporal relations, the latter of which was split into four subtasks, motivated by the notion that smaller subtasks would make both data preparation and temporal relation extraction easier. Manually ...

Topics in Cognitive Science, Jul 1, 2018
Will reading different stories about the same event in the world result in a similar image of the world? Will different people reading the same story arrive at a similar proxy for experiencing the story? The answer to both questions is no, because language is abstract by definition and relies on our episodic experience to turn a story into a more concrete mental movie. Since our episodic knowledge differs, the mental movie will differ as well. Language leaves out details, and this becomes especially clear when building machines that read texts to represent events and to establish event relations across mentions, such as co-reference, causality, subevents, scripts, timelines, and storylines. There is a lot of information and knowledge about the event that is not in the text but is needed to reconstruct these relations and understand the story. Machines lack this knowledge and experience and likewise make explicit what it takes to understand stories from text. In this paper, we re...

Conceptual concreteness and categorical specificity are two continuous variables that allow distinguishing, for example, justice (low concreteness) from banana (high concreteness) and furniture (low specificity) from rocking chair (high specificity). The relation between these two variables is unclear, with some scholars suggesting that they might be highly correlated. In this study, we operationalize both variables and conduct a series of analyses on a sample of > 13,000 nouns, to investigate the relationship between them. Concreteness is operationalized by means of concreteness ratings, and specificity is operationalized as the relative position of the words in the WordNet taxonomy, which proxies this variable in the hypernym semantic relation. Findings from our studies show only a moderate correlation between concreteness and specificity. Moreover, the intersection of the two variables generates four groups of words that seem to denote qualitatively different types of concepts, which are, respectively, highly specific and highly concrete (typical concrete concepts denoting individual nouns), highly specific and highly abstract (among them many words denoting human-born creation and concepts within the social reality domains), highly generic and highly concrete (among which many mass nouns, or uncountable nouns), and highly generic and highly abstract (typical abstract concepts which are likely to be loaded with affective information, as suggested by previous literature). These results suggest that future studies should consider concreteness and specificity as two distinct dimensions of the general phenomenon called abstraction.
Rule-based creation of TimeML documents from dependency trees
AI*IA 2011: Artificial Intelligence Around …, Jan 1, 2011
Access to information through content has become the new frontier in NLP. Innovative annotation schemes such as TimeML [4] have pushed this aspect forward by creating benchmark corpora. In TimeML, an event is defined as something that holds true, obtains/ ...