Tommaso Caselli

University of Groningen, Center for Language and Cognition Groningen (CLCG), Faculty Member

Vrije Universiteit Amsterdam, Faculteit der Geesteswetenschappen, Post-Doc

Papers by Tommaso Caselli

Predicting Controversial News Using Facebook Reactions

Proceedings of the Fourth Italian Conference on Computational Linguistics CLiC-it 2017

Different events and their reception in different reader communities may give rise to controversy. We propose a distantly supervised, entropy-based model that uses Facebook reactions as proxies for predicting news controversy. We validate this approach with within- and across-source experiments, in which different news sources are taken to correspond approximately to different reader communities. We also present and share an automatically generated corpus for controversy prediction in Italian.

HaSpeeDe 2 @ EVALITA2020: Overview of the EVALITA 2020 Hate Speech Detection Task

EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

The Hate Speech Detection (HaSpeeDe 2) task is the second edition of a shared task on the detection of hateful content in Italian Twitter messages. HaSpeeDe 2 is composed of a Main task (hate speech detection) and two Pilot tasks (stereotype and nominal utterance detection). Systems were challenged along two dimensions: (i) time, with test data coming from a different time period than the training data, and (ii) domain, with test data coming from the news domain (i.e., news headlines). Overall, 14 teams participated in the Main task; the best systems achieved macro F1-scores of 0.8088 and 0.7744 on the in-domain and out-of-domain test sets, respectively. Six teams submitted results for Pilot task 1 (stereotype detection); the best systems achieved macro F1-scores of 0.7719 and 0.7203 on the in-domain and out-of-domain test sets, respectively. We did not receive any submission for Pilot task 2.

Evalita 2018: Overview on the 6th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian

EVALITA Evaluation of NLP and Speech Tools for Italian

Fighting the COVID-19 Infodemic with a Holistic BERT Ensemble

Proceedings of the Fourth Workshop on NLP for Internet Freedom: Censorship, Disinformation, and Propaganda

Meaning in Context: Ontologically and linguistically motivated representations of objects and events

Applied Ontology

A Bilingual Corpus of Inter-linked Events

LREC, 2008

This paper describes the creation of a bilingual corpus of inter-linked events for Italian and English. Linkage is accomplished through the Inter-Lingual Index (ILI) that links ItalWordNet with WordNet. The availability of this resource, on the one hand, enables contrastive analysis of the linguistic phenomena surrounding events in both languages, and on the other hand, can be used to perform multilingual temporal analysis of texts. In addition to describing the methodology for construction of the inter-linked corpus and the analysis of the data collected, we demonstrate that the ILI could potentially be used to bootstrap the creation of comparable corpora by exporting layers of annotation for words that have the same sense.

Enriching the "Senso Comune" Platform with Automatically Acquired Data

Using a generative lexicon resource to compute bridging anaphora in Italian

Procesamiento Del Lenguaje Natural, 2009

Abstract: This article reports on a preliminary work on the use of a Generative Lexicon based lex...

Data-Driven Approach Using Semantics for Recognizing and Classifying TimeML Events in Italian

Abstract: We present a data-driven approach for recognizing and classifying TimeML events in Itali...

FBK-TR: SVM for Semantic Relatedeness and Corpus Patterns for RTE

Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014

This paper reports the description and scores of our system, FBK-TR, which participated at the SemEval 2014 task #1 "Evaluation of Compositional Distributional Semantic Models on Full Sentences through Semantic Relatedness and Entailment". The system consists of two parts: one for computing semantic relatedness, based on SVM, and the other for identifying the entailment values on the basis of both semantic relatedness scores and entailment patterns based on verb-specific semantic frames. The system ranked 11th on both tasks with competitive results.

FBK-TR: Applying SVM with Multiple Linguistic Features for Cross-Level Semantic Similarity

Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014

Recently, the task of measuring semantic similarity between given texts has drawn much attention from the Natural Language Processing community. The task becomes especially interesting when it comes to measuring the semantic similarity between texts of different sizes, e.g., paragraph-sentence, sentence-phrase, phrase-word, etc. In this paper, we, the FBK-TR team, describe our system participating in Task 3, "Cross-Level Semantic Similarity", at SemEval 2014. We also report the results obtained by our system, compared to the baseline and to the other participating systems in this task.

Detecting Attribution Relations in Speech: a Corpus Study

In this work we present a methodology for the annotation of Attribution Relations (ARs) in speech which we apply to create a pilot corpus of spoken informal dialogues. This represents the first step towards the creation of a resource for the analysis of ARs in speech and the development of automatic extraction systems. Despite its relevance for speech recognition systems and spoken language understanding, the relation holding between quotations and opinions and their source has been studied and extracted only in written corpora, characterized by a formal register (news, literature, scientific articles). The shift to the informal register and to a spoken corpus widens our view of this relation and poses new challenges. Our hypothesis is that the decreased reliability of the linguistic cues found for written corpora in the fragmented structure of speech could be overcome by including prosodic clues in the system. The analysis of SARC confirms the hypothesis showing the crucial role played by the acoustic level in providing the missing lexical clues.

Words in context: a reference perspective on the lexicon

In this paper, we present a rich contextual perspective on the lexicon and background knowledge for the purpose of deep semantic parsing. In the project Understanding Language By machine, we address various aspects of semantics in relation to (i) reference to entities and event instances, and (ii) modeling of author and reader perspectives. Lexical resources and even resources with world-knowledge such as Wikipedia do not provide the episodic knowledge that is needed to determine reference and eventually meaning. Most resources and also the Natural Language Processing that uses these resources focus too much on semantic knowledge and local context. We argue that we need richer and more complex context models that integrate episodic knowledge, discourse structure and reader/writer perspectives to be able to correctly process text. We outline the directions of research that our project follows and the different aspects that we will study.

Customizable SCF Acquisition in Italian

Technologies and Tools for Lexical Acquisition

SemEval-2010 task 13: TempEval-2

Proceedings of the 5th …, 2010

TempEval-2 comprises evaluation tasks for time expressions, events and temporal relations, the l...

How Concrete Do We Get Telling Stories?

Topics in Cognitive Science, Jul 1, 2018

Will reading different stories about the same event in the world result in a similar image of the world? Will reading the same story by different people result in a similar proxy for experiencing the story? The answer to both questions is no because language is abstract by definition and relies on our episodic experience to turn a story into a more concrete mental movie. Since our episodic knowledge differs, also the mental movie will be different. Language leaves out details, and this becomes specifically clear when building machines that read texts to represent events and to establish event relations across mentions, such as co-reference, causality, subevents, scripts, timelines, and storylines. There is a lot of information and knowledge on the event that is not in the text but is needed to reconstruct these relations and understand the story. Machines lack this knowledge and experience and likewise make explicit what it takes to understand stories from text. In this paper, we re...

On abstraction: decoupling conceptual concreteness and categorical specificity

by Marianna Bolognesi, Tommaso Caselli, and Christian Burgers

Conceptual concreteness and categorical specificity are two continuous variables that allow distinguishing, for example, justice (low concreteness) from banana (high concreteness) and furniture (low specificity) from rocking chair (high specificity). The relation between these two variables is unclear, with some scholars suggesting that they might be highly correlated. In this study, we operationalize both variables and conduct a series of analyses on a sample of more than 13,000 nouns, to investigate the relationship between them. Concreteness is operationalized by means of concreteness ratings, and specificity is operationalized as the relative position of the words in the WordNet taxonomy, which proxies this variable in the hypernym semantic relation. Findings from our studies show only a moderate correlation between concreteness and specificity. Moreover, the intersection of the two variables generates four groups of words that seem to denote qualitatively different types of concepts, which are, respectively, highly specific and highly concrete (typical concrete concepts denoting individual nouns), highly specific and highly abstract (among them many words denoting human-born creation and concepts within the social reality domains), highly generic and highly concrete (among which many mass nouns, or uncountable nouns), and highly generic and highly abstract (typical abstract concepts which are likely to be loaded with affective information, as suggested by previous literature). These results suggest that future studies should consider concreteness and specificity as two distinct dimensions of the general phenomenon called abstraction.

Rule-based creation of TimeML documents from dependency trees

by Matteo Grella and Tommaso Caselli

AI*IA 2011: Artificial Intelligence Around …, Jan 1, 2011

The access to information through content has become the new frontier in NLP. Innovative annotati...
