Academia.edu

Event information extraction

25 papers
2 followers
About this topic
Event information extraction is a subfield of natural language processing that focuses on identifying and extracting structured information about events from unstructured text. This includes recognizing event triggers, participants, time, location, and other relevant attributes to facilitate the organization and analysis of event-related data.
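
As a rough illustration of the structured output described above, the following minimal Python sketch shows one way an extracted event could be stored. The class names, field names, role label, and example sentence are all assumptions made for illustration; they are not drawn from any specific paper or ontology listed on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Argument:
    role: str  # illustrative role label, e.g. "participant"
    text: str  # surface span copied from the source sentence


@dataclass
class EventRecord:
    trigger: str     # word or phrase that signals the event
    event_type: str  # label from some event ontology (hypothetical here)
    arguments: List[Argument] = field(default_factory=list)
    time: Optional[str] = None
    location: Optional[str] = None

    def participants(self) -> List[str]:
        """Return the text of every argument tagged as a participant."""
        return [a.text for a in self.arguments if a.role == "participant"]


# Hand-built record for the made-up sentence:
# "Workers protested in Ankara on Monday."
event = EventRecord(
    trigger="protested",
    event_type="Protest",
    arguments=[Argument(role="participant", text="Workers")],
    time="Monday",
    location="Ankara",
)

print(event.participants())
```

A real extraction system would fill such a record automatically; the sketch only fixes the shape of the target structure.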

Key research themes

1. How can ontologies and structured semantic frameworks enhance the effectiveness of event extraction from unstructured text?

This research area focuses on developing comprehensive and flexible ontologies and semantic resources that define event types, argument roles, and analytic dimensions for event extraction (EE). Such ontological frameworks are vital because they provide structured guidance to automate the identification and classification of events and their participants in text, improving accuracy and domain adaptability. Addressing limitations in previous ontologies—such as narrow topical coverage, inflexible argument role definitions, and lack of analytical granularity—can yield better event extraction systems that serve diverse applications including knowledge base construction, summarization, and crisis monitoring.

Key finding: This paper proposes the COfEE event ontology addressing shortcomings of popular ontologies like ACE, CAMEO, and ICEWS that suffer from limited topical coverage (mainly political events), rigid argument role definitions, and... Read more
Key finding: This work highlights a hybrid approach combining knowledge-driven (ontology and pattern-based) and data-driven techniques to improve EE system performance in Russian, a less-resourced language for EE. It develops linguistic... Read more
Key finding: The paper applies open information extraction (OIE) combined with ontological reasoning to reduce expert intervention in EE for the domain of management change events. Unlike earlier approaches relying heavily on manual... Read more

2. What modeling and joint learning strategies effectively handle complex phenomena such as role overlaps and ambiguity in multilingual event extraction?

This theme explores approaches addressing key challenges in event extraction, especially in languages like Chinese, where word segmentation ambiguities and overlapping semantic roles frequently occur. Researchers investigate methods that model interdependencies among event triggers, arguments, and roles jointly rather than in pipelined stages, allowing for simultaneous resolution of ambiguous and overlapping event elements. Such joint frameworks utilize pre-trained language models and reformulate argument extraction as a relation triple extraction problem to improve robustness in multilingual settings and complex event structures.
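
To make the triple reformulation concrete, here is a minimal Python sketch that represents event arguments as (trigger, role, argument) triples and finds arguments that fill more than one role. The triple layout, role names, function names, and example sentence are assumptions for illustration only; this is not the representation or model used by any paper in this theme.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class EventTriple(NamedTuple):
    trigger: str   # event trigger word
    role: str      # role the argument plays for that trigger
    argument: str  # argument span


def roles_by_argument(triples: List[EventTriple]) -> Dict[str, Set[str]]:
    """Map each argument span to the set of roles it fills."""
    roles: Dict[str, Set[str]] = defaultdict(set)
    for t in triples:
        roles[t.argument].add(t.role)
    return dict(roles)


def overlapping_arguments(triples: List[EventTriple]) -> List[str]:
    """Return arguments that fill more than one role (role overlap)."""
    by_arg = roles_by_argument(triples)
    return sorted(arg for arg, rs in by_arg.items() if len(rs) > 1)


# Hand-written triples for the made-up sentence:
# "Acme acquired Beta Corp and dismissed its chief executive."
triples = [
    EventTriple("acquired", "buyer", "Acme"),
    EventTriple("acquired", "target", "Beta Corp"),
    EventTriple("dismissed", "employer", "Acme"),
    EventTriple("dismissed", "person", "its chief executive"),
]

print(overlapping_arguments(triples))  # ['Acme']
```

Because each triple carries its own trigger and role, an argument shared by two events appears once per role instead of being forced into a single label.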

by Nuo Xu
Key finding: This paper defines an event relation triple representation capturing interdependencies among event triggers, arguments, and roles explicitly, converting argument extraction into relation triple extraction. Employing a... Read more
Key finding: Applying a three-stage classification process (trigger word tagging, simple event extraction, and complex event extraction) using the MIRA online learning framework, this paper demonstrates tunable precision-recall trade-offs... Read more
Key finding: This study presents a high-precision event extraction system focused on biomedical texts, leveraging a probabilistic Earley chart parsing algorithm for event composition. The approach treats event structures analogously to... Read more

3. How can computational frameworks leverage high-level event representations and distributed semantic processing to improve extraction and reasoning over complex and large-scale event streams?

This theme investigates methods for modeling and processing events at levels above isolated event occurrences, incorporating temporal, spatial, and semantic dimensions for complex event processing (CEP). It includes approaches that extend traditional event models by integrating RDF semantics with temporal reasoning capabilities and distributed architectures to achieve scalability. Further, it addresses techniques for mining holistic or object-centric event logs, capturing interrelated behavior in event data streams, thus facilitating predictive analytics, pattern detection, and knowledge population in dynamic and heterogeneous data environments.

Key finding: This paper proposes an extended RDF-based event data model that incorporates temporal reasoning directly at the RDF level, addressing limitations of existing SCEP systems that lack the notion of time and rely on centralized... Read more
Key finding: The authors introduce a framework for detecting high-level events that capture holistic and system-wide process states emerging from clusters of temporally proximate events across multiple process instances. By segmenting... Read more
Key finding: This study develops a framework to extract and encode features from object-centric event logs where events relate to multiple objects of various types, reflecting interactions between concurrent processes. It critiques the... Read more

All papers in Event information extraction

Chat-based Social Engineering (CSE) is widely recognized as a key factor in successful cyber-attacks, especially in small and medium-sized enterprise (SME) environments. Despite the interest in preventing CSE attacks, few studies have... more
Automated metadata annotation is only as good as the training dataset or rules that are available for the domain. It's important to learn what type of data content a pre-trained machine learning algorithm has been trained on to... more
In this paper we show how the performance of tweet clustering can be improved by leveraging character-based neural networks. The proposed approach overcomes the limitations related to the vocabulary explosion in the word-based models and... more
We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 that consists of four subtasks that are i) document classification, ii) sentence classification, iii) event... more
Benchmarking state-of-the-art text classification and information extraction systems in multilingual, cross-lingual, few-shot, and zero-shot settings for socio-political event information collection is achieved in the scope of the shared... more
The CoNLL-2003 corpus for English-language named entity recognition (NER) is one of the most influential corpora for NER model research. A large number of publications, including many landmark works, have used this corpus as a source of... more
Event extraction (EE) is one of the core information extraction tasks, whose purpose is to automatically identify and extract information about incidents and their actors from texts. This may be beneficial to several domains such as... more
Timely analysis of cyber-security information necessitates automated information extraction from unstructured text. While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is... more
This work is funded by the European Research Council (ERC) Starting Grant awarded to Dr. Erdem Yörük for the project Emerging Markets Welfare (project ID 714868). The research project is hosted by Koç University and has benefited from... more
The task of event extraction (EE) aims to find the events and event-related argument information from the text and represent them in a structured format. Most previous works try to solve the problem by separately identifying multiple... more
by Nuo Xu
Event extraction is an essential yet challenging task in information extraction. Previous approaches have paid little attention to the problem of role overlap, which is a common phenomenon in practice. To solve this problem, this paper... more
We describe a gold standard corpus of protest events that comprises various local and international English-language sources from various countries. The corpus contains document-, sentence-, and token-level annotations. This corpus... more
The purpose of this research was to automatically extract catchphrases from a set of legal documents. For this task, our focus was mainly on machine learning approaches: a comparative approach was used between the unsupervised and... more
Environmental, Social, and Governance (ESG) are non-financial factors that are garnering attention from investors as they increasingly look to apply these as part of their analysis to identify material risks and growth opportunities. Some... more
On social media, the same news or events are associated with two or more people, sometimes with different perspectives. The representation of the news or events varies from person to person, perspective to perspective, or time to time. In this... more
In this paper we study the combined use of four different NLP toolkits (Stanford CoreNLP, GATE, OpenNLP and Twitter NLP tools) in the context of social media posts. Previous studies have shown performance comparisons between these tools,... more
Masked language models (MLMs) have contributed to drastic performance improvements with regard to zero anaphora resolution (ZAR). To further improve this approach, in this study, we made two proposals. The first is a new pretraining task... more
The Emerging Markets Welfare project investigates the effects of contentious politics on welfare state programs in countries of the Global South. It hypothesizes that government response to social contention is a significant factor that... more
In the past years social media services received content contributions from millions of users, making them a fruitful source for data analysis. In this paper we present a novel approach for mining Twitter data in order to extract factual... more
Microblogging services such as Twitter, Facebook, and Foursquare have become major sources for information about real-world events. Most approaches that aim at extracting event information from such sources typically use the temporal... more
Event Detection has been one of the research areas in Text Mining that has attracted attention during this decade due to the widespread availability of social media data, specifically Twitter data. Twitter has become a major source for... more
This work is devoted to the Natural Language Generation (NLG) problem. Modern approaches in this area based on deep neural networks are considered. The most famous and promising deep neural network architectures that are related to this... more
The quest for new information is an inborn human trait and has always been quintessential for human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection... more
Media coverage has a substantial effect on the public perception of events. Nevertheless, media outlets are often biased. One way to bias news articles is by altering the word choice. The automatic identification of bias by word choice is... more
Neural relation extraction discovers semantic relations between entities from unstructured text using deep learning methods. In this study, we make a clear categorization of the existing relation extraction methods in terms of data... more
Although previous research on Aspect-based Sentiment Analysis (ABSA) for Indonesian reviews in the hotel domain has been conducted using CNN and XGBoost, the model did not generalize well to test data, and a high number of OOV words contributed... more
For many business applications, we often seek to analyze sentiments associated with any arbitrary aspects of commercial products, despite having a very limited amount of labels or even without any labels at all. However, existing aspect... more
Topic Detection and Tracking (TDT) on Twitter emulates how humans identify developments in events from a stream of tweets, but while event participants are important for humans to understand what happens during events, machines have no... more
We analyze the effect of further retraining BERT with different domain specific data as an unsupervised domain adaptation strategy for event extraction. Portability of event extraction models is particularly challenging, with large... more
We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective. First, instead of considering documents in isolation, we... more
Newsworthy stories are increasingly being shared through social networking platforms such as Twitter and Reddit, and journalists now use them to rapidly discover stories and eye-witness accounts. We present a technique that detects... more
We present models which complete missing text given transliterations of ancient Mesopotamian documents, originally written on cuneiform clay tablets (2500 BCE-100 CE). Due to the tablets' deterioration, scholars often rely on contextual... more
Extraction of financial and economic events from text has previously been done mostly using rule-based methods, with more recent works employing machine learning techniques. This work is in line with this latter approach, leveraging... more