Academia.edu

Online Information Extraction

15 papers
20 followers
About this topic
Online Information Extraction is the process of automatically retrieving structured information from unstructured or semi-structured data sources on the internet, using algorithms and natural language processing techniques to identify and extract relevant entities, relationships, and events in real time.

Key research themes

1. How can dependency parsing and hand-crafted rules improve Open Information Extraction across different languages?

This research area focuses on the development and refinement of Open Information Extraction (OIE) methods using dependency parsing combined with hand-crafted linguistic rules. It is significant because dependency structures capture syntactic relations that enable precise extraction of relational triples without relying on domain-specific training data. The theme also extends to exploring language-specific adaptations, particularly for languages like Portuguese, where generic rules may underperform compared to English.

Key finding: This paper presents DptOIE, an OIE system tailored for Portuguese that combines dependency parsing with a novel set of hand-crafted rules specific to Portuguese linguistic structures. By training its own POS tagger and...
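The general idea of this theme, applying hand-written rules to a dependency parse to obtain relational triples, can be illustrated with a minimal sketch. It is not DptOIE's rule set: the `Token` type, the single subject-verb-object rule, and the hand-built English parse are all assumptions made for illustration, with relation labels in the style of Universal Dependencies (`nsubj`, `obj`, `det`, `amod`, `compound`).

```python
from typing import List, NamedTuple, Tuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    text: str     # surface form
    head: int     # idx of the syntactic head, 0 for the root
    deprel: str   # dependency relation to the head


# Modifiers that are kept when an argument head is expanded into a phrase.
# This list is an illustrative assumption, not a rule taken from any paper.
KEPT_MODIFIERS = {"det", "amod", "compound"}


def phrase(tokens: List[Token], head: Token) -> str:
    """Return the head plus its direct kept modifiers, in surface order."""
    parts = [
        t for t in tokens
        if t.idx == head.idx
        or (t.head == head.idx and t.deprel in KEPT_MODIFIERS)
    ]
    return " ".join(t.text for t in sorted(parts, key=lambda t: t.idx))


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """Rule: a token with both an nsubj and an obj child yields a triple."""
    triples = []
    for pred in tokens:
        subjects = [t for t in tokens
                    if t.head == pred.idx and t.deprel == "nsubj"]
        objects = [t for t in tokens
                   if t.head == pred.idx and t.deprel == "obj"]
        for subj in subjects:
            for obj in objects:
                triples.append(
                    (phrase(tokens, subj), pred.text, phrase(tokens, obj))
                )
    return triples


# Hand-built parse of "The system extracts relational triples".
# A real pipeline would obtain this from a dependency parser.
sentence = [
    Token(1, "The", 2, "det"),
    Token(2, "system", 3, "nsubj"),
    Token(3, "extracts", 0, "root"),
    Token(4, "relational", 5, "amod"),
    Token(5, "triples", 3, "obj"),
]

print(extract_triples(sentence))
```

Because the rule refers only to syntactic relations, it needs no domain-specific training data. A language-specific system would add or replace rules for constructions that a generic rule like this one misses.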

2. What are effective strategies for scalable, web-scale information extraction using linked open data and automated wrapper induction?

This theme investigates methods for performing Information Extraction (IE) at web scale, addressing challenges like scarce labeled data and heterogeneity of web content. It explores leveraging Linked Open Data (LOD) as large-scale semi-structured annotated resources to bootstrap IE, combined with wrapper induction techniques and iterative learning to automate extraction pattern discovery, enabling adaptable and domain-independent extraction.

Key finding: The paper introduces the LODIE project which utilizes Linked Open Data as a rich, large-scale knowledge base to seed and guide web-scale IE. By combining wrapper induction with bootstrapping techniques over LOD-annotated web...
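To make the notion of wrapper induction concrete, here is a minimal sketch of a left/right-delimiter wrapper learned from seed values. It is a simplified illustration and not the LODIE method: the function names, the sample pages, and the seed values (standing in for facts that a knowledge base might supply) are all hypothetical.

```python
import os
from typing import List, Optional, Tuple


def common_prefix(strings: List[str]) -> str:
    """Longest string that every input starts with."""
    return os.path.commonprefix(strings)


def common_suffix(strings: List[str]) -> str:
    """Longest string that every input ends with."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_wrapper(pages: List[str], seeds: List[str]) -> Tuple[str, str]:
    """Learn (left, right) delimiters around known values in example pages."""
    lefts, rights = [], []
    for page, value in zip(pages, seeds):
        start = page.find(value)
        if start < 0:
            continue  # the seed value does not occur on this page
        lefts.append(page[:start])
        rights.append(page[start + len(value):])
    if not lefts:
        raise ValueError("no seed value was found in any page")
    left, right = common_suffix(lefts), common_prefix(rights)
    if not left or not right:
        raise ValueError("pages share no usable delimiters")
    return left, right


def apply_wrapper(wrapper: Tuple[str, str], page: str) -> Optional[str]:
    """Extract the text between the learned delimiters, or None."""
    left, right = wrapper
    start = page.find(left)
    if start < 0:
        return None
    start += len(left)
    end = page.find(right, start)
    if end < 0:
        return None
    return page[start:end]


# Two example pages from the same hypothetical site template.
pages = [
    "<html><h1>Film</h1><p>Director: <b>Sofia Coppola</b></p></html>",
    "<html><h1>Movie</h1><p>Director: <b>Akira Kurosawa</b></p></html>",
]
seeds = ["Sofia Coppola", "Akira Kurosawa"]

wrapper = induce_wrapper(pages, seeds)
new_page = "<html><h1>Another</h1><p>Director: <b>Greta Gerwig</b></p></html>"
print(apply_wrapper(wrapper, new_page))
```

In a bootstrapping loop, values extracted from new pages could be added to the seed set and the wrapper re-induced. The sketch omits that loop, along with any handling of ambiguous or repeated matches.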

3. How can information extraction workflows be effectively applied in digital libraries using nearly unsupervised methods and what are their practical limitations?

This area evaluates the application of nearly unsupervised Open Information Extraction (OpenIE) workflows in digital library settings, focusing on cross-domain adaptability, extraction quality, and operational costs. It critically examines the challenge of non-canonicalized (heterogeneous and noisy) extractions from unsupervised methods, the required domain expertise, and computational overhead, aiming to bridge the gap between state-of-the-art extraction methods and real-world digital library needs.

Key finding: Through case studies in domains including encyclopedias, pharmacy, and political sciences, this paper demonstrates that nearly unsupervised OpenIE combined with entity linking and canonicalization can produce good precision...
Key finding: This complementary study focuses on workflow design for unsupervised IE, analyzing the portability of state-of-the-art extraction toolboxes across domains, affordability in terms of expertise and computation, and identifying...
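The problem of non-canonicalized extractions mentioned in this theme can be shown with a small sketch: several surface forms of the same fact are mapped to one canonical triple, and extractions that cannot be linked are dropped. The alias tables, identifiers, and sample extractions below are invented for illustration and are not taken from the papers.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical alias tables. In practice these would come from a
# domain vocabulary or an entity-linking component.
ENTITY_ALIASES: Dict[str, str] = {
    "aspirin": "drug:aspirin",
    "acetylsalicylic acid": "drug:aspirin",
    "asa": "drug:aspirin",
    "headache": "condition:headache",
    "headaches": "condition:headache",
}
RELATION_ALIASES: Dict[str, str] = {
    "treats": "treats",
    "is used to treat": "treats",
    "relieves": "treats",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace before dictionary lookup."""
    return " ".join(text.lower().split())


def canonicalize(extractions: List[Triple]) -> List[Triple]:
    """Map noisy triples to canonical IDs, dropping unlinked and duplicate ones."""
    seen = set()
    result = []
    for subj, rel, obj in extractions:
        s = ENTITY_ALIASES.get(normalize(subj))
        r = RELATION_ALIASES.get(normalize(rel))
        o = ENTITY_ALIASES.get(normalize(obj))
        if s is None or r is None or o is None:
            continue  # some part of the triple could not be linked
        triple = (s, r, o)
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


raw = [
    ("Aspirin", "treats", "headaches"),
    ("acetylsalicylic acid", "is used to treat", "headache"),
    ("ASA", "interacts with", "warfarin"),
]
print(canonicalize(raw))
```

The first two raw extractions collapse into a single canonical triple, while the third is discarded because neither its relation nor its object appears in the tables. Building and maintaining such tables is one place where the domain expertise discussed above becomes a cost.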

All papers in Online Information Extraction

International Conference on NLP, Data Mining and Machine Learning (NLDML 2022) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Natural Language Computing, Data...
This article describes our rule-based Arabic Named Entity Recognition (NER) and classification system. It is based on lists of classified proper names (PN) and particularly on syntactico-semantic patterns resulting in fine...
This paper describes an information extraction and content analysis system. The proposed system is based on a conditional random field algorithm and is intended to extract aspect terms mentioned in the text. We used a set of morphological...
Natural Language Processing is a programmed approach to analyzing text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze,...
Semantically, objects in an unstructured document are related to each other to form a certain entity relation, such as a drug-drug interaction through their compounds, a buyer-seller relationship through the goods or...
This essay explains how today's education system and societal expectations require "educated" individuals to have strong computer and information analysis skills.