Online Information Extraction

15 papers

20 followers

About this topic

Online Information Extraction is the automatic retrieval of structured information from unstructured or semi-structured data sources on the internet. It uses algorithms and natural language processing techniques to identify and extract relevant entities, relationships, and events in real time.

Key research themes

1. How can dependency parsing and hand-crafted rules improve Open Information Extraction across different languages?

This research area focuses on the development and refinement of Open Information Extraction (OIE) methods using dependency parsing combined with hand-crafted linguistic rules. It is significant because dependency structures capture syntactic relations that enable precise extraction of relational triples without relying on domain-specific training data. The theme also extends to exploring language-specific adaptations, particularly for languages like Portuguese, where generic rules may underperform compared to English.
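As a rough illustration of how a hand-crafted rule over a dependency parse can yield a relational triple, the sketch below applies a single rule: a head word with an `nsubj` child and an `obj` child produces (subject, relation, object). It is not the rule set of any system listed here. The `Token` type, the function names, and the hand-written English parse with Universal Dependencies style labels are all assumptions made for the example; a real pipeline would obtain the parse from a dependency parser.

```python
from typing import NamedTuple


class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    form: str     # surface word
    head: int     # idx of the syntactic head (0 means root)
    deprel: str   # dependency label (Universal Dependencies style)


def subtree_text(tokens, root_idx):
    """Return the words of the subtree rooted at root_idx, in sentence order."""
    keep = {root_idx}
    changed = True
    while changed:  # keep adding tokens whose head is already in the subtree
        changed = False
        for t in tokens:
            if t.head in keep and t.idx not in keep:
                keep.add(t.idx)
                changed = True
    ordered = sorted(tokens, key=lambda t: t.idx)
    return " ".join(t.form for t in ordered if t.idx in keep)


def extract_triples(tokens):
    """Illustrative rule: head with an nsubj child and an obj child -> triple."""
    triples = []
    for head in tokens:
        subjects = [t for t in tokens if t.head == head.idx and t.deprel == "nsubj"]
        objects = [t for t in tokens if t.head == head.idx and t.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append(
                    (subtree_text(tokens, s.idx), head.form, subtree_text(tokens, o.idx))
                )
    return triples


# Hand-written (hypothetical) parse of "The system extracts relational triples".
PARSE = [
    Token(1, "The", 2, "det"),
    Token(2, "system", 3, "nsubj"),
    Token(3, "extracts", 0, "root"),
    Token(4, "relational", 5, "amod"),
    Token(5, "triples", 3, "obj"),
]

if __name__ == "__main__":
    print(extract_triples(PARSE))
```

Language-specific adaptation would then amount to adding or changing rules of this kind (for example, for clitics or null subjects), which this sketch does not attempt.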

DptOIE: a Portuguese Open Information Extraction system based on dependency analysis

by Daniela Claro

2022

Key finding: This paper presents DptOIE, an OIE system tailored for Portuguese that combines dependency parsing with a novel set of hand-crafted rules specific to Portuguese linguistic structures. By training its own POS tagger and...

2. What are effective strategies for scalable, web-scale information extraction using linked open data and automated wrapper induction?

This theme investigates methods for performing Information Extraction (IE) at web scale, addressing challenges like scarce labeled data and heterogeneity of web content. It explores leveraging Linked Open Data (LOD) as large-scale semi-structured annotated resources to bootstrap IE, combined with wrapper induction techniques and iterative learning to automate extraction pattern discovery, enabling adaptable and domain-independent extraction.
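One way to picture seeding wrapper induction with known values is a toy left/right-delimiter wrapper, sketched below. Values already known (standing in for entries drawn from a Linked Open Data source) are located in a training page, the markup shared by all their occurrences is taken as the delimiters, and those delimiters are applied to an unseen page. This is a minimal sketch under stated assumptions, not the method of any paper listed here: the function names, the HTML snippets, and the seed list are hypothetical.

```python
import os
import re


def common_prefix(strings):
    """Longest string that every item starts with (character-wise)."""
    return os.path.commonprefix(list(strings))


def common_suffix(strings):
    """Longest string that every item ends with (character-wise)."""
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]


def induce_wrapper(pages, seed_values, context=30):
    """Learn a (left, right) delimiter pair from the text around known values."""
    lefts, rights = [], []
    for page in pages:
        for value in seed_values:
            for m in re.finditer(re.escape(value), page):
                lefts.append(page[max(0, m.start() - context):m.start()])
                rights.append(page[m.end():m.end() + context])
    if not lefts:
        return None  # no seed value occurs in the pages
    left, right = common_suffix(lefts), common_prefix(rights)
    if not left or not right:
        return None  # contexts share no markup, so no wrapper can be learned
    return left, right


def apply_wrapper(wrapper, page):
    """Extract every shortest span enclosed by the learned delimiters."""
    left, right = wrapper
    pattern = re.escape(left) + r"(.+?)" + re.escape(right)
    return re.findall(pattern, page)


# Hypothetical training page and seed values (assumed to come from a knowledge base).
TRAIN_PAGE = (
    '<li><b class="title">Dune</b> by Frank Herbert</li>'
    '<li><b class="title">Neuromancer</b> by William Gibson</li>'
)
SEEDS = ["Dune", "Neuromancer"]

# Unseen page that uses the same template.
NEW_PAGE = (
    '<li><b class="title">Solaris</b> by Stanislaw Lem</li>'
    '<li><b class="title">Ubik</b> by Philip K. Dick</li>'
)

if __name__ == "__main__":
    wrapper = induce_wrapper([TRAIN_PAGE], SEEDS)
    print(wrapper)
    print(apply_wrapper(wrapper, NEW_PAGE))
```

In an iterative setting, the newly extracted values could be added to the seed list and the induction repeated on further pages; the sketch stops after one round.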

Early Steps Towards Web Scale Information Extraction with LODIE

by Ziqi Zhang

2023, AI Magazine

Key finding: The paper introduces the LODIE project, which uses Linked Open Data as a rich, large-scale knowledge base to seed and guide web-scale IE. By combining wrapper induction with bootstrapping techniques over LOD-annotated web...

3. How can information extraction workflows be effectively applied in digital libraries using nearly unsupervised methods and what are their practical limitations?

This area evaluates the application of nearly unsupervised Open Information Extraction (OpenIE) workflows in digital library settings, focusing on cross-domain adaptability, extraction quality, and operational costs. It critically examines the challenge of non-canonicalized (heterogeneous and noisy) extractions from unsupervised methods, the required domain expertise, and computational overhead, aiming to bridge the gap between state-of-the-art extraction methods and real-world digital library needs.
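To make the canonicalization problem concrete, the sketch below maps noisy surface forms of entities and relation phrases onto canonical identifiers and drops triples that cannot be mapped. The alias table, the relation vocabulary, the identifier scheme, and the sample triples are all hypothetical and chosen only for illustration. Building such vocabularies is where the domain expertise mentioned above would come in.

```python
# Hypothetical alias table: normalized surface form -> canonical entity ID.
ENTITY_ALIASES = {
    "aspirin": "DRUG:aspirin",
    "acetylsalicylic acid": "DRUG:aspirin",
    "asa": "DRUG:aspirin",
    "headache": "DISEASE:headache",
    "headaches": "DISEASE:headache",
}

# Hypothetical relation vocabulary: normalized phrase -> canonical predicate.
RELATION_VOCAB = {
    "treats": "treats",
    "is used to treat": "treats",
    "relieves": "treats",
}


def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def canonicalize(triples):
    """Map raw (subject, relation, object) triples to canonical, deduplicated ones."""
    seen = set()
    result = []
    for subj, rel, obj in triples:
        s = ENTITY_ALIASES.get(normalize(subj))
        r = RELATION_VOCAB.get(normalize(rel))
        o = ENTITY_ALIASES.get(normalize(obj))
        if s is None or r is None or o is None:
            continue  # unmapped extraction: discard rather than guess
        triple = (s, r, o)
        if triple not in seen:
            seen.add(triple)
            result.append(triple)
    return result


# Hypothetical raw extractions, as an unsupervised extractor might emit them.
RAW = [
    ("Aspirin", "treats", "headaches"),
    ("acetylsalicylic  acid", "is used to treat", "headache"),
    ("ASA", "relieves", "Headache"),
    ("it", "helps with", "pain"),
]

if __name__ == "__main__":
    print(canonicalize(RAW))
```

The three differently worded extractions collapse into a single canonical triple, while the fourth is discarded because none of its parts is in the vocabularies. This reflects a precision-over-recall design choice.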

A detailed library perspective on nearly unsupervised information extraction workflows in digital libraries

by Wolf-Tilo Balke

2024, International Journal on Digital Libraries

Key finding: Through case studies in domains including encyclopedias, pharmacy, and political sciences, this paper demonstrates that nearly unsupervised OpenIE combined with entity linking and canonicalization can produce good precision...

A Library Perspective on Nearly-Unsupervised Information Extraction Workflows in Digital Libraries

by Wolf-Tilo Balke

2024, Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries

Key finding: This complementary study focuses on workflow design for unsupervised IE, analyzing the portability of state-of-the-art extraction toolboxes across domains, affordability in terms of expertise and computation, and identifying...

All papers in Online Information Extraction

Call For Papers - International Conference on NLP, Data Mining and Machine Learning (NLDML 2022)

by Advances in Vision Computing: An International Journal (AVC)

2022, NLDML

International Conference on NLP, Data Mining and Machine Learning (NLDML 2022) will provide an excellent international forum for sharing knowledge and results in theory, methodology and applications of Natural Language Computing, Data...

AUTOMATIC ARABIC NAMED ENTITY EXTRACTION AND CLASSIFICATION FOR INFORMATION RETRIEVAL

by International Journal on Natural Language Computing (IJNLC)

This article describes our rule-based Arabic Named Entity Recognition (NER) and classification system. It is based on lists of classified proper names (PN) and particularly on syntactico-semantic patterns resulting in fine...

Aspect Extraction from Reviews Using Conditional Random Fields

by Yuliya Rubtsova

This paper describes an information extraction and content analysis system. The proposed system is based on a conditional random field algorithm and is intended to extract aspect terms mentioned in the text. We used a set of morphological...

Top 5 cited Natural Language Processing (NLP) articles from IJNLC in 2020

by International Journal on Natural Language Computing (IJNLC)

Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and built software that will analyze,...

Relation Extraction Based on Pattern Learning Approach

by IRJCS: International Research Journal of Computer Science

Semantically, objects in an unstructured document are related to each other to form a certain entity relation. Examples of such relations are: drug-drug interaction through their compounds, buyer-seller relationship through the goods or...

Relation Extraction Based on Pattern Learning Approach

by Mujiono Sadikin

2017

Semantically, objects in an unstructured document are related to each other to form a certain entity relation. Examples of such relations are: drug-drug interaction through their compounds, buyer-seller relationship through the goods or...

The Digital Revolution

by Alex Autio

Essay explains how today's education system and societal expectations require "educated" individuals to have strong computer and information analysis skills.
