Academia.edu

Online Information Extraction

15 papers
20 followers
About this topic
Online Information Extraction is the process of automatically retrieving structured information from unstructured or semi-structured data sources on the internet, utilizing algorithms and natural language processing techniques to identify and extract relevant entities, relationships, and events in real time.

Key research themes

1. How can dependency parsing and hand-crafted rules improve Open Information Extraction across different languages?

This research area focuses on developing and refining Open Information Extraction (OIE) methods that combine dependency parsing with hand-crafted linguistic rules. It is significant because dependency structures capture the syntactic relations needed to extract relational triples precisely, without relying on domain-specific training data. The theme also covers language-specific adaptations, particularly for languages such as Portuguese, where generic rules designed for English may underperform.

Key finding: This paper presents DptOIE, an OIE system tailored for Portuguese that combines dependency parsing with a novel set of hand-crafted rules specific to Portuguese linguistic structures. By training its own POS tagger and...
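To make rule-based extraction over a dependency parse concrete, here is a minimal Python sketch. It is not DptOIE's rule set: the `Token` structure, the hand-written Universal Dependencies-style parse of a Portuguese sentence, and the single subject-verb-object rule are illustrative assumptions, and a real system would obtain the parse from a trained parser.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Token:
    idx: int     # 1-based position in the sentence
    text: str
    head: int    # index of the syntactic head (0 = root)
    deprel: str  # Universal Dependencies relation label

def subtree_text(tokens: List[Token], root: Token) -> str:
    """Return the span headed by `root`: the token plus all of its descendants."""
    keep = {root.idx}
    changed = True
    while changed:
        changed = False
        for t in tokens:
            if t.head in keep and t.idx not in keep:
                keep.add(t.idx)
                changed = True
    return " ".join(t.text for t in tokens if t.idx in keep)

def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    """One hypothetical rule: (nsubj subtree, root verb, obj subtree)."""
    triples = []
    for verb in tokens:
        if verb.deprel != "root":
            continue
        subjects = [t for t in tokens if t.head == verb.idx and t.deprel == "nsubj"]
        objects = [t for t in tokens if t.head == verb.idx and t.deprel == "obj"]
        for s in subjects:
            for o in objects:
                triples.append((subtree_text(tokens, s), verb.text, subtree_text(tokens, o)))
    return triples

# "O sistema extrai triplas relacionais" ("The system extracts relational triples"),
# parsed by hand for this example
SENTENCE = [
    Token(1, "O", 2, "det"),
    Token(2, "sistema", 3, "nsubj"),
    Token(3, "extrai", 0, "root"),
    Token(4, "triplas", 3, "obj"),
    Token(5, "relacionais", 4, "amod"),
]

print(extract_triples(SENTENCE))  # [('O sistema', 'extrai', 'triplas relacionais')]
```

Real systems add many more rules (for prepositional complements, copulas, coordination, and so on); language-specific rules of that kind are where Portuguese adaptations would live.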

2. What are effective strategies for scalable, web-scale information extraction using linked open data and automated wrapper induction?

This theme investigates methods for performing Information Extraction (IE) at web scale, addressing challenges such as scarce labeled data and the heterogeneity of web content. It explores leveraging Linked Open Data (LOD) as a large-scale, semi-structured annotated resource to bootstrap IE, combined with wrapper induction techniques and iterative learning that automate the discovery of extraction patterns, enabling adaptable and domain-independent extraction.

Key finding: The paper introduces the LODIE project which utilizes Linked Open Data as a rich, large-scale knowledge base to seed and guide web-scale IE. By combining wrapper induction with bootstrapping techniques over LOD-annotated web...
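Seeding wrapper induction with known facts can be sketched as follows, assuming a simple left-right (LR) delimiter wrapper. The seed values stand in for facts that would be looked up in a Linked Open Data source; the page, the seeds, and the heuristic of trimming delimiters to the innermost tag are hypothetical and not taken from LODIE.

```python
import os
from typing import List, Tuple

def common_suffix(strings: List[str]) -> str:
    # os.path.commonprefix compares character by character, so reversing yields a suffix
    return os.path.commonprefix([s[::-1] for s in strings])[::-1]

def induce_lr_wrapper(page: str, seeds: List[str]) -> Tuple[str, str]:
    """Learn left/right delimiters from the contexts of known (seed) values."""
    lefts, rights = [], []
    for value in seeds:
        pos = page.find(value)
        if pos >= 0:
            lefts.append(page[:pos])
            rights.append(page[pos + len(value):])
    if not lefts:
        raise ValueError("no seed value occurs in the page")
    left = common_suffix(lefts)
    right = os.path.commonprefix(rights)
    # Assumed heuristic: keep only the innermost tag on each side of the value
    left = left[left.rfind("<"):] if "<" in left else left
    right = right[: right.find(">") + 1] if ">" in right else right
    if not left or not right:
        raise ValueError("seeds share no common context")
    return left, right

def apply_wrapper(page: str, left: str, right: str) -> List[str]:
    """Extract every string found between the learned delimiters."""
    values, start = [], 0
    while True:
        i = page.find(left, start)
        if i < 0:
            break
        i += len(left)
        j = page.find(right, i)
        if j < 0:
            break
        values.append(page[i:j])
        start = j + len(right)
    return values

PAGE = (
    "<ul>"
    "<li>Country: Italy <span class='cap'>Rome</span></li>"
    "<li>Country: France <span class='cap'>Paris</span></li>"
    "<li>Country: Spain <span class='cap'>Madrid</span></li>"
    "</ul>"
)
SEEDS = ["Rome", "Paris"]  # hypothetical facts retrieved from an LOD source

left, right = induce_lr_wrapper(PAGE, SEEDS)
print(apply_wrapper(PAGE, left, right))  # ['Rome', 'Paris', 'Madrid']
```

Two seed values are enough here to generalize to a third, unseen value (Madrid); in a bootstrapping loop, newly extracted values could be checked against the knowledge base and fed back as seeds.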

3. How can information extraction workflows be effectively applied in digital libraries using nearly unsupervised methods and what are their practical limitations?

This area evaluates the application of nearly unsupervised Open Information Extraction (OpenIE) workflows in digital library settings, focusing on cross-domain adaptability, extraction quality, and operational costs. It critically examines the challenge of non-canonicalized (heterogeneous and noisy) extractions from unsupervised methods, the required domain expertise, and computational overhead, aiming to bridge the gap between state-of-the-art extraction methods and real-world digital library needs.

Key finding: Through case studies in domains including encyclopedias, pharmacy, and political sciences, this paper demonstrates that nearly unsupervised OpenIE combined with entity linking and canonicalization can produce good precision...
Key finding: This complementary study focuses on workflow design for unsupervised IE, analyzing the portability of state-of-the-art extraction toolboxes across domains, affordability in terms of expertise and computation, and identifying...
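The canonicalization problem can be illustrated with a small Python sketch, assuming entity linking has already produced an alias table. The dictionaries below are hypothetical stand-ins for a real entity linker and relation-clustering step, not the workflow from these papers; extractions that cannot be mapped are discarded, which trades recall for precision.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical alias table standing in for an entity linker (mention -> canonical ID)
ENTITY_ALIASES: Dict[str, str] = {
    "aspirin": "ENT:aspirin",
    "acetylsalicylic acid": "ENT:aspirin",
    "asa": "ENT:aspirin",
    "headache": "ENT:headache",
    "headaches": "ENT:headache",
}

# Hypothetical relation-phrase clusters (surface phrase -> canonical relation)
RELATION_ALIASES: Dict[str, str] = {
    "treats": "treats",
    "is used to treat": "treats",
    "relieves": "treats",
}

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and a leading article, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    text = re.sub(r"^(the|a|an)\s+", "", text.strip())
    return re.sub(r"\s+", " ", text).strip()

def canonicalize(triples: List[Triple]) -> Counter:
    """Map noisy OpenIE triples onto canonical (entity, relation, entity) keys."""
    counts: Counter = Counter()
    for subj, rel, obj in triples:
        s = ENTITY_ALIASES.get(normalize(subj))
        r = RELATION_ALIASES.get(normalize(rel))
        o = ENTITY_ALIASES.get(normalize(obj))
        if s and r and o:  # keep only fully linked extractions
            counts[(s, r, o)] += 1
    return counts

RAW = [  # invented OpenIE output
    ("Aspirin", "treats", "headaches"),
    ("Acetylsalicylic acid", "is used to treat", "a headache"),
    ("ASA", "relieves", "Headache."),
    ("The drug", "treats", "pain"),  # unlinked mentions: dropped
]

print(canonicalize(RAW))  # Counter({('ENT:aspirin', 'treats', 'ENT:headache'): 3})
```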

All papers in Online Information Extraction

A few years back we embarked on an expedition into the rapidly transforming landscape of data research, the narratives of big data and the practices emerging with novel data resources, tools and new directions of social and cultural...
This paper proposes a notably eco-friendly flow-batch methodology for determining free glycerol in biodiesel, since no chemical reagents are used. Deionized water (the solvent) was used alone for glycerol (sample)...
This paper shows how location named entity (LNE) extraction and annotation, which is part of our named entity recognition (NER) systems, is an important task in managing large amounts of data. In this paper, we try to explain our...
Digital research is often understood as data-driven. Yet the ways in which data are already informed by specific analytical assumptions and inscriptions of the media in which they originate, circulate, or are being used is often...
The paper studies the diversity of ways to express entity aspects in users’ reviews. Besides explicit aspect terms, it is possible to distinguish implicit aspect terms and sentiment facts. These subtypes of aspect terms were annotated...
Abstract: The growing interaction of internet and social media users naturally increases the amount of data or content that users generate. This data or content is often referred to as User Generated...
This study seeks to contribute to recent debates concerning computational social science by experimenting with ‘co‐occurrence analysis’ on a Twitter dataset relating to the subject of the recently introduced General Data Protection...
There is a plethora of publications emerging in the humanities, especially media studies, that use data points from social media platforms in order to investigate social interaction and cultural production. Data points taken from social...
This paper inquires into the politics of real-time in online media. It suggests that real-time cannot be accounted for as a universal temporal frame in which events happen, but explores the making of real-time from a device perspective...
What many people popularly know as the Internet is characterized, socioculturally, as cyberspace and has its own territoriality, as well as its own cultural practices, identified as cyberculture. Given that,...
This is the first e-special issue for the journal Sociology and its chosen focus is the article 'The coming crisis of empirical sociology' by . This article challenged sociologists with a variety of questions about the role, relevance and...
There is a need to study whether consumer trust and e-commerce information quality answer the question of what drives customers' purchase decisions and, consequently, their satisfaction. This gap in knowledge can be a significant...
Qualitative and mixed methods digital social research often relies on gathering and storing social media data through the use of APIs (Application Programming Interfaces). In past years this has been relatively simple, with academic...
We respond to the two comments on our article 'The Coming Crisis of Empirical Sociology' from Rosemary Crompton (2008) and Richard Webber (2009), which have been published in Sociology, as well as issues arising from the wider debate...
As an enormous amount of online research material is available, finding pertinent information for specific purposes has become a tedious chore. A research paper recommendation system is therefore needed to facilitate...
As social media technologies such as Twitter, Instagram, and YouTube have become highly ubiquitous, social life itself has become reconfigured. Though early notions of an offline/online binary remain in some quarters of social research,...
The purpose of this paper is twofold: First, to suggest that techniques for mapping public disagreements over claims to knowledge, or controversies, can act as assistive devices for researchers in geography to move from research topics to...
Digital Methods can be defined as the repurposing of the inscriptions generated by digital media for the study of collective phenomena. The strength of these methods comes from their capacity to take advantage of the data and...
Landslides are the most threatening geo-hazard. A landslide is a genetic type of slope and has the same characteristics as a slope. Chaotic time series of landslide displacement and its influential factors could reflect the history of...
The field of sentiment analysis, in which sentiment is gathered, analyzed, and aggregated from text, has seen a lot of attention in the last few years. The corresponding growth of the field has resulted in the emergence of various...
This paper describes an information extraction and content analysis system. The proposed system is based on a conditional random field algorithm and is intended to extract aspect terms mentioned in the text. We used a set of morphological...
This essay explains how today's education system and societal expectations require "educated" individuals to have strong computer and information analysis skills.
The World Wide Web organizes information in semi-structured HTML documents. For a template-based web page that contains a list of items, information schema can be implied and structured data can be extracted with a query, i.e. a (web)...
In this paper, we present our contribution to the SemEval2014 ABSA task: several supervised methods for Aspect-Based Sentiment Analysis of restaurant and laptop reviews are proposed, implemented, and evaluated. We focus on determining the aspect...
This paper introduces a distinctive approach to methods development in digital social research called “interface methods.” We begin by discussing various methodological confluences between digital media, social studies of science and...
This thesis aims to contribute to the discussions on online research methods by suggesting the concept of a holistic approach to the study of social media. This idea argues that data, online platforms and tools cannot be perceived as...
What makes scraping methodologically interesting for social and cultural research?
Recently, Savage and Burrows argued that there is an ‘empirical crisis’ in sociology. They concluded that sociologists should abandon a focus on causality for descriptions that ‘link narrative, numbers, and images’. This article takes up...
Background. We are witnessing an exponential increase in biomedical research citations in PubMed. However, translating biomedical discoveries into practical treatments is estimated to take around 17 years, according to the 2000 Yearbook...
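For the entry experimenting with co-occurrence analysis on a Twitter dataset, the basic counting step can be sketched in Python. The snippet assumes that co-occurrence means two hashtags appearing in the same tweet; the paper's actual unit of analysis is not stated in the excerpt, and the tweets are invented.

```python
import re
from collections import Counter
from itertools import combinations
from typing import Iterable

HASHTAG = re.compile(r"#(\w+)")

def hashtag_cooccurrence(tweets: Iterable[str]) -> Counter:
    """Count how often each unordered pair of hashtags appears in the same tweet."""
    pairs: Counter = Counter()
    for tweet in tweets:
        # Lowercase and deduplicate so "#GDPR" and "#gdpr" count once per tweet
        tags = sorted({t.lower() for t in HASHTAG.findall(tweet)})
        for a, b in combinations(tags, 2):  # sorted order gives one key per pair
            pairs[(a, b)] += 1
    return pairs

TWEETS = [  # invented examples
    "New rules today #GDPR #privacy",
    "Is your inbox full of consent emails? #gdpr #email",
    "#Privacy matters #GDPR #data",
]

print(hashtag_cooccurrence(TWEETS).most_common(1))  # [(('gdpr', 'privacy'), 2)]
```

The resulting pair counts can serve as edge weights in a co-occurrence network.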
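For the entry describing a conditional random field (CRF) system for aspect term extraction, the decoding step of a linear-chain CRF can be sketched as Viterbi search over BIO labels. The feature templates and weights below are hand-set, hypothetical values chosen for illustration; a real system would learn the weights from annotated reviews (training is not shown), and the paper's actual morphological features are not given in the excerpt.

```python
from typing import Dict, List, Tuple

LABELS = ["O", "B-ASP", "I-ASP"]  # BIO tags for aspect terms

def token_features(tokens: List[str], i: int) -> List[str]:
    """Hypothetical feature templates: bias, lowercased word, suffix, previous word."""
    word = tokens[i]
    feats = ["bias", "w=" + word.lower(), "suf3=" + word[-3:].lower()]
    if i > 0:
        feats.append("prev=" + tokens[i - 1].lower())
    return feats

def viterbi(tokens: List[str],
            emit_w: Dict[Tuple[str, str], float],
            trans_w: Dict[Tuple[str, str], float]) -> List[str]:
    """Return the highest-scoring label sequence under a linear-chain CRF."""
    def emit(i: int, y: str) -> float:
        return sum(emit_w.get((f, y), 0.0) for f in token_features(tokens, i))

    # score[i][y] = best score of any label path ending in label y at position i
    score = [{y: trans_w.get(("<s>", y), 0.0) + emit(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(tokens)):
        score.append({})
        back.append({})
        for y in LABELS:
            prev = max(LABELS, key=lambda p: score[i - 1][p] + trans_w.get((p, y), 0.0))
            score[i][y] = score[i - 1][prev] + trans_w.get((prev, y), 0.0) + emit(i, y)
            back[i][y] = prev
    # Follow back-pointers from the best final label
    y = max(LABELS, key=lambda label: score[-1][label])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]

def aspect_terms(tokens: List[str], labels: List[str]) -> List[str]:
    """Group B-ASP/I-ASP runs into aspect term strings."""
    terms: List[str] = []
    current: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab == "I-ASP" and current:
            current.append(tok)
            continue
        if current:
            terms.append(" ".join(current))
        current = [tok] if lab == "B-ASP" else []
    if current:
        terms.append(" ".join(current))
    return terms

EMIT_W = {
    ("bias", "O"): 1.0,              # tokens default to "outside"
    ("w=battery", "B-ASP"): 3.0,     # a known aspect head noun
    ("prev=battery", "I-ASP"): 2.0,  # continuation after "battery"
}
TRANS_W = {
    ("<s>", "I-ASP"): -10.0,  # a sentence cannot start inside an aspect term
    ("O", "I-ASP"): -10.0,    # I-ASP must follow B-ASP or I-ASP
}
TOKENS = ["The", "battery", "life", "is", "great"]

labels = viterbi(TOKENS, EMIT_W, TRANS_W)
print(labels)                        # ['O', 'B-ASP', 'I-ASP', 'O', 'O']
print(aspect_terms(TOKENS, labels))  # ['battery life']
```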
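For the entry on extracting structured data from template-based list pages with a query, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which supports a limited XPath subset. The page and its class names are hypothetical, and the page is assumed to be well-formed XHTML; real-world HTML usually needs a tolerant parser such as `lxml.html`, which offers full XPath.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Union

# Hypothetical template-generated result list (well-formed XHTML assumed)
PAGE = """
<html><body>
  <ul id="results">
    <li class="item"><span class="name">Laptop A</span><span class="price">999</span></li>
    <li class="item"><span class="name">Laptop B</span><span class="price">1299</span></li>
    <li class="ad"><span class="name">Sponsored</span></li>
  </ul>
</body></html>
"""

def extract_items(page: str) -> List[Dict[str, Union[str, int]]]:
    root = ET.fromstring(page.strip())
    rows = []
    # The query encodes the implied schema: each li.item inside ul#results is one record
    for item in root.findall(".//ul[@id='results']/li[@class='item']"):
        name = item.find("span[@class='name']").text
        price = int(item.find("span[@class='price']").text)
        rows.append({"name": name, "price": price})
    return rows

print(extract_items(PAGE))
# [{'name': 'Laptop A', 'price': 999}, {'name': 'Laptop B', 'price': 1299}]
```

The `li class="ad"` entry is skipped because it does not match the template's item pattern.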