Academia.edu

Named Entity Extraction

263 papers
84 followers
About this topic
Named Entity Extraction (NEE) is a subtask of information extraction that involves identifying and classifying key entities in text into predefined categories, such as names of people, organizations, locations, dates, and other specific terms, facilitating the organization and retrieval of information from unstructured data.
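To make the definition concrete: many extraction systems first label each token and then group the labels into entity spans. The sketch below assumes the common BIO scheme (B- opens an entity, I- continues it, O marks any other token), which this page does not itself prescribe; the function name `bio_to_spans` and the example sentence are invented for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    B-X opens an entity of type X, I-X continues it, and O marks a token
    outside any entity. An I-X with no open entity of type X opens a new one.
    """
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag == "O":
            # An O tag closes whatever entity is open
            if current_type is not None:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_type:
            # Close the open entity (if any) and start a new one
            if current_type is not None:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], label
        else:
            current.append(token)
    if current_type is not None:
        spans.append((" ".join(current), current_type))
    return spans


# Invented example sentence with hand-assigned tags
tokens = ["Maria", "Silva", "joined", "Acme", "Corp", "in", "Lisbon", "."]
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC", "O"]
print(bio_to_spans(tokens, tags))
# [('Maria Silva', 'PER'), ('Acme Corp', 'ORG'), ('Lisbon', 'LOC')]
```

Treating a stray I- tag as the start of a new entity is one lenient convention; stricter decoders discard such spans instead.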

Key research themes

1. How can machine learning methods, specifically Hidden Markov Models, be employed and optimized for Named Entity Recognition across diverse languages and domains?

This research area investigates the application of Hidden Markov Models (HMMs) and their derivatives in performing NER tasks. It focuses on the adaptability, language independence, and performance of HMM-based systems, particularly comparing them to rule-based and other machine learning methods. The theme addresses challenges such as resource-poor languages, e.g., Indian languages, and domain-specific difficulties, aiming to design robust, scalable NER systems with high accuracy and portability.

Key finding: The paper demonstrates that a Hidden Markov Model-based NER system can be effectively used in resource-poor and morphologically rich Indian languages by exploiting language-independent dynamic state modeling and statistical...
Key finding: The study presents an HMM-based chunk tagger that integrates various internal and external evidence, including morphological and semantic features, to recognize named entities effectively. Evaluated on English MUC-6 and...
Key finding: The authors report an HMM-based biomedical NER system enhanced solely by part-of-speech (POS) tagging information, demonstrating that inclusion of POS features helps mitigate class imbalance and boundary detection issues...
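A minimal sketch of the decoding step at the core of HMM-based taggers such as those summarized above: given start, transition, and emission probabilities, the Viterbi algorithm finds the most probable label sequence for a sentence. All parameters here are hand-set toy values (hypothetical, not taken from any of the cited papers); real systems estimate them from annotated corpora, typically use BIO-style states rather than bare labels, and may add features such as the POS information mentioned in the third finding.

```python
import math
from typing import Callable, Dict, List


def viterbi(
    tokens: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Callable[[str, str], float],
) -> List[str]:
    """Most probable hidden-state (label) sequence for tokens under an HMM."""
    # best[t][s]: log-probability of the best path ending in state s at token t
    best = [{s: math.log(start_p[s]) + math.log(emit_p(s, tokens[0])) for s in states}]
    backptr: List[Dict[str, str]] = [{}]
    for t in range(1, len(tokens)):
        best.append({})
        backptr.append({})
        for s in states:
            # Best predecessor state for s at position t
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = (best[t - 1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p(s, tokens[t])))
            backptr[t][s] = prev
    # Follow back-pointers from the highest-scoring final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(tokens) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    return path[::-1]


# Toy, hand-set parameters (hypothetical; a real tagger learns these from data)
STATES = ["O", "PER", "LOC"]
START = {"O": 0.5, "PER": 0.3, "LOC": 0.2}
TRANS = {
    "O": {"O": 0.6, "PER": 0.2, "LOC": 0.2},
    "PER": {"O": 0.5, "PER": 0.4, "LOC": 0.1},
    "LOC": {"O": 0.7, "PER": 0.1, "LOC": 0.2},
}
# Each state's emission table sums to 1; "<UNK>" holds the mass for unseen words
EMIT = {
    "O": {"lives": 0.3, "in": 0.3, "visited": 0.3, "<UNK>": 0.1},
    "PER": {"maria": 0.45, "silva": 0.45, "<UNK>": 0.1},
    "LOC": {"lisbon": 0.45, "porto": 0.45, "<UNK>": 0.1},
}


def emit(state: str, token: str) -> float:
    table = EMIT[state]
    return table.get(token.lower(), table["<UNK>"])


print(viterbi(["Maria", "Silva", "lives", "in", "Lisbon"], STATES, START, TRANS, emit))
# ['PER', 'PER', 'O', 'O', 'LOC']
```

Working in log space is a design choice rather than a requirement: it keeps long sentences from underflowing to zero while leaving the best path unchanged.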

2. What roles do hybrid and deep learning approaches play in improving Named Entity Recognition performance especially in data-scarce or domain-specific contexts?

This theme encompasses hybrid NER systems combining rule-based, machine learning, clustering, and deep learning techniques to handle challenges such as lack of annotated data, domain adaptation (e.g., legal, judicial), and complex entity boundaries. It focuses on models that balance knowledge-driven and data-driven features, enabling flexible, accurate NER when labeled datasets are insufficient or unavailable.

Key finding: The paper proposes a hybrid NER framework merging rule-based, deep learning (neural networks with embeddings), and clustering approaches, augmented with a knowledge-based postprocessing module. Evaluated on legal court case...
Key finding: This work introduces an automated annotation tool to generate domain-specific annotated corpora, exemplified on agricultural queries for crops and pests. The automatically annotated dataset enabled training spaCy-based NER...
Key finding: The comparative study evaluates contextual embeddings (BERT variants) versus non-contextual embeddings (Word2Vec, FastText) in Hindi NER, overcoming challenges such as lack of capitalization and spelling variations in...
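One simple way to combine knowledge-driven and data-driven components, sketched under stated assumptions: rule matches (a gazetteer lookup and a date pattern) take priority, and spans from a statistical tagger are kept only where they do not overlap a rule match. The gazetteer entries, the `COURT` label, the merge policy, and `model_output` (a stand-in for a trained tagger's predictions) are all hypothetical; the legal-domain framework summarized above also uses clustering and knowledge-based postprocessing, which this sketch does not reproduce. The sentence and the person name in it are invented.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label): character offsets, end exclusive

# Hypothetical domain knowledge for a legal-text setting
GAZETTEER = {"Supreme Court": "COURT", "High Court": "COURT"}
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
DATE_RE = re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")


def rule_spans(text: str) -> List[Span]:
    """Knowledge-driven spans: gazetteer lookups plus a date pattern."""
    spans: List[Span] = []
    for phrase, label in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), text):
            spans.append((m.start(), m.end(), label))
    for m in DATE_RE.finditer(text):
        spans.append((m.start(), m.end(), "DATE"))
    return spans


def merge(rules: List[Span], model: List[Span]) -> List[Span]:
    """Keep every rule span; keep a model span only if it overlaps no rule span."""
    merged = list(rules)
    for start, end, label in model:
        if all(end <= r_start or start >= r_end for r_start, r_end, _ in rules):
            merged.append((start, end, label))
    return sorted(merged)


def span_of(text: str, phrase: str, label: str) -> Span:
    start = text.index(phrase)
    return (start, start + len(phrase), label)


text = "The Supreme Court heard the appeal of Rahul Mehta on 12 March 2019."
# Stand-in for a statistical tagger's output; the "Supreme" boundary is wrong
model_output = [span_of(text, "Supreme", "ORG"), span_of(text, "Rahul Mehta", "PER")]
result = merge(rule_spans(text), model_output)
print([(text[s:e], label) for s, e, label in result])
# [('Supreme Court', 'COURT'), ('Rahul Mehta', 'PER'), ('12 March 2019', 'DATE')]
```

Giving rules priority suits domains where a curated lexicon is more reliable than a model trained on scarce labeled data; with a stronger model, the opposite policy (model first, rules filling gaps) may work better.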

3. How does syntactic and semantic parsing influence the accuracy and boundary detection in Named Entity Recognition tasks?

This research focuses on leveraging syntactic parsing techniques (dependency, constituency, semantic parsing) to improve NER systems. Parsing provides structural and relational information that aids in delimiting entity boundaries, disambiguating entity types, and extracting nested or complex entities. The theme investigates the underutilization of parsing in NER and explores integrating parsing features or parsing-driven modeling to achieve more precise named entity identification.

Key finding: The paper examines how syntactic parsing, both dependency and constituency, can enhance NER by revealing sentence structure cues that identify entity presence and boundaries, e.g., direct objects and nested phrases. It reviews...
Key finding: This study, focusing on Portuguese, showcases a rule-based system for extracting family semantic relations through pattern matching on parsed syntactic structures, using noun phrases, verbs, and prepositional relations to...
Key finding: The case study uses a semiautomatic pipeline combining digitization, transcription, and NLP (including parsing and rule-based techniques) to extract personal and genealogical entities from archival historical documents....
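The boundary cue described in the first finding can be illustrated with a dependency tree: once an entity's head noun is known, its subtree usually covers the full mention, and a head inside that subtree yields a nested entity. This is a minimal sketch assuming the parse is already available (here it is hand-written with Universal Dependencies-style labels, not the output of any particular parser) and that an earlier step has picked out the head noun; the helper names are illustrative.

```python
from typing import FrozenSet, List

# Hand-written dependency parse (hypothetical example, UD-style labels) for:
# "She studied at the University of Lisbon ."
TOKENS = ["She", "studied", "at", "the", "University", "of", "Lisbon", "."]
HEADS = [1, -1, 4, 4, 1, 6, 4, 1]  # index of each token's head; -1 marks the root
DEPRELS = ["nsubj", "root", "case", "det", "obl", "case", "nmod", "punct"]


def children(node: int) -> List[int]:
    return [i for i, h in enumerate(HEADS) if h == node]


def subtree(node: int) -> List[int]:
    """node plus all of its descendants (unsorted)."""
    nodes = [node]
    for child in children(node):
        nodes.extend(subtree(child))
    return nodes


def entity_phrase(head: int, drop: FrozenSet[str] = frozenset({"case", "det", "punct"})) -> str:
    """Expand an entity's head noun to its subtree, skipping function-word
    dependents (and their subtrees) attached directly to the head."""
    keep = [head]
    for child in children(head):
        if DEPRELS[child] not in drop:
            keep.extend(subtree(child))
    return " ".join(TOKENS[i] for i in sorted(keep))


print(entity_phrase(4))  # outer entity: University of Lisbon
print(entity_phrase(6))  # nested entity: Lisbon
```

In non-projective parses a subtree need not be contiguous, so a practical system would check contiguity before accepting the span.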

All papers in Named Entity Extraction

This paper gives an overview of the history of prosopographical projects at KU Leuven, starting with the Prosopographia Ptolemaica in the interbellum, its successor Trismegistos People, and Trismegistos' newest feature, the Names in the...
This book reflects research work carried out by faculty researchers, with the aim of being useful to the reader: the use of predictions when training a text classification algorithm in language processing...
The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence...
Transformer-based language models have been changing the modern Natural Language Processing (NLP) landscape for high-resource languages such as English, Chinese, Russian, etc. However, this technology does not yet exist for any Ghanaian...
A sentence aligned parallel corpus is an important prerequisite in statistical machine translation. However, manual creation of such a parallel corpus is time consuming, and requires experts fluent in both languages. Automatic creation of...
The aim of this article is to present concisely the latest state of the theoretical framework applied in my recent work. While some of its elements have not changed since the beginning, the new directions that have emerged...
Co-reference resolution is an important part of natural language understanding, and it has been affected by the lack of diversity in current corpora. This project presents the implementation of two models for masked language modeling...
In interactive searching environments, robust linguistic techniques can provide sophisticated search assistance with a reasonable tolerance to errors, because users can easily select relevant items and dismiss the noisy bits. The general...
Abstract. Automatically identifying humor is a complex task, since what causes humor has not yet been fully characterized. Several approaches to humor detection have been proposed, most of them for English. This...
In our paper we describe our second collective entry in the NTCIR-6 Question Answering Challenge (QAC4). This time, too, we decided to investigate the limits of the "as automatic as possible" approach to...
In this paper we propose an approach for identifying syntactic behaviours related to lexical items and linking them to the meanings. This approach is based on the analysis of the textual content presented in LMF-normalized dictionaries by...
Information about location and geographical coordinates in particular, may be very important during a crisis event, especially for search and rescue operations – but currently geo-tagged tweets are extremely rare. Improved capabilities of...
With a dataset of 1.3 million articles from arXiv, we explore the potential of classifying research papers based solely on their abstracts and titles. We extract abstracts and titles from the arXiv dataset and fine-tune multiple...
Named Entity Recognition (NER) is a crucial task in Natural Language Processing (NLP), which involves identifying and categorizing named entities in unstructured text data. In recent years, deep learning-based approaches such as Long...
Online social networks convey rich information about geospatial facets of reality. However in most cases, geographic information is not explicit and structured, thus preventing its exploitation in real-time applications. We address this...
Cross-language information retrieval consists of providing a query in one language and searching documents in different languages. Retrieved documents are ordered by the probability of being relevant to the user's request, with the highest...
Placing accent marks on words when writing Spanish text is an ambiguity problem, because many words take an accent or not depending on the context of the sentence. The ambiguity problem is related to the...
In the financial services industry, compliance involves a series of practices and controls in order to meet key regulatory standards which aim to reduce financial risk and crime, e.g. money laundering and the financing of terrorism. Faced with...
Subjectivity classification is an area of text mining that has received little study for Spanish, yet its applications are extensive. Studying it allows a better understanding of the semantics of a text and the intention of its...
Human beings are capable of categorizing a document based on its topic. Computers are already able to perform very well on that task. However, when translating from one language to another, the human translator will use this knowledge to...
We will report an evaluation of the Automatic Named Entity Extraction feature of IR tools on Dutch, French, and English text. The aim is to analyze the competency of off-the-shelf information extraction tools in recognizing entity types...
The paper presents an integral framework for multilingual lexical databases (henceforth MLLD) based on Compreno technology. It differs from the existing approaches to MLLD in the following aspects: 1) it is based on a universal semantic...
Named entity recognition (NER) is the process of locating atomic elements in text and classifying them into predefined categories, such as the names of persons, organizations, and locations. Most existing NER systems are based on supervised learning....
Temporal event signature mining for knowledge discovery is a difficult problem. In this paper, a framework is designed to acquire temporal knowledge through large-scale signature mining of longitudinal heterogeneous event data. This...
These days, the number of data sources an ordinary computer user works with every day is very large and continues to grow. With the increasing number of cloud services with specialized functionalities, the users are faced with the...
We propose a new method of using multiple documents as evidence with decreased adding to improve the performance of a question-answering system. Sometimes, the answer to a question may be found in multiple documents. In such cases, using...
Sentiment classification has been crucial for many natural language processing (NLP) applications, such as the analysis of movie reviews, tweets, or customer feedback. A sufficiently large amount of data is required to build a robust...
GeoCLEF is an evaluation initiative for testing queries with a geographic specification in a large set of text documents. GeoCLEF ran a regular track for the third time within the Cross Language Evaluation Forum (CLEF) 2008. The purpose of...
Development of modern Cadastral Information Systems (CIS) requires deployment of tools for automatic estimation of real estate value, which is influenced by a number of factors. After differentiation of the factors, appropriate...
Two salient properties of user behavior make Help Desk a unique speech application, different from the more general transactional kind: (a) the majority of users have only vague ideas about their problem, and (b) these users are likely to...
A framework is proposed for enterprise automated call routing system development and large-scale natural language call routing application deployment, based on IBM's speech recognition and NLU application engagement practices in...
In the course of a breaking news event, such as a natural calamity or political uproar, a massive amount of crowdsourced data is generated on social media, which makes social media platforms an important source of information in such scenarios. The...
The problems facing neural models of language processing and meaning representation stem from two main problems: the 'binding' problem and the compositionality problem. In turn, these two...
In this paper, we describe our approach for Named Entity rEcognition and Linking Challenge (NEEL) at the #Microposts2016. The task is to automatically recognize entities and their types from English microposts, and link them to...
This paper presents an approach for constructing an ontology from a stream of documents. Named entities extracted from the documents are used as instances of the ontology. Entities and co-occurring entity pairs are represented by feature...
We propose a new method of using multiple documents as evidence with decreased adding to improve the performance of question-answering systems. Sometimes, the answer to a question may be found in multiple documents. In such cases, using...
Classification of crisis events, such as natural disasters, terrorist attacks and pandemics, is a crucial task to create early signals and inform relevant parties for spontaneous actions to reduce overall damage. Despite the crises, such...