Named Entity Extraction

263 papers

84 followers

About this topic

Named Entity Extraction (NEE) is a subtask of information extraction that identifies key entities in text and classifies them into predefined categories, such as the names of people, organizations, and locations, as well as dates and other specific terms. This makes information in unstructured data easier to organize and retrieve.
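Here is a concrete illustration of what this task outputs. NER systems commonly label each token with a BIO tag: B- begins an entity, I- continues it, and O marks tokens outside any entity. The entities are then read off the tag sequence. The minimal Python sketch below decodes BIO tags into typed spans. The example sentence and the PER/LOC/DATE labels are invented for illustration. Treating a stray I- tag as the start of a new entity is a lenient design choice, not a universal rule.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity begins (a stray I- tag is treated leniently as a start).
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current.append(token)  # continue the open entity
        else:
            # "O" closes any open entity.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans

tokens = ["Ada", "Lovelace", "visited", "London", "in", "1842", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-DATE", "O"]
print(bio_to_spans(tokens, tags))
# [('Ada Lovelace', 'PER'), ('London', 'LOC'), ('1842', 'DATE')]
```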

Key research themes

1. How can machine learning methods, specifically Hidden Markov Models, be employed and optimized for Named Entity Recognition across diverse languages and domains?

This research area investigates the application of Hidden Markov Models (HMMs) and their derivatives to NER tasks. It focuses on the adaptability, language independence, and performance of HMM-based systems, particularly in comparison with rule-based and other machine learning methods. The theme addresses challenges such as resource-poor languages (e.g., Indian languages) and domain-specific difficulties. Its aim is to design robust, scalable NER systems that offer high accuracy and portability.
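A brief primer on HMM taggers: the hidden states are entity tags, the observations are words, and a sentence's tag sequence is decoded with the Viterbi algorithm. The sketch below is a generic textbook Viterbi decoder working in log space. It is not the implementation from any of the papers listed here. The toy probabilities and the `unk` floor for unseen words are assumptions chosen only to make the example run.

```python
import math
from typing import Dict, List

def viterbi(words: List[str], states: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]],
            unk: float = 1e-6) -> List[str]:
    """Return the most probable tag sequence for `words` under an HMM."""
    def emit(state: str, word: str) -> float:
        # Unseen (state, word) pairs get a small floor probability (an assumption).
        return math.log(emit_p[state].get(word, unk))

    # best[t][s]: log-probability of the best path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + emit(s, words[0]) for s in states}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(words)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + math.log(trans_p[p][s]))
            best[t][s] = best[t - 1][prev] + math.log(trans_p[prev][s]) + emit(s, words[t])
            back[t][s] = prev
    # Follow back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(words) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Toy model: every probability below is invented for illustration.
states = ["O", "PER", "LOC"]
start_p = {"O": 0.6, "PER": 0.2, "LOC": 0.2}
trans_p = {
    "O": {"O": 0.7, "PER": 0.15, "LOC": 0.15},
    "PER": {"O": 0.6, "PER": 0.35, "LOC": 0.05},
    "LOC": {"O": 0.7, "PER": 0.05, "LOC": 0.25},
}
emit_p = {
    "O": {"lives": 0.3, "in": 0.3, ".": 0.2},
    "PER": {"Maria": 0.5, "Silva": 0.4},
    "LOC": {"Lisbon": 0.6, "Porto": 0.3},
}
words = ["Maria", "Silva", "lives", "in", "Lisbon"]
print(viterbi(words, states, start_p, trans_p, emit_p))
# ['PER', 'PER', 'O', 'O', 'LOC']
```

Working in log space avoids numerical underflow on long sentences. Real systems estimate these tables from annotated corpora and handle unknown words more carefully than a fixed floor.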

Named Entity Recognition using Hidden Markov Model (HMM)

by International Journal on Natural Language Computing (IJNLC) and

2015

Key finding: The paper demonstrates that a Hidden Markov Model-based NER system can be effectively used in resource-poor and morphologically rich Indian languages by exploiting language-independent dynamic state modeling and statistical...

Named Entity Recognition using an HMM-based Chunk Tagger

by Shubham Kolhe

2017

Key finding: The study presents an HMM-based chunk tagger that integrates various kinds of internal and external evidence, including morphological and semantic features, to recognize named entities effectively. Evaluated on English MUC-6 and...

Biomedical Named Entity Recognition: A Poor Knowledge HMM-Based Approach

by Ferran Pla

2025, Lecture Notes in Computer Science

Key finding: The authors report an HMM-based biomedical NER system enhanced solely by part-of-speech (POS) tagging information, demonstrating that inclusion of POS features helps mitigate class imbalance and boundary detection issues...

2. What roles do hybrid and deep learning approaches play in improving Named Entity Recognition performance, especially in data-scarce or domain-specific contexts?

This theme encompasses hybrid NER systems combining rule-based, machine learning, clustering, and deep learning techniques to handle challenges such as lack of annotated data, domain adaptation (e.g., legal, judicial), and complex entity boundaries. It focuses on models that balance knowledge-driven and data-driven features, enabling flexible, accurate NER when labeled datasets are insufficient or unavailable.

An innovative hybrid approach for extracting named entities from unstructured text data

by Dr. Anu Thomas

2020, Computational Intelligence

Key finding: The paper proposes a hybrid NER framework merging rule-based, deep learning (neural networks with embeddings), and clustering approaches, augmented with a knowledge-based postprocessing module. Evaluated on legal court case...

An Algorithm for Automatic Text Annotation for Named Entity Recognition using spaCy Framework

by Murari Kumar

2023

Key finding: This work introduces an automated annotation tool to generate domain-specific annotated corpora, exemplified on agricultural queries for crops and pests. The automatically annotated dataset enabled training spaCy-based NER...
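One way such automatic annotation could work is dictionary (gazetteer) matching that emits character offsets. The sketch below shows this. It is not the algorithm from the paper: the term list and the CROP/PEST labels are hypothetical. The output mimics the `(text, {"entities": [(start, end, label)]})` layout used for spaCy v2 training data. In spaCy v3, the same offsets can be passed to `Example.from_dict`.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical gazetteer: surface term -> entity label.
GAZETTEER: Dict[str, str] = {
    "rice": "CROP",
    "wheat": "CROP",
    "brown planthopper": "PEST",
    "aphids": "PEST",
}

Span = Tuple[int, int, str]

def annotate(text: str, gazetteer: Dict[str, str]) -> Tuple[str, Dict[str, List[Span]]]:
    """Return (text, {"entities": [(start, end, label), ...]}) from whole-word,
    case-insensitive gazetteer matches; longer matches win on overlap."""
    candidates: List[Span] = []
    for term, label in gazetteer.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            candidates.append((m.start(), m.end(), label))
    # Prefer longer spans, then earlier ones; skip any span overlapping a kept one.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))
    kept: List[Span] = []
    for start, end, label in candidates:
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, label))
    return text, {"entities": sorted(kept)}

query = "How do I control brown planthopper in rice?"
print(annotate(query, GAZETTEER))
# ('How do I control brown planthopper in rice?', {'entities': [(17, 34, 'PEST'), (38, 42, 'CROP')]})
```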

Analysis Of Contextual and Non-Contextual Word Embedding Models For Hindi NER With Web Application For Data Collection

by Aindriya Barua

2021, Springer

Key finding: The comparative study evaluates contextual embeddings (BERT variants) versus non-contextual embeddings (Word2Vec, FastText) in Hindi NER, overcoming challenges such as lack of capitalization and spelling variations in...

3. How does syntactic and semantic parsing influence the accuracy and boundary detection in Named Entity Recognition tasks?

This research focuses on leveraging syntactic parsing techniques (dependency, constituency, semantic parsing) to improve NER systems. Parsing provides structural and relational information that aids in delimiting entity boundaries, disambiguating entity types, and extracting nested or complex entities. The theme investigates the underutilization of parsing in NER and explores integrating parsing features or parsing-driven modeling to achieve more precise named entity identification.
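Here is an illustration of how sentence structure can delimit an entity. In a dependency tree, the subtree of a name's head token often spans the whole name. The sketch below hard-codes a dependency analysis and returns the token span covered by a head's subtree. The head indices are an assumption, standing in for a real parser's output. Taking the minimum and maximum index assumes a projective tree, in which every subtree is contiguous.

```python
from typing import List, Tuple

def subtree_span(heads: List[int], root: int) -> Tuple[int, int]:
    """Return the (first, last) token indices covered by the subtree of `root`.

    heads[i] is the index of token i's syntactic head, or -1 for the sentence root.
    """
    members = {root}
    changed = True
    while changed:  # keep adding tokens whose head is already inside the subtree
        changed = False
        for i, head in enumerate(heads):
            if head in members and i not in members:
                members.add(i)
                changed = True
    return min(members), max(members)

# Hand-written dependency analysis (an assumption, as if output by a parser):
# Banco -> raised, de -> Portugal, Portugal -> Banco, rates -> raised, "." -> raised
tokens = ["Banco", "de", "Portugal", "raised", "rates", "."]
heads = [3, 2, 0, -1, 3, 3]
first, last = subtree_span(heads, root=0)
print(" ".join(tokens[first:last + 1]))  # Banco de Portugal
```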

On the Use of Parsing for Named Entity Recognition

by Miguel Angel Alonso Pardo

2023, Applied Sciences

Key finding: The paper examines how syntactic parsing (both dependency and constituency) can enhance NER by revealing sentence structure cues that identify entity presence and boundaries, e.g., direct objects and nested phrases. It reviews...

Extraction of Family Relations Between Entities

by Jorge Baptista

2023, inforum.org.pt

Key finding: This study, focusing on Portuguese, showcases a rule-based system for extracting family semantic relations through pattern matching on parsed syntactic structures, using noun phrases, verbs, and prepositional relations to...
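A much-simplified sketch shows the general shape of rule-based relation extraction. The study matches patterns over parsed syntactic structures in Portuguese. The code below instead applies surface regular expressions to English sentences. The patterns, the capitalized-name heuristic, and the `child_of`/`spouse_of` labels are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical name heuristic: one or more capitalized words.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"

# Hypothetical surface patterns paired with relation labels.
PATTERNS = [
    (re.compile(NAME + r" is the (?:son|daughter) of " + NAME), "child_of"),
    (re.compile(NAME + r" married " + NAME), "spouse_of"),
]

def extract_relations(sentence: str) -> List[Tuple[str, str, str]]:
    """Return (entity1, relation, entity2) triples matched by the patterns."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(sentence):
            triples.append((m.group(1), relation, m.group(2)))
    return triples

print(extract_relations("Pedro Alves is the son of Maria Alves."))
# [('Pedro Alves', 'child_of', 'Maria Alves')]
```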

Genealogical Data Mining from Historical Archives: The Case of the Jewish Community in Pisa

by Francesca Valentina Diana and

2023, Informatics

Key finding: The case study uses a semiautomatic pipeline combining digitization, transcription, and NLP (including parsing and rule-based techniques) to extract personal and genealogical entities from archival historical documents....

All papers in Named Entity Extraction

Cooperation of Databases: Are We on the Way to a Single Prosopography of the Ancient World?

by Yanne Broux

2025, Prosopography in the Digital World

This paper gives an overview of the history of prosopographical projects at KU Leuven, starting with the Prosopographia Ptolemaica in the interbellum, its successor Trismegistos People, and Trismegistos' newest feature, the Names in the...

Artificial Intelligence Applied to Natural Language Processing (NLP) with Python and Machine Learning

by Editorial Grupo AEA

2025, Artificial Intelligence Applied to Natural Language Processing (NLP) with Python and Machine Learning

This book reflects research carried out jointly by faculty researchers, with the aim of being useful to the reader: the use of predictions when training a text classification algorithm in language processing...

Challenges in the Alignment, Management and Exploitation of Large and Richly Annotated Multi-Parallel Corpora

by Simon Clematide

2025

The availability of large multi-parallel corpora offers an enormous wealth of material to contrastive corpus linguists, translators and language learners, if we can exploit the data properly. Necessary preparation steps include sentence...

Contextual Text Embeddings for Twi

by Clara ASARE-NYARKO

2025, ArXiv

Transformer-based language models have been changing the modern Natural Language Processing (NLP) landscape for high-resource languages such as English, Chinese, Russian, etc. However, this technology does not yet exist for any Ghanaian...

Automatic Creation of a Sentence Aligned Sinhala-Tamil Parallel Corpus

by Sandareka Fernando

2025

A sentence-aligned parallel corpus is an important prerequisite in statistical machine translation. However, manual creation of such a parallel corpus is time-consuming and requires experts fluent in both languages. Automatic creation of...

"Predicates as Far as the Eye Can See…" (Ryle 1933): What For? Ethnocentrism and Taboos

by Alain Lemaréchal

2025, Bulletin de la Société de Linguistique de Paris

The aim of this article is to present a concise overview of the latest state of the theoretical framework applied in my recent work. While some of its elements have not changed since the beginning, the new directions that have emerged...

Study and Experimentation of Gender Bias in Co-reference Resolution

by Felipe Alfaro

2025

Co-reference resolution is an important part of natural language understanding, and it has been affected by the lack of diversity in current corpora. This project presents the implementation of two models for masked language modeling...

Suggesting Named Entities for Information Access

by Enrique Amigó

2025, Lecture Notes in Computer Science

In interactive searching environments, robust linguistic techniques can provide sophisticated search assistance with a reasonable tolerance to errors, because users can easily select relevant items and dismiss the noisy bits. The general...

Humor Detection in Spanish Tweets Using Scikit-learn Classifiers

by Mitzy Sánchez

2025, Res. Comput. Sci.

Abstract. Automatic humor identification is a complex task, since what makes something humorous has not yet been fully characterized. Several approaches to humor detection have been presented, most of them for English. This...

How We Did How, What and Why - HOMIO's Participation in QAC4 of NTCIR-6

by Fumito Masui

2025, NTCIR

In our paper we describe our second collective participation in the NTCIR-6 Question Answering Challenge (QAC4). This time, too, we decided to investigate the limits of the "as automatic as possible" approach to

How We Did How, What and Why - HOMIO's Participation in QAC4 of NTCIR-6

by Fumito Masui

2025, sig.media.eng.hokudai.ac.jp

Event detection based on open information extraction and ontology

by Samir Elloumi

2025, Journal of Information and Telecommunication

Using the Textual Content of the LMF-Normalized Dictionaries for Identifying and Linking the Syntactic Behaviors to the Meanings

by Imen Elleuch

2025

In this paper we propose an approach for identifying syntactic behaviours related to lexical items and linking them to the meanings. This approach is based on the analysis of the textual content presented in LMF normalized dictionaries by...

Text Analysis Tool TWeet lOcator - TAT2

by Rafal Renk

2025

Information about location, and geographical coordinates in particular, may be very important during a crisis event, especially for search and rescue operations, but geo-tagged tweets are currently extremely rare. Improved capabilities of...

Leveraging Large Language Models for Classification of arXiv Articles

by Nemi B Pelgrom

2025

With a dataset of 1.3 million articles from arXiv, we explore the potential of classifying research papers based solely on their abstracts and titles. We extract abstracts and titles from the arXiv dataset and fine-tune multiple...

Named entity recognition using AI-NLP

by IJMTST - International Journal for Modern Trends in Science and Technology (ISSN:2455-3778)

2025, Volume 5, Issue 11

Named Entity Recognition (NER) is a crucial task in Natural Language Processing (NLP), which involves identifying and categorizing named entities in unstructured text data. In recent years, deep learning-based approaches such as Long...

Geo-semantic-parsing: AI-powered geoparsing by traversing semantic knowledge graphs

by Maurizio Tesconi

2025, Decision Support Systems

Online social networks convey rich information about geospatial facets of reality. However, in most cases, geographic information is not explicit and structured, thus preventing its exploitation in real-time applications. We address this...

Using Cross-language Information Retrieval for Sentence Alignment

by Nasredine Semmar

2025

Cross-language information retrieval consists of submitting a query in one language and searching documents in different languages. Retrieved documents are ranked by their probability of being relevant to the user's request, with the highest-ranked document considered the most relevant. The LIC2M cross-language information retrieval system is a weighted Boolean search engine based on a deep linguistic analysis of the query and of the documents to be indexed. This system, designed to work on Arabic, Chinese, English, French, German and Spanish, is composed of a multilingual linguistic analyzer, a statistical analyzer, a reformulator, a comparator and a search engine. The multilingual linguistic analyzer includes a morphological analyzer, a part-of-speech tagger and a syntactic analyzer. In the case of Arabic, a clitic stemmer is added to the morphological analyzer to segment the input words into proclitics, simple forms and enclitics. The linguistic analyzer processes both the documents to be indexed and the queries to produce a set of normalized lemmas, a set of named entities and a set of nominal compounds with their morpho-syntactic tags. For the documents to be indexed, the statistical analyzer computes concept weights based on concept frequencies in the database. The comparator computes intersections between queries and documents and assigns a relevance weight to each intersection. Before this comparison, the reformulator expands the queries during the search. The expansion is used to infer, from the original query words, other words expressing the same concepts; it can be in the same language or in different languages. The search engine retrieves the ranked, relevant documents from the indexes according to the corresponding reformulated query. It then merges the results obtained for each language, taking into account the original query words and their weights in order to score the documents.

Sentence alignment consists of estimating which sentence or sentences in the source language correspond to which sentence or sentences in the target language. We present in this paper a new approach to aligning sentences from parallel corpora based on the LIC2M cross-language information retrieval system. This approach consists of building a database of the sentences of the target text and treating each sentence of the source text as a "query" to that database. The aligned bilingual parallel corpora can be used as a translation memory in a computer-aided translation tool.
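The retrieval-based alignment idea can be sketched with a toy bilingual lexicon and word-overlap scoring: index the target sentences, then treat each source sentence as a query. This is a stand-in, not LIC2M. The real system relies on deep linguistic analysis, query reformulation, and weighted Boolean search. The lexicon, the overlap score, and the per-sentence best-match rule below are simplifying assumptions.

```python
from typing import Dict, List, Set

PUNCT = ".,;:!?"

def words_of(sentence: str) -> List[str]:
    """Lower-case, whitespace-tokenize, and strip surrounding punctuation."""
    return [w.strip(PUNCT) for w in sentence.lower().split()]

def align(source: List[str], target: List[str],
          lexicon: Dict[str, Set[str]]) -> Dict[int, int]:
    """Map each source sentence index to its best-matching target sentence index."""
    # "Index" the target side: one bag of words per target sentence.
    index = [set(words_of(t)) for t in target]
    alignment: Dict[int, int] = {}
    for i, sentence in enumerate(source):
        # Treat the source sentence as a query, translated word by word via the lexicon.
        query: Set[str] = set()
        for w in words_of(sentence):
            query |= lexicon.get(w, set())
        scores = [len(query & doc) for doc in index]
        alignment[i] = max(range(len(target)), key=lambda j: scores[j])
    return alignment

# Toy French -> English lexicon (invented for illustration).
lexicon = {
    "chat": {"cat"}, "dort": {"sleeps", "sleep"},
    "banque": {"bank"}, "ferme": {"closes", "farm"}, "midi": {"noon", "midday"},
}
source = ["Le chat dort.", "La banque ferme à midi."]
target = ["The bank closes at noon.", "The cat sleeps."]
print(align(source, target, lexicon))  # {0: 1, 1: 0}
```

Choosing the best target independently for each source sentence can map two source sentences to the same target. A fuller aligner would also exploit sentence order and allow one-to-many alignments, as the abstract's "sentence or sentences" wording suggests.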

Proposal of a Model for the Automatic Accentuation of Ambiguous Spanish Words Using Text Tagging

by Carlos Pérez Corona

2024, Programación matemática y software

Placing accents on words when writing Spanish text is an ambiguity problem, because many words may or may not carry an accent depending on the context of the sentence. The ambiguity problem is related to the...

A French Corpus and Annotation Schema for Named Entity Recognition and Relation Extraction of Financial News

by Hamza Chergui

2024, Language Resources and Evaluation

In the financial services industry, compliance involves a series of practices and controls designed to meet key regulatory standards that aim to reduce financial risk and crime, e.g. money laundering and terrorist financing. Faced with...

Subjectivity Classification Using Machine Learning Techniques

by juan coria

2024

Subjectivity classification is an area of text mining that has received little study for Spanish, and yet its applications are extensive. Studying it allows a better understanding of the semantics of a text and the intention of its...

Domain Adaptation in Statistical Machine Translation

by Dimitrios Mavroeidis

2024

Human beings are capable of categorizing a document based on its topic. Computers are already able to perform very well on that task. However, when translating from one language to another, the human translator will use this knowledge to...

Evaluation of Information Retrieval and Text Mining Tools on Automatic Named Entity Extraction

by Nishant Kumar

2024, Lecture Notes in Computer Science

We report an evaluation of the automatic named entity extraction feature of IR tools on Dutch, French, and English text. The aim is to analyze the competency of off-the-shelf information extraction tools in recognizing entity types...

The Compreno Semantic Model as Integral Framework for Multilingual Lexical Database

by Ekaterina Manicheva

2024

The paper presents an integral framework for multilingual lexical databases (henceforth MLLD) based on Compreno technology. It differs from the existing approaches to MLLD in the following aspects: 1) it is based on a universal semantic...

Combining Proper Name-Coreference with Conditional Random Fields for Semi-supervised Named Entity Recognition in Vietnamese Text

by Thiên Trúc Nguyễn

2024, Lecture Notes in Computer Science

Named entity recognition (NER) is the process of locating atomic elements in text and classifying them into predefined categories, such as the names of persons, organizations and locations. Most existing NER systems are based on supervised learning....

SIGNATURE BASED MINING FRAMEWORK FOR EVENT

by Vetrithangam D

2024, JATIT

Temporal event signature mining for knowledge discovery is a difficult problem. In this paper, a framework is designed to acquire temporal knowledge through large-scale signature mining of longitudinal heterogeneous event data. This...

Semantic sky

by Dimitar Trajanov

2024

These days, the number of data sources an ordinary computer user works with every day is very large and continues to grow. With the increasing number of cloud services with specialized functionalities, the users are faced with the...

Japanese Question-Answering System Using Decreased Adding with Multiple Answers

by Hitoshi Isahara

2024, NTCIR

We propose a new method of using multiple documents as evidence with decreased adding to improve the performance of a question-answering system. Sometimes, the answer to a question may be found in multiple documents. In such cases, using...

The Impact of Indirect Machine Translation on Sentiment Classification

by James Hadley

2024

Sentiment classification has been crucial for many natural language processing (NLP) applications, such as the analysis of movie reviews, tweets, or customer feedback. A sufficiently large amount of data is required to build a robust...

An Analysis of Affective Words in Machine Translation

by Maria Aloy

2024

Contextual Text Embeddings for Twi

by Samuel Nyarko

2024, ArXiv

GeoCLEF 2008: The CLEF 2008 Cross-Language Geographic Information Retrieval Track Overview

by Diana Santos

2024, Lecture Notes in Computer Science

GeoCLEF is an evaluation initiative for testing queries with a geographic specification against a large set of text documents. GeoCLEF ran a regular track for the third time within the Cross Language Evaluation Forum (CLEF) 2008. The purpose of...

An Annotated Corpus for Development of Modern Cadastral Information Systems

by Krzysztof Węcel

2024, Business information systems

Development of modern Cadastral Information Systems (CIS) requires the deployment of tools for automatically estimating real estate value, which is influenced by a number of factors. After differentiation of the factors, appropriate...

Conversational help desk: vague callers and context switch

by Juan Carlos R Huerta

2024, Interspeech 2006

Two salient properties of user behavior make Help Desk a unique speech application, different from the more general transactional kind: (a) the majority of users have only vague ideas about their problem, and (b) these users are likely to...

A framework for large scalable natural language call routing systems

by Juan Carlos R Huerta

2024, International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003

A framework is proposed for enterprise automated call routing system development and large scalable natural language call routing application deployment based on IBM's speech recognition and NLU application engagement practices in...

Conversational help desk: vague callers and context switch

by Juan Huerta

2024, Interspeech 2006

A framework for large scalable natural language call routing systems

by Juan Huerta

2024, International Conference on Natural Language Processing and Knowledge Engineering, 2003. Proceedings. 2003

Combining Proper Name-Coreference with Conditional Random Fields for Semi-supervised Named Entity Recognition in Vietnamese Text

by Thiên triều Nguyễn

2024, Lecture Notes in Computer Science

Combining Proper Name-Coreference with Conditional Random Fields for Semi-supervised Named Entity Recognition in Vietnamese Text

by Huong Le

2024, Lecture Notes in Computer Science

What Just Happened? A Framework for Social Event Detection and Contextualisation

by Pablo Torres

2024, 2015 48th Hawaii International Conference on System Sciences

In the course of a breaking news event, such as a natural calamity or political uproar, a massive amount of crowdsourced data is generated over social media, which makes social media platforms an important source of information in such scenarios. The...

Compositionality, Structure Computation, and Neural Networks

by Josep Sopena

2024, Estudios de Psicología

The problems faced by neural models of language processing and meaning representation stem from two main problems: the binding problem and the compositionality problem. In turn, these two...

Feature Based Approach to Named Entity Recognition and Linking for Tweets

by Souvick Ghosh

2024

In this paper, we describe our approach for Named Entity rEcognition and Linking Challenge (NEEL) at the #Microposts2016. The task is to automatically recognize entities and their types from English microposts, and link them to...

Using DMoz for constructing ontology from data stream

by Dunja Mladenic

2024, 28th International Conference on Information Technology Interfaces, 2006.

This paper presents an approach for constructing an ontology from a stream of documents. Named entities extracted from the documents are used as instances of the ontology. Entities and co-occurring entity pairs are represented by feature...
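The per-document statistics behind such representations are entity frequencies and co-occurring entity pairs, and both can be accumulated incrementally as documents stream in. The sketch below assumes each document has already been reduced to its set of extracted entities. The entity names are invented, and turning these counts into feature vectors is left out.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, Set, Tuple

def cooccurrence_counts(doc_entities: Iterable[Set[str]]) -> Tuple[Counter, Counter]:
    """Count, over a stream of documents, how many documents mention each
    entity and each unordered pair of entities."""
    entity_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for entities in doc_entities:  # any iterable works, e.g. a generator over a stream
        entity_counts.update(entities)
        # Sorting makes each unordered pair a single canonical key.
        pair_counts.update(combinations(sorted(entities), 2))
    return entity_counts, pair_counts

stream = [{"Ljubljana", "Slovenia"}, {"Slovenia", "EU"}, {"Ljubljana", "Slovenia", "EU"}]
entities, pairs = cooccurrence_counts(stream)
print(pairs[("Ljubljana", "Slovenia")])  # 2
```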

Use of Multiple Documents as Evidence with Decreased Adding in a Japanese Question-answering System

by Hitoshi Isahara

2024, Shizen gengo shori

We propose a new method of using multiple documents as evidence with decreased adding to improve the performance of question-answering systems. Sometimes, the answer to a question may be found in multiple documents. In such cases, using...

Combining Proper Name-Coreference with Conditional Random Fields for Semi-supervised Named Entity Recognition in Vietnamese Text

by Huong Le Thanh

2024, Lecture Notes in Computer Science

CrisisBERT: a Robust Transformer for Crisis Classification and Contextual Crisis Embedding

by Lucienne Blessing

2024, arXiv (Cornell University)

Classification of crisis events, such as natural disasters, terrorist attacks and pandemics, is a crucial task to create early signals and inform relevant parties for spontaneous actions to reduce overall damage. Despite the crises, such...
