Academia.edu

Name Entity Recognition

22 papers
36 followers
About this topic
Named Entity Recognition (NER) is a subtask of information extraction that involves identifying and classifying key entities in text into predefined categories, such as names of people, organizations, locations, dates, and other specific terms, facilitating the understanding and processing of unstructured data.

Key research themes

1. How can Hidden Markov Models be applied to Named Entity Recognition across languages and domains?

This theme focuses on the use of Hidden Markov Models (HMM) as a statistical sequence labeling method for the identification and classification of named entities (NEs) in text. The research addresses the applicability of HMM to different languages, including low-resource and Indian languages, and the challenges therein, such as lack of capitalization, ambiguity, and resource scarcity. It also evaluates HMM's effectiveness compared to other ML and rule-based methods and explores integration with chunking and feature engineering to improve performance.

Key finding: Demonstrates an HMM-based NER system that is language-independent and adaptable to any domain, emphasizing its utility in Indian languages where absence of capitalization and resource scarcity make conventional methods less... Read more
Key finding: Proposes an advanced HMM-based chunk tagger integrating four types of evidence (internal features like capitalization, semantic triggers, gazetteer features, and external context), achieving F-measures of 96.6% and 94.1% on... Read more
Key finding: Develops an HMM NER system for Bengali and Hindi, demonstrating that despite lack of capitalization and morphological complexity in these languages, the HMM approach achieves robust 10-fold cross-validation F-Scores of 84.5%... Read more
Key finding: Introduces an HMM-based biomedical NER system using only part-of-speech tags as additional features to mitigate class imbalances and enhance boundary detection. Despite minimal domain knowledge incorporation, the system... Read more
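
The HMM sequence labeling idea behind this theme can be illustrated with a minimal sketch. It is not taken from any of the papers listed here: the tag set, the probability tables (STATES, START_P, TRANS_P, EMIT_P), the unknown-word floor, and the example sentence are all hypothetical toy values chosen for illustration, and a real system would estimate them from an annotated corpus. The function runs standard first-order Viterbi decoding in log space.

```python
import math

# Hypothetical toy model: three tags and hand-picked probabilities.
STATES = ["O", "PER", "LOC"]
START_P = {"O": 0.6, "PER": 0.3, "LOC": 0.1}
TRANS_P = {
    "O":   {"O": 0.7, "PER": 0.15, "LOC": 0.15},
    "PER": {"O": 0.5, "PER": 0.4,  "LOC": 0.1},
    "LOC": {"O": 0.7, "PER": 0.1,  "LOC": 0.2},
}
EMIT_P = {
    "O":   {"visited": 0.3, "in": 0.3, "the": 0.3},
    "PER": {"sachin": 0.45, "tendulkar": 0.45},
    "LOC": {"mumbai": 0.8, "delhi": 0.1},
}


def viterbi(tokens, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most probable tag sequence under a first-order HMM.

    `unk` is an assumed floor probability for words a state never emitted.
    """
    if not tokens:
        return []

    def emit(state, word):
        return math.log(emit_p[state].get(word, unk))

    # best[i][s]: log-probability of the best path ending in state s at token i
    best = [{s: math.log(start_p[s]) + emit(s, tokens[0]) for s in states}]
    back = [{}]  # back[i][s]: predecessor of s on that best path
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[i - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[i][s] = score + emit(s, tokens[i])
            back[i][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


tags = viterbi(["sachin", "tendulkar", "visited", "mumbai"],
               STATES, START_P, TRANS_P, EMIT_P)
print(tags)  # ['PER', 'PER', 'O', 'LOC']
```

Because the decoder relies only on transition and emission statistics, it needs no capitalization cue, which is one reason this family of models is attractive for the languages discussed above.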

2. What are the benefits and limitations of leveraging linguistic parsing and syntactic structure for Named Entity Recognition?

This research area investigates the use of syntactic parsing techniques—both constituency and dependency parsing—to improve the identification and delimitation of named entities. It explores how deep structural information can guide or augment sequence labeling models to resolve ambiguities and better segment complex entities, with a focus on recent advances in parsing technology and their integration in NER pipelines. The discussion includes different parsing-informed approaches and their empirical benefits.

Key finding: Analyzes the incorporation of syntactic parsing information into NER systems, proposing that parsing provides crucial cues not only for detecting entity presence but also their precise span. Demonstrates that both... Read more
Key finding: Employs rule-based semantic relation extraction from parsed Portuguese texts focusing on family relationships. The study highlights how syntactic structures and linguistic patterns help identify relational semantic links... Read more

3. How can domain- and language-specific corpora and annotation methodologies enhance Named Entity Recognition for low-resource and specialized languages?

This theme covers the development of annotated datasets and domain-adapted NER models for low-resource languages (e.g., Bhojpuri, Maithili, Magahi, Odia) and specialized domains (agriculture, biomedical, historical culture). It emphasizes corpus creation methodologies, automatic or semi-automatic annotation tools, lexicon generation, and domain-specific feature engineering. The research underlines the critical role of tailored datasets and linguistic insights for effective NER in underrepresented languages and specialized fields.

Key finding: Presents the first annotated NER dataset for Bhojpuri, Maithili, and Magahi, annotated with 22 entity labels on sizeable corpora (56k-228k tokens). The study highlights the unique challenges of these languages, such as... Read more
Key finding: Introduces an algorithm and tool for automatic annotation of plant protection queries derived from a large agricultural helpline dataset, enabling creation of annotated corpora in the agricultural domain. The tool facilitates... Read more
Key finding: Develops a BERT-BiLSTM-CRF-based model applied to a newly constructed historical and cultural Chinese NER dataset, targeting entities like historical dynasties, figures, times, locations, and official titles. The approach... Read more
Key finding: Implements a multi-level conditional random field (CRF) based NER for Odia language with a hierarchical tag set of 44 attributes. Despite linguistic challenges including absence of capitalization and morphological complexity,... Read more
Key finding: Designs a CRF-based NER system targeting noisy and informal homeopathic forum texts, where named entities are complex and include frequent spelling variations. By leveraging active learning and semi-supervised techniques on a... Read more

All papers in Name Entity Recognition

Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different people, an effective method is needed to measure document... more
Received: 9 November 2013, accepted: 2 January 2014. ABSTRACT. A Water Quality Index (ICA, from the Spanish Índice de Calidad de Agua) is a statistical tool for estimating the quality of a body of water. The objective was to determine an ICA for the dam La... more
Technological advances in parallel and distributed computing systems enable the development of previously unthinkable applications. One of our lines of research focuses on applying these technologies to Care Units... more
Semantic Web technologies like RDF model, URIs and SPARQL query language, can reduce the complexity of data integration by making use of properly established and described links between sources. However, the difficulty to formulate... more
... (Knowledge Discovery in Databases, KDD) involves the analysis of large databases and the use of complex data analysis algorithms (Data Mining). This process generally requires dedicated computational resources of high... more
Article received: 23 November 2009, accepted: 8 April 2013. ABSTRACT. A total of 12 honeys were collected during the 2006-2007 harvest cycles in the municipalities of Huimanguillo, Cárdenas, Paraíso (region of La... more
Fruits from six selections (S) of mamey sapote were harvested in the Soconusco, Chiapas at a mature stage and ripened at room temperature (23 °C and 70% R.H.) for 12 d. The fruits from the six selections showed maximum respiration and... more
The aim of this study is to produce a summary of a document by selecting its most important sentences. To this end, 15 different sentence selection methods were used. These methods, by a total of 30 people (15 women and 15 men),... more
Good Computing Reports (From Charles Huff, "Practical Guidance for Teaching the Social Impact Statement (SIS)." From Proceedings of the 1996 Symposium on Computers and the Quality of Life, pp. 86-89. New York, ACM Press.) Key Links 1.... more
ADD Algebraic Decision Diagram. CTL Computation Tree Logic. CTMC Continuous-Time Markov Chain. DTMC Discrete-Time Markov Chain. JML Java Modeling Language. MDP Markov Decision Process. PCTL Probabilistic Computation Tree Logic. PMC Parametric Markov Chain. PVS Prototype... more
Abstract | Tourism provides older people with a social activity that improves quality of life. A product has thus emerged that has gained great visibility: health tourism. Despite its importance, it is not yet possible to determine the... more
Abstract | This work aims to analyse the concept of academic tourism, within the sphere of educational tourism, and to apply the concept to students who undertook international mobility under the Erasmus Programme in the... more
To Professor Amilton Sinatora, for the opportunity to come to Brazil and have an unforgettable experience beyond the doctorate, and for his friendship, support, trust, and motivation to carry out this work. To FAPESP for the doctoral scholarship (grant... more
Thesis project submitted for the academic degree of Master in Applied Mathematics. Graduate School, UNMSM, October 2013.
Phosphate glasses having compositions (59.5-x)P2O5-40MgO-xAgCl-0.5Er2O3, where x = 0, 1.5 mol.%, were prepared using the melt-quenching technique. Infrared, absorption and photoluminescence spectra of Er3+-doped magnesium phosphate glasses have... more
In the context of the worldwide spread of the coronavirus (Covid-19), data have been reported on the number of positive cases, deaths, hospitalizations, recoveries, etc. In the case of Peru, by the Ministry of Health (MINSA),... more
This report focuses on comparing several classification methods applied to the Fashion-MNIST image data. The methods are artificial neural networks, functional principal component analysis, and principal component analysis.... more
The paper deals with one small step in the process of model driven development (MDD) or model driven architecture (MDA), both widely used terms nowadays. MDD defines techniques to develop software systems using a variety of models together with a... more
Track etched membranes are porous systems consisting of a thin polymer foil with channels from surface to surface. Latent ion tracks are the result of the passage of swift ions through solid matter and they can be etched selectively. As a... more
Ocimum basilicum belongs to the family Lamiaceae which is known to have anticancer and many other bioactivities. Phoma eupyrena, Emericella nidulans lata and Chaetomium olivaccium were isolated from the different organs of the Basil... more
A ship design with a boundary layer alignment device (BLAD) aiming at improving the inflow of a propeller is evaluated using a RANSE computation with adaptive grid refinement. This paper is focused on model scale simulation for which... more
From 18.04.04 to 23.04.04, the Dagstuhl Seminar 04171 Logic Based Information Agents was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current... more
I thank God, first of all, for being able to complete this thesis without any harm to Lucas. I am very grateful to Diego for his companionship and for the great understanding he showed regarding all the moments I was absent in... more
To God, who was always with us on every journey and at every moment of our lives. Thank you very much, God, for giving strength, especially in moments of loneliness. Abstract MAMANI, E. Z. S. Reputation calculation in social networks from... more
Mobile technology arose from people's need to carry with them a means of communication with entertainment options, a library, and Internet access. Today, the use of mobile devices has become widespread... more
with specific help available everywhere you see the i O symbol. The following versions of software and data (see references i O) were used in the production of this report:
Technological advances in parallel and distributed computing systems enable the development of previously unthinkable applications. Our
Our research is centred on two lines. On the one hand, the study of the energy consumption of High Performance Computing (HPC) systems, whose high energy demand has serious... more
From 05.05. to 08.05.2009, the Dagstuhl Seminar 09192 "From Quality of Service to Quality of Experience" was held in Schloss Dagstuhl - Leibniz Center for Informatics. During the seminar, several participants presented their... more
Although stemming greatly improves Arabic information retrieval performance, no standard stemmer has emerged in the field of Arabic IR due to some limitations and shortcomings. Among the recurring problems is that the stemmer... more
Named entity recognition (NER) plays a significant role in many applications such as information extraction, information retrieval, question answering, and even machine translation. Most of the work on NER using deep learning was done for... more
Named Entity Recognition (NER) is a process of information extraction that seeks to locate atomic elements in text and classify them into predefined categories such as the names of persons, organizations, locations, expressions of time,... more
Abstract: This article presents a system for named entity recognition based on two classical machine learning techniques: Markov models and decision trees. Several systems have been developed... more
Rule-based approaches use human-made rules to extract Named Entities (NEs); alongside Machine Learning, they are among the best-known ways to extract NEs. The term Named Entity Recognition (NER) is defined as a task determined to... more
Arabic Natural Language Processing (ANLP) has seen significant development during the last decade. Several ANLP tools, such as morphological analyzers, have already been developed. These analyzers are often used in more advanced... more
Background: Adverse drug reactions (ADRs) cause a high number of deaths among hospitalized patients in developed countries. Major drug agencies have shown great interest in the early detection of ADRs due to their high incidence and... more
A morphological analysis tool recognizes and investigates the internal structure of the words it is given and provides the syntactic and morphological information related to those input words. The Kannada language is morphologically much... more
In the era of big data, medical researchers attempt to apply analysis techniques such as machine learning and text mining to their large-scale corpora to save valuable labor and time. Consequently, many data analysis platforms... more
Several tools and resources have been developed to deal with Arabic NLP. However, a homogenous and flexible Arabic environment that gathers these components is rarely available. In this perspective, we introduce SAFAR which is a... more
Arabic is the 6th most important language in the world with more than 300 million speakers. Arabic Question Answering systems are gaining great importance due to the increasing amounts of Arabic content on the Internet and the increasing... more
An information retrieval (IR) system is the core of many applications, including digital library management systems (DLMS). The IR-based DLMS depends on either the title with keywords or content as symbolic strings. In contrast, it... more