Key research themes
1. How can Hidden Markov Models be applied to Named Entity Recognition across languages and domains?
This theme focuses on the use of Hidden Markov Models (HMMs) as a statistical sequence labeling method for identifying and classifying named entities (NEs) in text. The research addresses the applicability of HMMs to different languages, including low-resource and Indian languages, and the challenges these pose, such as the absence of capitalization cues, ambiguity, and resource scarcity. It also evaluates the effectiveness of HMMs against other machine learning and rule-based methods and explores their integration with chunking and feature engineering to improve performance.
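The core mechanism these papers share is Viterbi decoding over BIO tag sequences. The following minimal sketch illustrates it with a toy tag set and hand-picked transition and emission probabilities; the parameters, words, and tags are illustrative assumptions, not values from any surveyed study, and a real system would estimate them from a labeled corpus with smoothing.

```python
# Minimal sketch of HMM-based NER tagging via Viterbi decoding.
# All probabilities and the example sentence are illustrative only.
from math import log

TAGS = ["O", "B-PER", "I-PER"]

start_p = {"O": 0.8, "B-PER": 0.2, "I-PER": 0.0}
trans_p = {
    "O":     {"O": 0.8, "B-PER": 0.2, "I-PER": 0.0},
    "B-PER": {"O": 0.4, "B-PER": 0.1, "I-PER": 0.5},
    "I-PER": {"O": 0.5, "B-PER": 0.1, "I-PER": 0.4},
}
emit_p = {
    "O":     {"said": 0.5, "today": 0.5},
    "B-PER": {"rahul": 0.6, "sharma": 0.4},
    "I-PER": {"sharma": 0.7, "rahul": 0.3},
}

def viterbi(words, eps=1e-8):
    """Return the most probable BIO tag sequence for `words`."""
    def lp(p):  # log-probability with a small floor for unseen events
        return log(p if p > 0 else eps)

    # Initialisation with the first word.
    V = [{t: lp(start_p[t]) + lp(emit_p[t].get(words[0], eps)) for t in TAGS}]
    back = [{}]

    # Recursion: best predecessor for each tag at each position.
    for i in range(1, len(words)):
        V.append({})
        back.append({})
        for t in TAGS:
            best_prev, best_score = max(
                ((p, V[i - 1][p] + lp(trans_p[p][t])) for p in TAGS),
                key=lambda x: x[1],
            )
            V[i][t] = best_score + lp(emit_p[t].get(words[i], eps))
            back[i][t] = best_prev

    # Backtrace from the best final tag.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

print(viterbi(["rahul", "sharma", "said", "today"]))
# Under these toy parameters: ['B-PER', 'I-PER', 'O', 'O']
```

The absence of capitalization in many Indian-language scripts is visible here: the model must rely entirely on lexical emission probabilities and tag transitions rather than surface cues, which is why feature engineering and chunking are often layered on top.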
2. What are the benefits and limitations of leveraging linguistic parsing and syntactic structure for Named Entity Recognition?
This research area investigates the use of syntactic parsing techniques, both constituency and dependency parsing, to improve the identification and delimitation of named entities. It explores how structural information can guide or augment sequence labeling models to resolve ambiguities and segment complex entities more accurately, with a focus on recent advances in parsing technology and their integration into NER pipelines. The discussion surveys different parsing-informed approaches and their empirical benefits.
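One simple way parse structure can aid delimitation is by snapping predicted entity spans to syntactic phrase boundaries. The sketch below illustrates this idea using spaCy's dependency-parse-derived noun chunks; it assumes the en_core_web_sm model is installed, and the function name, input tag sequence, and example sentence are hypothetical stand-ins for any sequence labeler's output, not a method proposed in the cited work.

```python
# Sketch of parsing-informed boundary correction for NER predictions.
# Assumes spaCy and the en_core_web_sm model are available.
import spacy

nlp = spacy.load("en_core_web_sm")

def snap_to_noun_chunks(text, bio_tags):
    """Widen predicted entity spans to align with noun-chunk boundaries
    from the dependency parse (a simple syntax-guided delimitation)."""
    doc = nlp(text)
    assert len(doc) == len(bio_tags), "tags must align with spaCy tokens"

    # Collect predicted (start, end, label) spans from the BIO sequence.
    spans, start = [], None
    for i, tag in enumerate(list(bio_tags) + ["O"]):
        if tag.startswith("B-"):
            if start is not None:
                spans.append((start, i, label))
            start, label = i, tag[2:]
        elif not tag.startswith("I-") and start is not None:
            spans.append((start, i, label))
            start = None

    chunks = [(c.start, c.end) for c in doc.noun_chunks]

    corrected = []
    for s, e, lab in spans:
        # If a noun chunk overlaps the prediction, widen to cover the chunk
        # (single pass; enough for illustration).
        for cs, ce in chunks:
            if cs < e and s < ce:
                s, e = min(s, cs), max(e, ce)
        corrected.append((doc[s:e].text, lab))
    return corrected

# A labeler that tagged only "Reserve" as B-ORG would be widened to the
# covering noun chunk, e.g. "The Reserve Bank" (exact chunking depends
# on the parser model).
tags = ["O", "B-ORG", "O", "O", "O", "O", "O", "O"]
print(snap_to_noun_chunks("The Reserve Bank of India raised rates today", tags))
```

The same pattern extends to constituency parses, where candidate spans are restricted to NP constituents rather than base noun chunks.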
3. How can domain- and language-specific corpora and annotation methodologies enhance Named Entity Recognition for low-resource and specialized languages?
This theme covers the development of annotated datasets and domain-adapted NER models for low-resource languages (e.g., Bhojpuri, Maithili, Magahi, Odia) and specialized domains (agriculture, biomedicine, historical culture). It emphasizes corpus creation methodologies, automatic and semi-automatic annotation tools, lexicon generation, and domain-specific feature engineering. The research underlines the critical role of tailored datasets and linguistic insights for effective NER in underrepresented languages and specialized fields.
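A common semi-automatic annotation step in such corpus-building efforts is gazetteer-driven pre-annotation, where a lexicon of known entities produces draft BIO tags that annotators then correct. The sketch below shows a longest-match version of this idea; the lexicon entries, entity types, and example sentence are hypothetical illustrations rather than material from any of the corpora discussed above.

```python
# Sketch of gazetteer-based pre-annotation for bootstrapping a
# low-resource NER corpus. Lexicon and sentence are illustrative only.

# Hypothetical lexicon: tokenised surface form -> entity type.
GAZETTEER = {
    ("पटना",): "LOC",
    ("भिखारी", "ठाकुर"): "PER",
    ("बिहार", "सरकार"): "ORG",
}
MAX_LEN = max(len(k) for k in GAZETTEER)

def pre_annotate(tokens):
    """Longest-match lookup producing draft BIO tags for human post-editing."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest possible match first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            i += 1
    return tags

sentence = ["भिखारी", "ठाकुर", "पटना", "में", "रहलें"]
print(list(zip(sentence, pre_annotate(sentence))))
# [('भिखारी', 'B-PER'), ('ठाकुर', 'I-PER'), ('पटना', 'B-LOC'), ('में', 'O'), ('रहलें', 'O')]
```

Draft tags produced this way are typically reviewed by native speakers, and the corrected corpus in turn feeds lexicon expansion and supervised model training.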