Key research themes
1. How can semantic and concept-based methods improve text indexing compared to traditional keyword-based approaches?
This research area focuses on enhancing the indexing and retrieval of text documents by moving beyond simple keyword matching to semantic-aware methods. These approaches leverage linguistic resources, concept identification, and semantic similarity measures to better capture the inherent meaning and context in documents. The goal is to address challenges posed by synonymy, polysemy, and lexical ambiguities that limit keyword-based indexing.
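To make the contrast with keyword matching concrete, the following is a minimal sketch of concept-based indexing. It assumes a tiny hand-written thesaurus (`CONCEPTS`, hypothetical) in place of a real linguistic resource, and it indexes documents by concept label rather than by surface term. The names `to_concepts`, `build_concept_index`, and `search` are illustrative only, not taken from any particular system.

```python
from collections import defaultdict

# Hypothetical toy thesaurus: surface term -> concept label. This is an
# assumption standing in for a real linguistic resource.
CONCEPTS = {
    "car": "VEHICLE",
    "automobile": "VEHICLE",
    "physician": "DOCTOR",
    "doctor": "DOCTOR",
}

def to_concepts(text):
    """Lowercase, split on whitespace, and map each term to its concept.

    Terms missing from the thesaurus are kept as-is.
    """
    return [CONCEPTS.get(tok, tok) for tok in text.lower().split()]

def build_concept_index(docs):
    """Map each concept (or unmapped term) to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for concept in to_concepts(text):
            index[concept].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents that contain every concept in the query."""
    postings = [index.get(c, set()) for c in to_concepts(query)]
    return set.intersection(*postings) if postings else set()

docs = {
    1: "the automobile was repaired",
    2: "the physician arrived by car",
    3: "a doctor wrote the report",
}
index = build_concept_index(docs)
```

With this index, the query "car" also retrieves document 1, which only mentions "automobile", so the synonymy problem is handled. Polysemy is not: a one-to-one term-to-concept table cannot tell different senses of the same word apart, which is where context-sensitive disambiguation would be needed.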
2. What indexing structures and algorithms enable efficient document retrieval at scale, especially using inverted and cluster-based indexes?
This theme addresses the design, optimization, and application of indexing data structures, such as inverted indexes and clustering-enhanced variants, that enable fast and scalable document retrieval. The focus lies on supporting varied query types, including word-based, substring, and complex queries over large text corpora, while balancing time and space efficiency. Newer data structures such as wavelet trees, together with clustering algorithms, aim to improve indexing precision and retrieval speed.
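As an illustration of the basic structure behind this theme, here is a minimal sketch of a positional inverted index supporting word-based conjunctive queries and phrase queries. It assumes whitespace tokenization and an in-memory dictionary. It does not cover substring queries, compression, wavelet trees, or clustering, and the function names are illustrative.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def and_query(index, terms):
    """Doc ids containing all terms, starting from the shortest posting list."""
    lists = sorted((index.get(t, {}) for t in terms), key=len)
    if not lists:
        return set()
    result = set(lists[0])
    for postings in lists[1:]:
        result.intersection_update(postings)  # iterating a dict yields doc ids
    return result

def phrase_query(index, terms):
    """Doc ids where the terms occur consecutively, in the given order."""
    hits = set()
    for doc_id in and_query(index, terms):
        starts = index[terms[0]][doc_id]
        if any(all(p + i in index[t][doc_id] for i, t in enumerate(terms))
               for p in starts):
            hits.add(doc_id)
    return hits

docs = {
    1: "fast inverted index lookup",
    2: "index structures for fast retrieval",
    3: "inverted lists and index compression",
}
index = build_inverted_index(docs)
```

Starting the intersection from the shortest posting list keeps the intermediate result small, which is one simple instance of the time and space trade-offs this theme is concerned with.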
3. How can indexing approaches handle uncertainty and variability in texts, such as weighted sequences or approximate matching?
The focus here is on indexing methods that accommodate uncertain, imprecise, or approximate text representations. This includes weighted sequences, in which each position holds a probability distribution over letters rather than a single letter, and approximate dictionary matching, in which the exact-match requirement is relaxed to allow a limited number of errors or mismatches. Such methods must balance index size, preprocessing time, and query performance, especially in applications such as bioinformatics and noisy data retrieval.
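The weighted-sequence model can be illustrated with a small sketch. It assumes that positions are independent, so the probability of a pattern occurring at a given start is the product of the per-position letter probabilities, and that an occurrence counts when this product reaches a chosen threshold. This is a naive scan rather than an index. It only shows the matching condition that an index for weighted sequences would have to answer efficiently, and the name `occurrences` is illustrative.

```python
def occurrences(weighted_seq, pattern, threshold):
    """Start positions where pattern occurs with probability >= threshold.

    weighted_seq is a list of dicts, one per position, mapping each letter
    to its probability at that position. Positions are assumed independent.
    """
    hits = []
    for start in range(len(weighted_seq) - len(pattern) + 1):
        prob = 1.0
        for offset, letter in enumerate(pattern):
            prob *= weighted_seq[start + offset].get(letter, 0.0)
            if prob < threshold:
                break  # the product can only shrink, so stop early
        else:
            hits.append(start)  # every letter matched above the threshold
    return hits

# Toy weighted sequence over the DNA alphabet (values chosen for illustration).
weighted_seq = [
    {"A": 1.0},
    {"A": 0.5, "C": 0.5},
    {"G": 1.0},
    {"A": 0.75, "T": 0.25},
]
```

For example, the pattern "AAG" starting at position 0 has probability 1.0 × 0.5 × 1.0 = 0.5, so it is reported at a threshold of 0.5 but not at 0.6.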