Key research themes
1. How can ontology- and taxonomy-based models enhance semantic similarity judgment in structured lexical databases?
This research theme focuses on leveraging structured lexical knowledge bases such as WordNet and domain-specific ontologies to assess semantic similarity. It examines how taxonomic relationships (e.g., hypernym/hyponym, meronym/holonym), information content, and edge-weighted path lengths within ontological hierarchies can quantify semantic proximity in ways that align closely with human judgment. This matters because it grounds semantic similarity in well-curated resources and formalizes semantic distance metrics, supporting applications in information retrieval, word sense disambiguation, and semantic search wherever suitable lexical resources exist.
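The path-length idea described above can be sketched in a few lines. The toy hypernym hierarchy below is hypothetical (real systems would query a resource such as WordNet), and the similarity function is a simple unweighted variant: one divided by one plus the length of the shortest path through the lowest common ancestor.

```python
# Minimal sketch of path-based similarity over a toy hypernym taxonomy.
# The hierarchy is illustrative only; production systems use WordNet or
# a domain ontology instead of this hand-built child -> parent map.
TAXONOMY = {
    "dog": "canine", "wolf": "canine", "canine": "carnivore",
    "cat": "feline", "feline": "carnivore",
    "carnivore": "mammal", "mammal": "animal", "animal": None,
}

def ancestors(word):
    """Return the hypernym chain [word, parent, ..., root]."""
    chain = []
    while word is not None:
        chain.append(word)
        word = TAXONOMY.get(word)
    return chain

def path_similarity(a, b):
    """1 / (1 + shortest path length through the lowest common ancestor)."""
    depths_a = {node: i for i, node in enumerate(ancestors(a))}
    for depth_b, node in enumerate(ancestors(b)):
        if node in depths_a:  # first shared ancestor is the lowest one
            return 1.0 / (1 + depths_a[node] + depth_b)
    return 0.0  # no shared ancestor in this taxonomy

print(path_similarity("dog", "wolf"))  # siblings under "canine"
print(path_similarity("dog", "cat"))   # meet higher, at "carnivore"
```

Weighting edges (e.g., by depth or by information content) refines this baseline, which is exactly where information-content measures enter the picture.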
2. What role do distributional and corpus-based models play in capturing semantic similarity, particularly for weakly related or dissimilar concepts?
This theme explores statistical and distributional semantics, including semantic networks derived from co-occurrence information, latent semantic analysis, and lexico-syntactic pattern mining, to capture semantic similarity across words and texts. It is particularly concerned with modeling similarity judgments for weakly related or even dissimilar concepts, where knowledge-based resources offer limited coverage. Understanding these patterns is key to modeling human-like similarity judgments, extending semantic coverage beyond taxonomic relations, and supporting tasks such as semantic priming, episodic memory modeling, and robust semantic vector space embeddings.
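A compact illustration of the distributional approach is latent semantic analysis: factor a word-by-document count matrix with a truncated SVD and compare words by cosine similarity in the latent space. The five-word, three-document corpus below is a made-up toy, not real data; the mechanics are the standard LSA recipe.

```python
# Minimal LSA sketch: truncated SVD over a toy word-by-document count
# matrix (counts are illustrative; real LSA uses large corpora and
# typically tf-idf weighting before the SVD).
import numpy as np

words = ["doctor", "nurse", "hospital", "car", "engine"]
counts = np.array([
    [3, 2, 0],   # doctor
    [2, 3, 0],   # nurse
    [2, 2, 1],   # hospital
    [0, 0, 3],   # car
    [0, 1, 3],   # engine
], dtype=float)

# Rank-k SVD projects words into a low-dimensional latent space.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vecs = U[:, :k] * S[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

idx = {w: i for i, w in enumerate(words)}
def sim(w1, w2):
    return cosine(word_vecs[idx[w1]], word_vecs[idx[w2]])

print(sim("doctor", "nurse"))  # same latent topic, high similarity
print(sim("doctor", "car"))    # different topics, low similarity
```

Because the latent dimensions smooth over sparse counts, pairs with little or no direct co-occurrence can still receive a graded, nonzero similarity, which is what makes such models useful for weakly related concepts.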
3. How can semantic similarity be operationalized and evaluated in applied NLP tasks involving human interpretability and textual entailment?
This theme addresses approaches that combine semantic similarity measures with supervised learning and task-based evaluation to support interpretability and downstream applications such as semantic textual similarity (STS), textual entailment, and semantic search. It covers feature-rich models integrating lexical, syntactic, and semantic similarity metrics. It also discusses the design and assessment of benchmarks that measure how well computational similarity mimics human judgments, including datasets and test tasks focused on human interpretability and graded semantic equivalence.
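Benchmark evaluation in STS-style tasks typically reports the Pearson correlation between system scores and graded human ratings. A self-contained sketch is below; the system and gold scores are invented for illustration (STS uses a 0 to 5 graded scale), not drawn from any real benchmark.

```python
# Minimal sketch of STS-style evaluation: Pearson correlation between
# a system's similarity scores and graded human judgments.
# The score lists are hypothetical, illustrative values only.
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

system = [4.2, 1.1, 3.0, 0.5, 4.8]  # hypothetical model outputs
gold   = [4.5, 0.8, 3.2, 1.0, 5.0]  # hypothetical human ratings (0-5)

r = pearson(system, gold)
print(f"Pearson r = {r:.3f}")
```

In feature-rich systems, each sentence pair would instead be represented by a vector of lexical, syntactic, and semantic similarity features fed to a supervised regressor, with this same correlation used as the evaluation target.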