Key research themes
1. How can schema-agnostic and scalable blocking techniques improve entity resolution on heterogeneous and noisy big data?
This research theme focuses on blocking methods that require no prior schema knowledge and can efficiently handle large, heterogeneous, and noisy datasets. Blocking is a critical step in entity resolution (ER): it partitions a dataset into smaller blocks so that only records sharing a block are compared, which avoids the quadratic cost of comparing every pair. Handling schema heterogeneity and noise while keeping blocking both effective and scalable is essential for ER on Big Data.
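As a rough illustration of schema-agnostic blocking, the sketch below uses token blocking, one common technique of this kind: every token found in any attribute value becomes a block key, so attribute names never need to be aligned. The function names (`token_blocking`, `candidate_pairs`) and the toy records are hypothetical and chosen only for this example; it is not the method of any particular paper.

```python
import re
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Group record ids by the tokens that appear in any of their values."""
    blocks = defaultdict(set)
    for rid, record in records.items():
        # Attribute names are ignored on purpose: only the values matter.
        for value in record.values():
            for token in re.findall(r"[a-z0-9]+", str(value).lower()):
                blocks[token].add(rid)
    # A block with a single record produces no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct record pairs that share at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy records from two sources with different schemas (made-up data).
records = {
    "a1": {"name": "John Smith", "city": "Boston"},
    "b1": {"fullName": "smith, john", "location": "Boston MA"},
    "b2": {"fullName": "Mary Jones", "location": "Denver"},
}

blocks = token_blocking(records)
pairs = candidate_pairs(blocks)
print(pairs)
```

With three records there are three possible pairs, but only the pair sharing tokens is kept as a candidate. On real data, very frequent tokens create oversized blocks, so a practical pipeline would likely need an extra step to prune or clean blocks.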
2. What are effective distributed and clustering-based methods for scalable multi-source entity resolution?
This theme investigates methods that use distributed computing frameworks and clustering algorithms to tackle ER involving multiple heterogeneous data sources. These approaches cluster matching entities across many datasets and run on parallel processing platforms such as Apache Flink and Apache Spark, aiming to improve scalability and integration quality in multi-source ER scenarios.
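To make the clustering step concrete, here is a minimal single-machine sketch that groups entities from several sources by taking connected components over pairwise match links, using a union-find structure. Connected components is only a simple baseline picked for illustration, and the sketch does not use Flink or Spark. The entity ids of the form `source:id` and the match links are made up for the example.

```python
def connected_components(entities, matches):
    """Cluster entities so that any two linked by a chain of matches share a cluster."""
    parent = {e: e for e in entities}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for e in entities:
        clusters.setdefault(find(e), set()).add(e)
    # Sort for a deterministic output order.
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical entities from three sources (s1, s2, s3) and two match links.
entities = ["s1:1", "s1:2", "s2:7", "s3:4", "s3:9"]
matches = [("s1:1", "s2:7"), ("s2:7", "s3:4")]

clusters = connected_components(entities, matches)
print(clusters)
```

Because matches are treated as transitive, `s1:1` and `s3:4` end up in one cluster even though they were never compared directly. The drawback is that a single false match can merge two otherwise separate clusters.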
3. How can multi-type or graph-based entity representations improve unsupervised entity resolution and disambiguation?
This research area explores leveraging graph structures and multi-type entity models for unsupervised entity resolution and named entity disambiguation. Techniques focus on jointly resolving entities of different types by summarizing multi-typed RDF graphs, exploiting relational context, and applying graph-based semantic relatedness for disambiguating entities without relying on supervised learning.
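A very small sketch of graph-based disambiguation without supervision follows. It scores each candidate entity by the Jaccard overlap between its neighbors in a knowledge graph and the entities already seen in the context. Neighbor overlap is used here as a simple stand-in for a semantic relatedness measure. The toy graph, the entity names, and the `disambiguate` function are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(candidates, context, graph):
    """Return the candidate whose graph neighbors overlap most with the context."""
    scores = {c: jaccard(graph.get(c, set()), context) for c in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Toy knowledge graph: entity -> set of neighboring entities (illustrative only).
graph = {
    "Paris_France": {"France", "Seine", "Eiffel_Tower"},
    "Paris_Texas": {"Texas", "United_States", "Lamar_County"},
}

# Entities already recognized near the ambiguous mention "Paris".
context = {"France", "Eiffel_Tower"}

best, scores = disambiguate(["Paris_France", "Paris_Texas"], context, graph)
print(best, scores)
```

No labeled training data is involved: the decision rests only on graph structure and the surrounding context, which is the property this theme emphasizes.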