Key research themes
1. How can scalable and efficient algorithms address large-scale multilingual record linkage and load balancing?
This theme investigates methods to improve the scalability and efficiency of record linkage, especially for large datasets whose records span multiple languages. It focuses on algorithms that distribute computational load evenly while maintaining high matching accuracy despite variations in language and script.
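One way to picture the load-balancing problem is the sketch below. It assumes records have already been grouped into blocks (candidate groups) and that the cost of a block is its number of pairwise comparisons. The function names `pair_count` and `assign_blocks`, and the greedy largest-first strategy, are illustrative assumptions rather than a method taken from any specific work in this area.

```python
import heapq


def pair_count(block_size):
    # Number of pairwise comparisons inside one block of this size.
    return block_size * (block_size - 1) // 2


def assign_blocks(block_sizes, n_workers):
    """Greedy largest-first assignment of blocks to workers (illustrative).

    block_sizes: dict mapping a block key to the number of records in it.
    Returns one (total_pairs, [block keys]) tuple per worker.
    """
    # Min-heap of (current load, worker index): the lightest worker pops first.
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    # Place the most expensive blocks first so small blocks can even out the rest.
    ordered = sorted(block_sizes.items(),
                     key=lambda kv: pair_count(kv[1]), reverse=True)
    for key, size in ordered:
        load, w = heapq.heappop(heap)
        assignment[w].append(key)
        loads[w] = load + pair_count(size)
        heapq.heappush(heap, (loads[w], w))
    return list(zip(loads, assignment))
```

Because comparison cost grows quadratically with block size, a single oversized block can dominate the runtime of one worker; a scheme like this only balances whole blocks and would need block splitting to handle that case.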
2. What frameworks and methodologies improve data deduplication and entity integration across heterogeneous data sources?
This theme examines conceptual frameworks, practical tools, and methodologies for deduplication and entity resolution across heterogeneous data sources. It emphasizes methods that combine blocking, record linkage, and human-in-the-loop review to improve data quality where inputs are complex and diverse, and it addresses the integration challenges of large-scale or domain-specific datasets.
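A minimal sketch of how blocking, pairwise linkage, and human review can fit together is shown below. Everything specific in it is an assumption made for illustration: the `name` and `id` fields, the three-letter blocking key, the use of `difflib.SequenceMatcher` as the similarity measure, and the two thresholds.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(record):
    # Hypothetical key: first three letters of the casefolded name.
    return record["name"].casefold()[:3]


def similarity(a, b):
    # String similarity in [0, 1] on casefolded names.
    return SequenceMatcher(None, a["name"].casefold(),
                           b["name"].casefold()).ratio()


def link(records, match_at=0.9, review_at=0.7):
    """Return (automatic matches, pairs queued for human review)."""
    # Blocking: only records sharing a key are ever compared.
    blocks = defaultdict(list)
    for record in records:
        blocks[blocking_key(record)].append(record)

    matches, review = [], []
    for block in blocks.values():
        for a, b in combinations(block, 2):
            score = similarity(a, b)
            if score >= match_at:
                matches.append((a["id"], b["id"]))
            elif score >= review_at:
                # Uncertain band: defer the decision to a person.
                review.append((a["id"], b["id"]))
    return matches, review
```

The middle band between `review_at` and `match_at` is where the human-in-the-loop step enters: pairs that are too similar to discard but not similar enough to merge automatically are set aside for manual inspection.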
3. How can privacy-preserving methods enable secure and ethical record linkage of sensitive genomic and clinical datasets?
This theme focuses on the ethical, legal, and technological challenges of linking sensitive datasets such as genomic and clinical records, with the dual goals of enabling data-driven health research and preserving participants' privacy. It explores privacy-preserving record linkage (PPRL) approaches, which match records without directly disclosing identities, along with policy frameworks for responsible data sharing.
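To make the idea of matching without direct identity disclosure concrete, the sketch below uses one commonly described PPRL technique: each party encodes an identifier's character bigrams into a keyed Bloom filter, and only the encodings are compared, here with the Dice coefficient. The function names, the filter size, the number of hash functions, and the shared secret are all illustrative assumptions, and this is not a hardened protocol.

```python
import hashlib
import hmac


def bigrams(value):
    # Pad so that the first and last characters also form bigrams.
    padded = f"_{value.casefold().strip()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value, secret, n_bits=1024, n_hashes=4):
    """Return the set of bit positions set in a keyed Bloom filter."""
    bits = set()
    for gram in bigrams(value):
        for i in range(n_hashes):
            # HMAC with a shared secret stands in for n_hashes hash functions.
            digest = hmac.new(secret, f"{i}:{gram}".encode("utf-8"),
                              hashlib.sha256).digest()
            bits.add(int.from_bytes(digest[:8], "big") % n_bits)
    return bits


def dice(a, b):
    # Dice coefficient of two sets of bit positions.
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

In this arrangement the data holders would exchange only the bit sets, so a linkage unit can score similarity without seeing names. Encodings of this basic kind are known to leak information under frequency-based attacks, which is part of why policy and governance frameworks matter alongside the technical approach.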