Key research themes
1. How can speaker verification systems be robustly defended against diverse spoofing attacks including voice conversion, speech synthesis, and replay?
This research area examines how vulnerable automatic speaker verification (ASV) systems are to spoofing attacks such as voice conversion, speech synthesis, and replay, all of which pose severe security threats. It also covers the design and evaluation of anti-spoofing countermeasures, including the databases, protocols, and methodologies needed to detect and mitigate both known and unknown attack types, particularly for text-independent ASV. The work matters because spoofing can undermine the reliability of ASV systems deployed in real-world applications such as call centers, banking, and forensic investigations.
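Evaluating a countermeasure means summarizing how well its scores separate bona fide speech from spoofed speech. The text above does not name a metric, so the equal error rate (EER) used here is an assumption, chosen because it is commonly reported for both ASV and spoofing detection. The following is a minimal sketch, assuming higher scores mean "more likely bona fide"; the function name and the example scores are hypothetical.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold) for a detector where higher = more likely bona fide.

    Assumption: a trial is accepted when its score is >= the threshold.
    The EER is approximated at the candidate threshold where the false
    acceptance and false rejection rates are closest to each other.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
print(f"EER = {eer:.2f} at threshold {threshold}")
```

Reporting this figure separately for known and unknown attack types is one simple way to see how well a countermeasure generalizes.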
2. What techniques improve speaker verification performance and robustness under practical conditions such as limited data, language mismatch, recording channel variability, and multi-speaker environments?
This research theme focuses on improving speaker verification accuracy and reliability under realistic, challenging conditions. It covers methods for short speech segments, channel distortions (e.g., GSM-transcoded speech), multilingual and cross-lingual mismatch, and overlapping speakers. The research addresses acoustic feature design, fusion of complementary feature sets, model adaptation, and joint optimization strategies that maintain verification performance across heterogeneous real-world scenarios.
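Fusion of complementary feature sets is often done at the score level: each feature set drives its own subsystem, and the subsystem scores are combined per trial. The following is a minimal sketch of that idea, assuming a simple weighted sum of z-normalized scores; the text above does not specify a fusion method, and the names `z_normalize`, `fuse_scores`, and `weight_a` are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale scores to zero mean and unit variance."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        # All scores identical: nothing to scale, return zeros.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(scores_a, scores_b, weight_a=0.5):
    """Weighted sum of two subsystems' normalized scores, trial by trial.

    Assumption: both lists score the same trials in the same order.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both subsystems must score the same trials")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    norm_a = z_normalize(scores_a)
    norm_b = z_normalize(scores_b)
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(norm_a, norm_b)]


# Hypothetical scores from two subsystems on the same three trials.
fused = fuse_scores([1.0, 2.0, 3.0], [10.0, 30.0, 20.0], weight_a=0.5)
print(fused)
```

Normalizing first keeps a subsystem with a wide score range from dominating the sum. In practice the weight would be tuned on held-out data rather than fixed at 0.5.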
3. How can speaker verification fairness across demographic and language groups be improved without requiring subgroup labels or relying on balanced training data?
This research area addresses performance disparities in speaker verification systems that arise from imbalanced representation of demographic groups (for example, by gender or nationality) or from language variability. The focus is on algorithmic fairness approaches that automatically identify underperforming groups without explicit annotations, using adversarial learning, group-adapted embeddings, fusion networks, and reweighting schemes. This direction is crucial for equitable deployment of speaker verification in diverse real-world populations and for mitigating biases inherent in training data.
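A reweighting scheme that needs no subgroup labels can use the per-utterance training loss as a proxy: utterances the model handles poorly receive larger weights, on the assumption that they come disproportionately from underrepresented groups. The following is a minimal sketch of one such scheme, a softmax over losses. It is an illustrative choice, not a method named in the text above, and the names `loss_based_weights`, `reweighted_loss`, and `temperature` are hypothetical.

```python
import math


def loss_based_weights(losses, temperature=1.0):
    """Softmax over per-utterance losses, scaled so the weights average to 1.

    A lower temperature concentrates weight on the hardest utterances;
    a higher temperature moves the weights toward uniform.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    peak = max(losses)  # subtract the max for numerical stability
    raw = [math.exp((loss - peak) / temperature) for loss in losses]
    total = sum(raw)
    n = len(losses)
    return [n * r / total for r in raw]


def reweighted_loss(losses, temperature=1.0):
    """Weighted mean loss that emphasizes poorly handled utterances."""
    weights = loss_based_weights(losses, temperature)
    return sum(w * loss for w, loss in zip(weights, losses)) / len(losses)


# Hypothetical per-utterance losses: two easy utterances and one hard one.
losses = [0.1, 0.1, 2.0]
print(loss_based_weights(losses))
print(reweighted_loss(losses))
```

Because the weights depend only on the loss, the scheme needs neither group annotations nor a balanced sample. The trade-off is that noisy or mislabeled utterances are upweighted as well.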