Key research themes
1. How can speech synthesis systems be adapted to support both speech and singing voice production from neutral speech corpora?
This research area develops text-to-speech (TTS) frameworks that produce singing as well as speech without requiring a dedicated singing database. Recording a supplementary singing corpus is costly and inflexible, and it may be infeasible when the original speaker is unavailable or cannot sing well. The key idea is to integrate speech-to-singing (STS) conversion into unit selection or other corpus-based TTS systems built on neutral speech databases, so that a single system can synthesize expressive vocal output for applications such as storytelling, assistive devices, and immersive experiences.
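As a rough illustration of what STS conversion has to compute, the sketch below derives per-syllable pitch and duration targets from a musical score. It is a minimal sketch under stated assumptions, not the method of any particular system: one syllable per note, notes given as MIDI numbers, and equal-tempered tuning with A4 = 440 Hz. The names `midi_to_hz` and `singing_targets` are hypothetical.

```python
def midi_to_hz(midi_note: float) -> float:
    """Convert a MIDI note number to frequency (equal temperament, A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((midi_note - 69) / 12)


def singing_targets(spoken_durations_s, score):
    """Pair each spoken syllable with a target pitch and a time-stretch factor.

    spoken_durations_s: syllable durations measured in the neutral speech (seconds).
    score: list of (midi_note, note_duration_s), one note per syllable (assumption).
    """
    if len(spoken_durations_s) != len(score):
        raise ValueError("this sketch assumes exactly one note per syllable")
    targets = []
    for spoken, (note, sung) in zip(spoken_durations_s, score):
        targets.append({
            "f0_hz": midi_to_hz(note),   # pitch the syllable should be sung at
            "stretch": sung / spoken,    # >1 lengthens, <1 shortens the syllable
        })
    return targets


if __name__ == "__main__":
    # Two syllables spoken in 0.25 s each, sung as A4 (0.5 s) and A3 (1.0 s).
    for t in singing_targets([0.25, 0.25], [(69, 0.5), (57, 1.0)]):
        print(t)
```

A real system would then apply these targets with a signal-processing or vocoder stage and would shape the pitch curve (for example, note transitions and vibrato) instead of using flat note frequencies.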
2. What computational and vocal models facilitate the control and learning of expressive vocal intonation and prosody, including for language learning and voice training?
This theme covers computational synthesis techniques and interactive training methods for improving vocal expressiveness, particularly the intonation patterns and prosodic features that make speech and singing sound natural. It includes how speech synthesis models are manipulated for expressive control, how novel interfaces help second-language (L2) speakers master challenging intonation, and how models for professional voice training aim to optimize vocal and prosodic quality. Together, these approaches use controlled vocal synthesis and targeted training to improve voice performance.
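One simple way an intonation-training tool could give feedback is to compare a learner's pitch contour with a target contour on a semitone scale. The sketch below is an illustrative assumption, not a method taken from the work summarized here: it assumes both contours are already time-aligned, equal in length, and contain only voiced frames (F0 > 0). Each contour is centered on its own mean so that speakers with different pitch ranges can be compared. The function names are hypothetical.

```python
import math


def to_centered_semitones(f0_hz):
    """Convert an F0 contour (Hz) to semitones, centered on the contour's own mean."""
    semis = [12.0 * math.log2(f) for f in f0_hz]  # semitones relative to 1 Hz
    mean = sum(semis) / len(semis)
    return [s - mean for s in semis]


def intonation_deviation(learner_hz, target_hz):
    """Mean absolute difference (in semitones) between two aligned F0 contours."""
    if len(learner_hz) != len(target_hz) or not learner_hz:
        raise ValueError("contours must be non-empty and time-aligned (same length)")
    learner = to_centered_semitones(learner_hz)
    target = to_centered_semitones(target_hz)
    return sum(abs(a - b) for a, b in zip(learner, target)) / len(learner)


if __name__ == "__main__":
    # Same rising shape one octave apart: the shapes match after centering.
    print(intonation_deviation([100.0, 200.0], [200.0, 400.0]))
    # A flat learner contour against a one-octave rise in the target.
    print(intonation_deviation([150.0, 150.0], [100.0, 200.0]))
```

The semitone scale is used because pitch intervals are perceived roughly logarithmically, so the same musical interval yields the same value regardless of the speaker's absolute pitch.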
3. How can physical and computational models of vocal fold physiology and acoustics improve understanding and simulation of voice production?
This theme surveys synthetic vocal fold models and numerical approaches that represent the biomechanics and aerodynamics of phonation in order to simulate human voice production more faithfully. It includes the design of self-oscillating vocal fold models, the quantification of vocal fold geometry, and stabilized finite element methods for wave equations in moving vocal tracts. These advances help clarify the complex coupling of tissue vibration, airflow, and acoustics, with implications for synthesis, voice therapy, and model-based voice production research.