Key research themes
1. How can machine learning models improve spoken language and speaker classification accuracy under resource constraints and domain variability?
This research area focuses on machine learning (ML) techniques, spanning supervised, unsupervised, and self-supervised learning as well as factorized convolutional neural networks and domain adversarial training, to improve the classification of speech units, spoken languages, voice disorders, and speakers. It addresses limited training data, domain mismatch between clinical and real-world recordings, computational constraints on embedded systems, and variability across speakers and recording conditions. Developing robust, compact, and domain-invariant features is essential for deploying accurate classification systems in practice.
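To illustrate the compactness argument behind factorized convolutions, one common factorization (an assumption here, not a method claimed by any specific paper above) is the depthwise-separable convolution, which replaces a dense kernel with a per-channel spatial filter followed by a 1x1 pointwise mix. A minimal sketch with illustrative layer sizes:

```python
def standard_conv_params(c_in, c_out, k):
    # Dense convolution: one k x k kernel per (input channel, output channel) pair
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    # Depthwise k x k filter per input channel, then a 1x1 pointwise projection
    return c_in * k * k + c_in * c_out

# Illustrative layer: 64 -> 128 channels, 3x3 kernels
dense = standard_conv_params(64, 128, 3)      # 73,728 weights
factored = separable_conv_params(64, 128, 3)  # 8,768 weights
print(f"reduction: {dense / factored:.1f}x")  # prints "reduction: 8.4x"
```

The roughly order-of-magnitude parameter reduction, at a modest accuracy cost, is what makes such factorizations attractive for the embedded deployment constraints discussed above.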
2. What acoustic and feature extraction techniques best support speech unit and emotion classification despite speech variability and noise?
This theme explores advanced feature extraction methodologies—such as Mel-frequency cepstral coefficients (MFCC), wavelet packet subband analysis, spectral centroid irregularities, and cepstral-based representations—for robust classification of speech units, speech under stress, and emotions. It investigates how these features capture nuanced spectral-temporal dynamics and energy distribution in speech that are crucial for differentiating phonemes, stressed speech types, or emotional states, particularly in noisy or real-world environments where variability is high.
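The MFCC representation mentioned above follows a standard recipe: pre-emphasis, framing and windowing, power spectrum, mel filterbank, log compression, and a DCT. A minimal NumPy sketch with textbook parameter choices (16 kHz audio, 25 ms frames, 10 ms hop), not tied to any particular paper in this collection:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis boosts high frequencies attenuated in voiced speech
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Mel filterbank energies, floored to avoid log(0), then log-compressed
    energies = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    log_e = np.log(energies)
    # DCT-II decorrelates the filterbank channels; keep the lowest coefficients
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return log_e @ basis.T

# Usage on a synthetic one-second 440 Hz tone
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)  # shape: (frames, 13)
```

The log compression and DCT are what give cepstral features their robustness: they approximate the separation of vocal-tract shape (low coefficients) from excitation detail, which is why MFCCs recur across the speech-unit, stress, and emotion classification work surveyed here.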
3. How can automatic classification support medical diagnosis and content management through speech analysis?
This research theme studies automatic speech classification both for medical diagnosis, targeting neurodegenerative diseases and voice disorders, and for audio content management tasks such as call routing and speaker diarization in large datasets. It emphasizes extracting linguistic and acoustic markers from spontaneous speech or pathological voices for early detection of conditions like Alzheimer's disease and voice pathology, as well as efficient indexing and management of large-scale speech or broadcast archives by classifying speaker turns, languages, or call reasons.
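As a concrete illustration of linguistic markers from spontaneous speech, the sketch below computes a few features commonly reported in this literature (speech rate, pause rate, lexical diversity) from word-level timings. The `Word` structure and the 0.5 s pause threshold are hypothetical stand-ins for forced-alignment output; the marker set is illustrative, not a specific paper's feature set:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # onset time in seconds
    end: float    # offset time in seconds

def linguistic_markers(words, min_pause=0.5):
    """Simple spontaneous-speech markers: speech rate (words/s),
    pause statistics, and type-token ratio (lexical diversity)."""
    duration = words[-1].end - words[0].start
    # Inter-word gaps at or above the threshold count as pauses
    pauses = [b.start - a.end for a, b in zip(words, words[1:])
              if b.start - a.end >= min_pause]
    tokens = [w.text.lower() for w in words]
    return {
        "speech_rate_wps": len(words) / duration,
        "pause_rate_per_min": 60.0 * len(pauses) / duration,
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "type_token_ratio": len(set(tokens)) / len(tokens),
    }

# Hypothetical forced-alignment output for a short utterance
words = [Word("the", 0.0, 0.2), Word("cat", 0.3, 0.6),
         Word("the", 1.4, 1.6), Word("cat", 1.7, 2.0)]
markers = linguistic_markers(words)
```

Features of this kind (elevated pause rates, reduced lexical diversity) feed downstream classifiers for early-detection screening, while analogous turn-level acoustic features support diarization and call-routing indexing.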