Key research themes
1. How can multichannel recordings and spatial modeling improve audio source separation in real-world environments?
This research theme focuses on leveraging multichannel audio data, spatial filtering, and modeling of acoustic environments to improve the separation of overlapping audio sources in natural, reverberant, and complex settings. It is motivated by practical applications such as hearing aids, smart assistants, and telecommunication, where recordings occur outside controlled laboratory conditions. Challenges addressed include moving sources, varying numbers of sources and sensors, reverberation, synchronization, and spatial diffusion of sound sources.
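As a rough illustration of the spatial filtering mentioned above, the following minimal sketch implements a delay-and-sum beamformer, one of the simplest multichannel techniques. It is not drawn from any specific work in this theme. The function name `delay_and_sum`, the integer per-microphone delays, and the synthetic sine-plus-noise signals are all assumptions made for the example. Real recordings would also involve reverberation, fractional delays, and moving sources.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Align each channel by its known integer delay (in samples) and average.

    signals: array of shape (n_mics, n_samples)
    delays:  per-microphone arrival delay of the target, in samples (assumed known)
    """
    n_mics, n_samples = signals.shape
    out = np.zeros(n_samples)
    for channel, d in zip(signals, delays):
        # Advance the channel by d samples so the target lines up across mics.
        out[: n_samples - d] += channel[d:]
    return out / n_mics

# Synthetic example (hypothetical setup): one target reaching 4 mics with
# different delays, plus independent noise at each microphone.
rng = np.random.default_rng(0)
n = 4000
target = np.sin(2 * np.pi * 0.01 * np.arange(n))
delays = [0, 3, 6, 9]
mics = np.stack([
    np.concatenate([np.zeros(d), target[: n - d]]) + 0.5 * rng.standard_normal(n)
    for d in delays
])

enhanced = delay_and_sum(mics, delays)

# Compare mean squared error against the clean target, ignoring the tail
# where not every channel contributes after alignment.
valid = n - max(delays)
mse_single = np.mean((mics[0][:valid] - target[:valid]) ** 2)
mse_beamformed = np.mean((enhanced[:valid] - target[:valid]) ** 2)
```

Because the noise is independent across microphones while the aligned target adds coherently, averaging lowers the error relative to any single microphone. The same principle underlies more elaborate beamformers.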
2. What biologically inspired and deep learning methodologies can enhance sound source segregation and separation in complex acoustic scenes?
This line of research investigates algorithms that mimic human auditory processing and use neural network architectures such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and deep ensemble models to segregate sounds, particularly in challenging scenarios such as the cocktail party problem. It emphasizes physiological plausibility, feature complementarity, unsupervised and self-supervised learning, and architectures tailored for better separation and robustness on realistic mixtures that include speech and music sources.
3. How can sound source separation be integrated with sound event detection to improve recognition in noisy and polyphonic environments?
This theme explores the synergy between source separation and sound event detection (SED), particularly in domestic and other real-world settings where overlapping events and noise reduce detection accuracy. It covers joint training frameworks, separation as a pre-processing step that de-mixes sounds before event classification, and analyses of how much separated sources improve event detection. These approaches support semi-supervised learning that leverages unlabeled data, and they improve the interpretability and robustness of SED systems.
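The idea of separating before detecting can be sketched with a toy example. The code below is only an illustration under strong assumptions: the "separator" is replaced by oracle access to the two clean sources, and the detector is a plain frame-energy threshold. The names `frame_energy` and `detect_events`, the frame length, and the threshold are hypothetical choices for the example, not part of any system described here.

```python
import numpy as np

def frame_energy(x, frame_len):
    """Mean energy of consecutive non-overlapping frames."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return (frames ** 2).mean(axis=1)

def detect_events(x, frame_len=100, threshold=0.1):
    """Toy detector: a frame is 'active' when its energy exceeds the threshold."""
    return frame_energy(x, frame_len) > threshold

# Two synthetic events that overlap in time (samples 400-600).
n = 1000
t = np.arange(n)
event_a = np.zeros(n)
event_a[200:600] = np.sin(2 * np.pi * 0.05 * t[200:600])
event_b = np.zeros(n)
event_b[400:800] = np.sin(2 * np.pi * 0.12 * t[400:800])
mixture = event_a + event_b

# Assumption: oracle sources stand in for the output of a real separator.
separated = {"event_a": event_a, "event_b": event_b}

# Detection on the mixture yields one merged active region, while detection
# on each separated track recovers the boundaries of each event.
mixture_activity = detect_events(mixture)
track_activity = {name: detect_events(track) for name, track in separated.items()}
```

On the mixture, the detector reports a single active stretch covering both events. On the separated tracks, each event keeps its own onset and offset, which is the benefit that separation is meant to provide in polyphonic scenes. A real separator would produce imperfect estimates, so the gain would be smaller than in this idealized setup.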