Key research themes
1. How can deep learning architectures improve model-parameter estimation for single-channel speech enhancement?
This theme investigates the integration of deep learning with traditional parametric and filtering models (e.g., autoregressive models, Kalman filters) to improve the estimation of speech and noise characteristics in single-channel speech enhancement. The direction matters because the accuracy of parameters such as linear prediction coefficients (LPCs) directly affects the quality and intelligibility of the enhanced speech, and because classical estimation methods are limited in noisy, non-stationary environments.
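To make the parameter-estimation problem concrete, the sketch below estimates LPCs with the classical autocorrelation (Yule-Walker) method, once from a clean synthetic autoregressive signal and once from the same signal with white noise added. It is only an illustration of why noisy-input estimation is hard, not an implementation of any deep-learning method from this theme. The function names `lpc_autocorr` and `synth_ar`, the AR(2) coefficients, and the noise level are all assumptions chosen for the example.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Estimate predictor coefficients a_k such that x[n] ~ sum_k a_k * x[n-k]."""
    x = np.asarray(x, dtype=float)
    # Biased autocorrelation estimates at lags 0..order
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)]) / len(x)
    # Toeplitz autocorrelation matrix for the Yule-Walker equations R a = r[1:]
    lag = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    return np.linalg.solve(r[lag], r[1:])

def synth_ar(coeffs, n, rng):
    """Generate n samples of an AR process driven by unit-variance white noise."""
    p = len(coeffs)
    e = rng.standard_normal(n + p)
    x = np.zeros(n + p)
    for t in range(p, n + p):
        # x[t - p:t][::-1] is x[t-1], x[t-2], ..., x[t-p]
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + e[t]
    return x[p:]

rng = np.random.default_rng(0)
true_a = np.array([1.3, -0.6])           # hypothetical, stable AR(2) model
clean = synth_ar(true_a, 20000, rng)
noisy = clean + 2.0 * rng.standard_normal(clean.size)  # additive white noise

a_clean = lpc_autocorr(clean, 2)
a_noisy = lpc_autocorr(noisy, 2)
print("true:", true_a, "from clean:", a_clean, "from noisy:", a_noisy)
```

The additive noise inflates the lag-0 autocorrelation, so the coefficients estimated from the noisy signal are biased away from the true model. Any downstream filter built on those coefficients (a Kalman filter, for instance) inherits that bias, which is the gap a learned estimator would aim to close.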
2. What signal representations and masking strategies optimize non-negative matrix factorization (NMF)-based single-channel speech enhancement?
This research area focuses on specialized signal representations (e.g., wavelet transforms) and on learning ratio masking functions jointly with the dictionaries within the NMF framework to improve single-channel speech enhancement. Because traditional STFT-based methods face limitations such as the time-frequency resolution trade-off and noisy phase estimation, alternative transforms and mask formulations can yield stronger noise suppression and better preservation of speech components.
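As a rough illustration of the NMF-plus-ratio-mask idea, the sketch below trains separate speech and noise dictionaries with standard multiplicative updates (Euclidean cost), fits activations to a mixture with the dictionaries held fixed, and forms a ratio mask from the two reconstructions. It is a simplification under stated assumptions: the dictionaries and the mask are learned in separate steps rather than jointly, random non-negative matrices stand in for a real time-frequency representation (STFT or wavelet magnitudes), and magnitudes are assumed to add in the mixture. The names `nmf` and `ratio_mask` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero

def nmf(V, rank, n_iter=200, W_fixed=None, seed=0):
    """Factor V ~ W @ H with multiplicative updates; keep W fixed if supplied."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = (rng.random((F, rank)) + EPS) if W_fixed is None else W_fixed
    H = rng.random((W.shape[1], T)) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if W_fixed is None:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def ratio_mask(W_s, W_n, H):
    """Ratio of the speech reconstruction to the total reconstruction."""
    k = W_s.shape[1]
    S = W_s @ H[:k]   # speech part
    N = W_n @ H[k:]   # noise part
    return S / (S + N + EPS)

# Toy non-negative "spectrograms" (assumption: stand-ins for real magnitudes)
rng = np.random.default_rng(1)
F, T, k = 32, 60, 4
speech_train = rng.random((F, k)) @ rng.random((k, T))
noise_train = rng.random((F, k)) @ rng.random((k, T))

W_s, _ = nmf(speech_train, k)
W_n, _ = nmf(noise_train, k, seed=1)

V = speech_train[:, :20] + noise_train[:, :20]   # mixture of magnitudes
_, H = nmf(V, 2 * k, W_fixed=np.hstack([W_s, W_n]))
mask = ratio_mask(W_s, W_n, H)
enhanced = mask * V
```

With a real signal, `enhanced` would be combined with a phase (typically the noisy phase in an STFT pipeline) and inverted back to the time domain.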
3. How do advanced signal-domain transformations and statistical models contribute to improved MMSE estimators in single-channel speech enhancement?
This theme examines how alternative signal transforms, such as the Discrete Cosine Transform (DCT), and statistical speech priors (e.g., Gaussian, Laplacian, Gamma) can be used to derive closed-form Minimum Mean Square Error (MMSE) estimators of the short-time spectral amplitude. By overcoming the analytical challenges associated with traditional Discrete Fourier Transform (DFT)-based methods and super-Gaussian priors, these studies aim to improve noise suppression and speech fidelity, both of which are critical for objective and subjective speech enhancement outcomes.
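The simplest case of such an estimator can be sketched directly. DCT coefficients are real-valued, and if both the speech and noise coefficients are modeled as zero-mean Gaussians, the MMSE estimate of a clean coefficient is the noisy coefficient scaled by the Wiener gain xi / (1 + xi), where xi is the a priori SNR. The sketch below covers only this Gaussian case, not the Laplacian or Gamma priors mentioned above. It also assumes the speech and noise variances are known exactly, whereas a real system has to estimate them. The helper names `dct_matrix` and `wiener_gain` and all numeric settings are assumptions.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that X = C @ x and x = C.T @ X."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    C[0] /= np.sqrt(2.0)
    return C

def wiener_gain(speech_var, noise_var):
    """MMSE gain for a real Gaussian coefficient in additive Gaussian noise."""
    xi = speech_var / noise_var  # a priori SNR
    return xi / (1.0 + xi)

rng = np.random.default_rng(0)
n, frames = 64, 2000
C = dct_matrix(n)

# Assumed oracle statistics: speech variance decays with DCT index, noise is white
speech_var = 4.0 / (1.0 + np.arange(n))
noise_var = 0.5

X = rng.standard_normal((n, frames)) * np.sqrt(speech_var)[:, None]
x_clean = C.T @ X                                   # time-domain frames
y = x_clean + np.sqrt(noise_var) * rng.standard_normal((n, frames))

Y = C @ y                                           # noisy DCT coefficients
X_hat = wiener_gain(speech_var, noise_var)[:, None] * Y
x_hat = C.T @ X_hat                                 # enhanced frames

mse_noisy = np.mean((y - x_clean) ** 2)
mse_hat = np.mean((x_hat - x_clean) ** 2)
print("MSE noisy:", mse_noisy, "MSE enhanced:", mse_hat)
```

Because the transform is orthonormal, the mean squared error is the same in the DCT and time domains, so a gain that is MMSE-optimal per coefficient is also optimal for the reconstructed frame under this model.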