Academia.edu

Single-Channel Speech Enhancement

27 papers
33 followers
About this topic
Single-Channel Speech Enhancement is a field of signal processing focused on improving the quality and intelligibility of speech signals captured from a single audio channel. It employs various algorithms to reduce background noise, reverberation, and other distortions, thereby enhancing the clarity of the speech for better communication and recognition.

Key research themes

1. How can deep learning architectures improve estimation of model parameters for enhanced single-channel speech enhancement?

This theme investigates the integration of deep learning with traditional parametric and filtering models (e.g., autoregressive models, Kalman filters) to enhance estimation of speech and noise characteristics in single-channel speech enhancement. This research direction is crucial as accurate estimation of parameters such as linear prediction coefficients (LPCs) directly impacts the quality and intelligibility of enhanced speech, while overcoming limitations of classical methods in noisy, non-stationary environments.

Key finding: This work proposes DeepLPC, a deep learning framework that jointly estimates clean speech and noise linear prediction coefficient (LPC) power spectra without relying on whitening filters, thereby significantly reducing bias...
Key finding: This paper introduces a neural network trained to estimate both the Wiener filter and its associated variance (uncertainty) for spectral coefficients in speech enhancement. The approach models the full posterior distribution...
Key finding: The paper proposes a Multi-Attention Bottleneck (MAB) that integrates a Transformer-based self-attention combined with time-frequency and channel attention modules within a gated convolutional encoder-decoder (CED)...
Key finding: This study applies a supervised deep learning approach based on U-Net architectures to single-channel speech enhancement, using magnitude spectrogram inputs and leveraging convolutional and recurrent layers to model both...
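For context on what is being estimated in this theme, the classical autocorrelation method computes the LPCs of one frame by solving the Yule-Walker equations with the Levinson-Durbin recursion. The sketch below shows that classical baseline only, not DeepLPC or any other method listed above. The function names (`autocorrelation`, `levinson_durbin`, `lpc_power_spectrum`) are illustrative assumptions, and the code is plain Python with no dependencies.

```python
import math


def autocorrelation(frame, max_lag):
    """Unnormalized autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for an all-pole model.

    Returns (a, err) with a[0] = 1, where the predictor is
    x[n] ~ -sum(a[k] * x[n - k] for k = 1..order) and err is the
    final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        k_i = -acc / err                      # reflection coefficient
        new_a = a[:]
        for k in range(1, i):
            new_a[k] = a[k] + k_i * a[i - k]
        new_a[i] = k_i
        a = new_a
        err *= 1.0 - k_i * k_i
    return a, err


def lpc_power_spectrum(a, err, n_bins):
    """Evaluate err / |A(e^{jw})|^2 at n_bins >= 2 frequencies in [0, pi]."""
    spectrum = []
    for b in range(n_bins):
        w = math.pi * b / (n_bins - 1)
        re = sum(a[k] * math.cos(w * k) for k in range(len(a)))
        im = -sum(a[k] * math.sin(w * k) for k in range(len(a)))
        spectrum.append(err / (re * re + im * im))
    return spectrum
```

As a check, the autocorrelation sequence `[1.0, 0.5, 0.25]` of a first-order autoregressive process gives `a[1] = -0.5`, a second coefficient of zero, and an error power of 0.75 when passed to `levinson_durbin(r, 2)`. In noisy conditions the autocorrelation itself is corrupted, which is the estimation problem the learned approaches in this theme target.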

2. What signal representations and masking strategies optimize non-negative matrix factorization (NMF)-based single-channel speech enhancement?

This research area focuses on leveraging specialized signal representations (e.g., wavelet transforms) and jointly learning ratio masking functions with dictionary learning within the NMF framework to improve single-channel speech enhancement. Since traditional STFT-based methods face limitations like time-frequency resolution trade-offs and noisy phase estimation, exploring alternative transforms and mask formulations can yield enhanced noise suppression and better preservation of speech components, which is essential for effective and efficient enhancement.

Key finding: This work introduces a novel speech enhancement approach applying Dual-Tree Complex Wavelet Transform (DTCWT) for shift-invariant subband decomposition, combined with joint dictionary learning of subband smooth ratio masks...
Key finding: The paper proposes Sparse Convolutive Robust Non-negative Matrix Factorization (SCRNMF), an NMF extension that explicitly models non-stationary noise as an interfering source and learns speech features with temporal extent...
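The basic building blocks behind this theme can be sketched in a few lines: factor a non-negative magnitude matrix V into W H with multiplicative updates (Euclidean cost), then form a ratio mask from the atoms attributed to speech. This is a minimal plain-Python illustration under stated assumptions, not the DTCWT or SCRNMF methods above. The helper names, and the choice of which atoms count as "speech" (`speech_atoms`), are hypothetical; a real system would usually learn speech and noise dictionaries from training data.

```python
import random


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-12):
    """Multiplicative-update NMF for the Euclidean cost."""
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(n_frames)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(n_freq)]
    return w, h


def ratio_mask(w, h, speech_atoms, eps=1e-12):
    """Soft mask: speech part of W H divided by the full W H."""
    w_s = [[row[k] for k in speech_atoms] for row in w]
    h_s = [h[k] for k in speech_atoms]
    speech, total = matmul(w_s, h_s), matmul(w, h)
    return [[s / (t + eps) for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(speech, total)]
```

Multiplying the noisy magnitudes element-wise by the mask gives the enhanced magnitudes; resynthesis typically reuses the noisy phase, which is one of the limitations the alternative transforms in this theme aim to address.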

3. How do advanced signal-domain transformations and statistical models contribute to improved MMSE estimators in single-channel speech enhancement?

Under this theme, research investigates the impact of adopting alternative signal transforms, like the Discrete Cosine Transform (DCT), and statistical speech priors (e.g., Gaussian, Laplacian, Gamma) to derive closed-form Minimum Mean Square Error (MMSE) estimators for short-time spectral amplitude. By overcoming analytical challenges associated with traditional Discrete Fourier Transform (DFT)-based methods and super-Gaussian priors, these studies aim to optimize noise suppression and speech fidelity, which are critical for both objective and subjective speech enhancement outcomes.

Key finding: This paper derives closed-form MMSE estimators of clean short-time spectral amplitude in the Discrete Cosine Transform (DCT) domain assuming various speech prior distributions including Gaussian, Laplace, and Gamma, under an...
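The simplest member of this estimator family is worth seeing concretely. If a real-valued coefficient (such as a DCT coefficient) is modeled as zero-mean Gaussian speech plus independent zero-mean Gaussian noise, the MMSE estimate of the clean coefficient is the noisy coefficient scaled by the Wiener gain ξ / (1 + ξ), where ξ is the a priori SNR (speech variance over noise variance). The sketch below covers only that Gaussian case for the coefficient itself; it is not the amplitude estimator or the Laplace and Gamma variants derived in the paper above, and the function names are illustrative.

```python
def wiener_gain(xi):
    """Wiener gain xi / (1 + xi) for an a priori SNR xi >= 0."""
    if xi < 0:
        raise ValueError("a priori SNR must be non-negative")
    return xi / (1.0 + xi)


def enhance_coefficients(noisy, speech_var, noise_var):
    """Scale each noisy coefficient by its own Wiener gain.

    speech_var and noise_var hold the assumed per-coefficient
    variances; in practice both have to be estimated from the
    noisy signal. Every noise_var entry must be positive.
    """
    return [wiener_gain(s / n) * y
            for y, s, n in zip(noisy, speech_var, noise_var)]
```

With equal speech and noise variance the gain is 0.5, and it approaches 1 as the a priori SNR grows. Super-Gaussian priors change the shape of this gain curve, which is where the closed-form derivations in this theme come in.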

All papers in Single-Channel Speech Enhancement

The problem addressed in this work is that of enhancing speech signals corrupted by additive noise and improving the performance of automatic speech recognizers in noisy conditions. The enhanced speech signals can also improve the...
Seismic footstep detection systems can be employed for homeland security applications such as perimeter protection and border security. This paper reports an approach based on non-negative matrix factorization (NMF) for seismic...
Super-Gaussian-based Bayesian estimators play a significant role in noise reduction. However, traditional Bayesian estimators process only the DFT spectral amplitude of noisy speech and leave the phase unprocessed. While deriving...
Cyber-Physical Systems (CPS) are seen as true technology enablers for new complex industrial applications from the perspective of the Industry 4.0. In this context, the challenges of advanced laser material processing applications present...
Global System for Mobile Communications (GSM) is one of the most commonly used cellular technologies in the world. One of the objectives in mobile communication systems is the security of the exchanged data. GSM employs many cryptographic...
The ability to fixate one's eyes on one object while attending to another object is known as covert visual attention. The present study investigated the effects of covert visual attention on reaction time (RT) and accuracy while...
This paper proposes Discrete Cosine Transform (DCT) based speech enhancement algorithms. These algorithms utilize minimum mean square error (MMSE) estimator of clean short-time spectral amplitude, which respectively uses Gaussian, Laplace...
The paper describes the design, collection, annotation and evaluation of a new Slovak mobile-telephone speech database, MobilDat-SK, which is a mobile-telephone extension to the SpeechDat-E SK. The MobilDat-SK database contains...
This paper proposes an effective approach to model the emotional space of words to infer their Sense Sentiment Similarity (SSS). SSS reflects the distance between the words regarding their senses and underlying sentiments. We propose a...
We review the advancement of nonstationary time series analysis from the perspective of Cowles Commission structural equation approach. We argue that despite the rich repertoire nonstationary time series analysis provides to analyze how...
With the advent and wide dissemination of mobile communications, speech processing systems must be made robust with respect to environmental noise. In fact, the performance of speech coders or speech recognition systems is degraded when...
In this paper, we extend the pre-image iteration method for speech de-noising by automatic determination of the kernel variance. The kernel variance needs to be adapted in different noise conditions. In previous work, the signal-to-noise...
A novel way of managing the compromise between noise reduction and speech distortion in Wiener filters is presented. It is based on adjusting the amount of noise reduced, and therefore the speech distortion introduced, on a phone-by-phone...
This essay locates forensics within national discourse about high-impact practices (HIPs) in higher education, as outlined by scholar George D. Kuh. Forensics shares all the characteristics associated with the ten promising practices Kuh...
Noise reduction of speech signals plays an important role in telecommunication systems. Various types of additive noise can corrupt speech, such as babble, crowd, large city, and highway noise, which are the main factors of degradation in...
Noise exists in almost all environments, such as cellular mobile telephone systems. Various types of noise can be introduced, such as additive noise on speech, which is the main factor of degradation in perceived speech quality. At some...
by Ali Sarafnia and 1 more
Speech enhancement in real-time applications improves the quality and intelligibility of the speech and reduces communication fatigue. Nowadays, due to reactivity of the systems and spread of online real-time applications, including VoIP,...
We introduce a non-negative matrix factorization technique which learns speech features with temporal extent in the presence of non-stationary noise. Our proposed technique, namely Sparse convolutive robust non-negative matrix...
In applications such as target recognition, quantitative use of the information present in synthetic aperture radar (SAR) imagery is pivotal for detecting and classifying the scattering centers of the target(s). This paper presents an...
The complex-valued image output from a synthetic aperture radar (SAR) processor possesses full spatial resolution defined by the sensor. Typically, this image is either power detected or magnitude detected before it is subjected to...
The GSM voice channel is the world's most widely used mobile communication network. Unfortunately, these networks are seriously vulnerable to hardware-based attacks, and communications can be easy to intercept. This paper...
Specific Purpose: The specific purpose of this speech is to educate people about marriages and persuade people to help in the fight towards legalizing people of the same sex to be married and not put on contract.