Enhancing the Intelligibility of Speech in Speech Noise
2005
Related papers
2011
In this paper we present a new method of signal processing for robust speech recognition using two microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by the two microphones through bandpass filtering. We develop a spatial masking function based on normalized cross-correlation, which provides rejection of off-axis interfering signals. To obtain improvements in reverberant environments, we add a temporal masking component that is closely related to our previously-described de-reverberation technique known as SSF. We demonstrate that this approach provides substantially better recognition accuracy than conventional binaural sound-source separation algorithms.
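To make the spatial-masking idea concrete, here is a minimal sketch, assuming a soft mask driven by the zero-lag normalized cross-correlation between the two microphone channels; the frame size, threshold, and mask shape are illustrative assumptions, not the authors' parameters.

```python
# A minimal sketch (assumed parameters, not the authors' implementation):
# a soft spatial mask driven by the zero-lag normalized cross-correlation
# between the two microphone channels.
import numpy as np

def spatial_mask(left, right, frame_len=400, hop=160, threshold=0.9):
    """Attenuate frames whose zero-lag normalized cross-correlation is low,
    i.e. frames dominated by off-axis (interaurally delayed) energy."""
    n_frames = 1 + (len(left) - frame_len) // hop
    mask = np.zeros(n_frames)
    for i in range(n_frames):
        l = left[i * hop : i * hop + frame_len]
        r = right[i * hop : i * hop + frame_len]
        denom = np.sqrt(np.dot(l, l) * np.dot(r, r)) + 1e-12
        rho = np.dot(l, r) / denom  # normalized cross-correlation at lag 0
        # On-axis sources arrive nearly in phase at both mics (rho near 1);
        # off-axis interference is progressively attenuated.
        mask[i] = 1.0 if rho >= threshold else (max(rho, 0.0) / threshold) ** 4
    return mask
```

In a filterbank implementation, the same mask would be computed and applied independently in each bandpass channel.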
2013 18th International Conference on Digital Signal Processing (DSP), 2013
This work presents a two-stage speech source separation algorithm based on combined models of interaural cues and spatial covariance, which utilize knowledge of the source locations estimated through video. In the first, pre-processing stage, the late reverberant speech components are suppressed by a spectral subtraction rule to dereverberate the observed mixture. In the second stage, the binaural spatial parameters, namely the interaural phase difference and the interaural level difference, together with the spatial covariance are modeled in the short-time Fourier transform (STFT) domain to classify individual time-frequency (TF) units to each source. The parameters of these probabilistic models and the TF regions assigned to each source are updated with the expectation-maximization (EM) algorithm. The algorithm generates TF masks that are used to reconstruct the individual speech sources. Objective results, in terms of the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ), confirm that the proposed multimodal method with pre-processing is a promising approach for source separation in highly reverberant rooms.
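As a rough illustration of the TF-masking stage only (the paper's probabilistic IPD/ILD and spatial-covariance models with EM refinement are not reproduced here), the following sketch classifies each TF unit by the sign of the interaural phase difference, assuming two sources on opposite sides of the median plane.

```python
# A minimal sketch of the TF-masking idea only: classify each time-frequency
# unit by the sign of the interaural phase difference (IPD) and reconstruct
# with complementary binary masks. The paper instead models IPD/ILD and
# spatial covariance probabilistically and refines the masks with EM.
import numpy as np
from scipy.signal import stft, istft

def ipd_binary_masks(left, right, fs, nperseg=1024):
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    ipd = np.angle(L * np.conj(R))       # interaural phase difference per TF unit
    mask_a = (ipd >= 0).astype(float)    # TF units assigned to source A
    mask_b = 1.0 - mask_a                # remaining units go to source B
    _, src_a = istft(mask_a * L, fs, nperseg=nperseg)
    _, src_b = istft(mask_b * L, fs, nperseg=nperseg)
    return src_a, src_b
```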
2010
This paper introduces the audio part of the 2010 community-based Signal Separation Evaluation Campaign (SiSEC2010). Seven speech and music datasets were contributed, which include datasets recorded in noisy or dynamic environments, in addition to the SiSEC2008 datasets. The source separation problems were split into five tasks, and the results for each task were evaluated using different objective performance criteria. We provide an overview of the audio datasets, tasks and criteria. We also report the results achieved with the submitted systems, and discuss organization strategies for future campaigns.
Trends in Amplification, 2008
Acta Physica Polonica A, 2011
Blind signal separation is one of the newest methods for improving the signal-to-noise ratio. Its main objective is to transform mixtures of recorded signals so that each source signal is obtained at the output of the procedure, under the assumption that the sources are statistically independent. For acoustic signals, correct separation is possible only if the source signals are spatially separated, which suggests analogies with classical spatial filtering (beamforming). In this study we analyzed the effect of the angular separation of two source signals (speech and babble noise) on speech intelligibility. For this purpose, we chose a convolutive blind source separation algorithm based on second-order statistics only. A dummy head with one microphone inside each ear canal served as the sensor array, simulating the two hearing aids of a hearing-impaired person. The speech reception threshold was determined before and after blind source separation. The results showed a significant improvement in speech intelligibility after applying blind source separation (the speech reception threshold fell by more than a dozen dB) in cases where the source signals were angularly separated. However, when the source signals came from the same direction, no improvement was observed. Moreover, the effectiveness of the blind source separation depended to a large extent on the relative positions of the signal sources in space.
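A minimal sketch of second-order-statistics separation for the instantaneous case, in the spirit of the AMUSE algorithm (whitening followed by eigendecomposition of one time-lagged covariance matrix); the study itself used a convolutive algorithm, so this only illustrates why covariance structure alone can unmix temporally correlated, spatially separated sources.

```python
# A minimal sketch of second-order-statistics separation (instantaneous case),
# in the spirit of AMUSE: whiten, then eigendecompose one time-lagged
# covariance matrix. The study used a convolutive algorithm; this only shows
# the covariance-based principle.
import numpy as np

def amuse(x, lag=1):
    """x: (n_sensors, n_samples) mixture. Returns sources up to scale/order."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    w = e @ np.diag(1.0 / np.sqrt(d + 1e-12)) @ e.T   # whitening matrix
    z = w @ x                                         # decorrelated mixture
    c1 = (z[:, :-lag] @ z[:, lag:].T) / (z.shape[1] - lag)
    c1 = 0.5 * (c1 + c1.T)                            # symmetrized lagged covariance
    _, u = np.linalg.eigh(c1)
    return u.T @ z                                    # rotate to diagonalize c1
```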
2003
Looking at the speaker's face seems to help listeners hear a speech signal better and extract it from competing sources before identification. This suggests that new speech enhancement or extraction techniques could exploit the audio-visual coherence of speech stimuli. In this paper, we present a set of experiments on a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence makes it possible to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm, to select the "best" sensor with reference to a target source.
Speech Communication, 2004
Looking at the speaker's face is useful for hearing a speech signal better and extracting it from competing sources before identification. This suggests that new speech enhancement or extraction techniques could exploit the audio-visual coherence of speech stimuli. In this paper, a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms is presented, and its assessment is described. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence enables a focus on the speech source to extract. It may also be used at the output of a classical source separation algorithm, to select the "best" sensor with reference to a target source.
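One concrete reading of the "best sensor" selection step described in the two abstracts above: among candidate separated outputs, pick the one whose short-time log-energy envelope correlates best with a lip-opening parameter extracted from the video. This is a minimal sketch under that assumption; the statistical audio-visual coherence model of the papers is considerably richer, and the function names and frame size here are illustrative.

```python
# A minimal sketch of coherence-based output selection: pick the separated
# signal whose short-time log-energy envelope correlates best with a
# lip-opening parameter from the video. Function names, the frame size, and
# the use of plain correlation are assumptions; the papers use a richer
# statistical audio-visual coherence model.
import numpy as np

def log_energy_envelope(x, frame=640):  # e.g. 40 ms frames at 16 kHz
    n = len(x) // frame
    return np.log(1e-12 + (x[:n * frame].reshape(n, frame) ** 2).sum(axis=1))

def select_by_av_coherence(candidates, lip_area):
    """candidates: list of 1-D audio arrays; lip_area: per-frame video parameter."""
    scores = []
    for x in candidates:
        env = log_energy_envelope(x)
        m = min(len(env), len(lip_area))
        scores.append(np.corrcoef(env[:m], lip_area[:m])[0, 1])
    return int(np.argmax(scores))  # index of the most speech-coherent output
```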
Two of the principal research areas currently being evaluated for the so-called sound source separation problem are Auditory Scene Analysis and a class of statistical analysis techniques known as Independent Component Analysis. This paper presents a methodology for combining these two techniques. It suggests a framework that first separates sounds by analyzing the incoming audio for patterns and synthesizing or filtering them accordingly. It then measures features of the resulting tracks and separates the sounds statistically by matching feature sets and attempting to make the output streams statistically independent. The proposed system is found to successfully separate artificial and acoustic mixes of sounds. As expected, the degree of separation is inversely related to the amount of reverberation present, the number of sources, and the inter-channel correlation.
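The statistical-independence stage of such a hybrid can be sketched with an off-the-shelf implementation; assuming a two-channel instantaneous mixture, scikit-learn's FastICA recovers the sources up to permutation and scale (the auditory-scene-analysis front end is not shown).

```python
# A minimal sketch of the statistical-independence stage only, assuming a
# two-channel instantaneous mixture; scikit-learn's FastICA recovers the
# sources up to permutation and scale.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)
s1 = np.sin(2 * np.pi * 220 * t)             # stand-ins for two acoustic sources
s2 = np.sign(np.sin(2 * np.pi * 3 * t))
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # instantaneous mixing matrix
x = np.c_[s1, s2] @ A.T + 0.01 * rng.standard_normal((16000, 2))

ica = FastICA(n_components=2, random_state=0)
s_est = ica.fit_transform(x)                 # estimated sources, one per column
```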
Evaluating audio source separation algorithms means rating the quality or intelligibility of the separated source signals. While objective criteria so far fail to account for all auditory phenomena, precise subjective ratings can be obtained by means of listening tests. In practice, the accuracy and reproducibility of these tests depend on several design issues. In this paper, we discuss some of these issues based on ongoing research in other areas of audio signal processing. We propose preliminary guidelines for evaluating the basic audio quality of separated sources and provide an example of their application using a free Matlab graphical interface.
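For reference, one objective criterion that appears throughout these abstracts, the signal-to-distortion ratio, can be computed as follows; this sketch uses the scale-invariant projection onto the reference rather than the full BSS_EVAL decomposition.

```python
# A minimal sketch of the signal-to-distortion ratio (SDR) in its
# scale-invariant form: project the estimate onto the reference and compare
# target energy to residual energy. The full BSS_EVAL decomposition also
# separates interference and artifact terms, which is omitted here.
import numpy as np

def si_sdr(reference, estimate):
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference               # best-scaled reference component
    noise = estimate - target                # everything not explained by it
    return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```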
