Enhancing the Intelligibility of Speech in Speech Noise
2005
Related papers
2011
In this paper we present a new method of signal processing for robust speech recognition using two microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by the two microphones through bandpass filtering. We develop a spatial masking function based on normalized cross-correlation, which provides rejection of off-axis interfering signals. To obtain improvements in reverberant environments, we add a temporal masking component that is closely related to our previously-described de-reverberation technique known as SSF. We demonstrate that this approach provides substantially better recognition accuracy than conventional binaural sound-source separation algorithms.
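To make the spatial-masking idea concrete, here is a minimal sketch, assuming a soft mask driven by the zero-lag normalized cross-correlation between the two microphone channels; the frame size, threshold, and mask shape are illustrative assumptions, not the authors' parameters.

```python
# A minimal sketch (assumed parameters, not the authors' implementation):
# a soft spatial mask driven by the zero-lag normalized cross-correlation
# between the two microphone channels.
import numpy as np

def spatial_mask(left, right, frame_len=400, hop=160, threshold=0.9):
    """Attenuate frames whose zero-lag normalized cross-correlation is low,
    i.e. frames dominated by off-axis (interaurally delayed) energy."""
    n_frames = 1 + (len(left) - frame_len) // hop
    mask = np.zeros(n_frames)
    for i in range(n_frames):
        l = left[i * hop : i * hop + frame_len]
        r = right[i * hop : i * hop + frame_len]
        denom = np.sqrt(np.dot(l, l) * np.dot(r, r)) + 1e-12
        rho = np.dot(l, r) / denom  # normalized cross-correlation at lag 0
        # On-axis sources arrive nearly in phase at both mics (rho near 1);
        # off-axis interference is progressively attenuated.
        mask[i] = 1.0 if rho >= threshold else (max(rho, 0.0) / threshold) ** 4
    return mask
```

In a filterbank implementation, the same mask would be computed and applied independently in each bandpass channel.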
2013 18th International Conference on Digital Signal Processing (DSP), 2013
This work presents a two-stage speech source separation algorithm based on combined models of interaural cues and spatial covariance, which utilize knowledge of the source locations estimated through video. In the first, pre-processing stage, the late reverberant speech components are suppressed by a spectral subtraction rule to dereverberate the observed mixture. In the second stage, the binaural spatial parameters, namely the interaural phase difference and the interaural level difference, together with the spatial covariance are modeled in the short-time Fourier transform (STFT) domain to classify individual time-frequency (TF) units to each source. The parameters of these probabilistic models and the TF regions assigned to each source are updated with the expectation-maximization (EM) algorithm. The algorithm generates TF masks that are used to reconstruct the individual speech sources. Objective results, in terms of the signal-to-distortion ratio (SDR) and the perceptual evaluation of speech quality (PESQ), confirm that the proposed multimodal method with pre-processing is a promising approach for source separation in highly reverberant rooms.
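As a rough illustration of the TF-masking stage only (the paper's probabilistic IPD/ILD and spatial-covariance models with EM refinement are not reproduced here), the following sketch classifies each TF unit by the sign of the interaural phase difference, assuming two sources on opposite sides of the median plane.

```python
# A minimal sketch of the TF-masking idea only: classify each time-frequency
# unit by the sign of the interaural phase difference (IPD) and reconstruct
# with complementary binary masks. The paper instead models IPD/ILD and
# spatial covariance probabilistically and refines the masks with EM.
import numpy as np
from scipy.signal import stft, istft

def ipd_binary_masks(left, right, fs, nperseg=1024):
    _, _, L = stft(left, fs, nperseg=nperseg)
    _, _, R = stft(right, fs, nperseg=nperseg)
    ipd = np.angle(L * np.conj(R))       # interaural phase difference per TF unit
    mask_a = (ipd >= 0).astype(float)    # TF units assigned to source A
    mask_b = 1.0 - mask_a                # remaining units go to source B
    _, src_a = istft(mask_a * L, fs, nperseg=nperseg)
    _, src_b = istft(mask_b * L, fs, nperseg=nperseg)
    return src_a, src_b
```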
2010
This paper introduces the audio part of the 2010 community-based Signal Separation Evaluation Campaign (SiSEC2010). Seven speech and music datasets were contributed, which include datasets recorded in noisy or dynamic environments, in addition to the SiSEC2008 datasets. The source separation problems were split into five tasks, and the results for each task were evaluated using different objective performance criteria. We provide an overview of the audio datasets, tasks and criteria. We also report the results achieved with the submitted systems, and discuss organization strategies for future campaigns.
Trends in Amplification, 2008
Acta Physica Polonica A, 2011
Blind signal separation is one of the newest methods for improving the signal-to-noise ratio. Its main objective is to transform mixtures of recorded signals so that each source signal is obtained at the output of the procedure, under the assumption that the sources are statistically independent. For acoustic signals, correct separation is possible only if the source signals are spatially separated, which suggests analogies with classical spatial filtering (beamforming). In this study we analyzed the effect of the angular separation of two source signals (speech and babble noise) on speech intelligibility. For this purpose, we chose a convolutive blind source separation algorithm based on second-order statistics only. A dummy head with one microphone inside each ear canal served as the sensor array, simulating the two hearing aids of a hearing-impaired person. The speech reception threshold was determined before and after blind source separation. The results showed a significant improvement in speech intelligibility after applying blind source separation (the speech reception threshold fell by more than a dozen dB) in cases where the source signals were angularly separated. However, when the source signals came from the same direction, no improvement was observed. Moreover, the effectiveness of the blind source separation depended to a large extent on the relative positions of the signal sources in space.
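A minimal sketch of second-order-statistics separation for the instantaneous case, in the spirit of the AMUSE algorithm (whitening followed by eigendecomposition of one time-lagged covariance matrix); the study itself used a convolutive algorithm, so this only illustrates why covariance structure alone can unmix temporally correlated, spatially separated sources.

```python
# A minimal sketch of second-order-statistics separation (instantaneous case),
# in the spirit of AMUSE: whiten, then eigendecompose one time-lagged
# covariance matrix. The study used a convolutive algorithm; this only shows
# the covariance-based principle.
import numpy as np

def amuse(x, lag=1):
    """x: (n_sensors, n_samples) mixture. Returns sources up to scale/order."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    w = e @ np.diag(1.0 / np.sqrt(d + 1e-12)) @ e.T   # whitening matrix
    z = w @ x                                         # decorrelated mixture
    c1 = (z[:, :-lag] @ z[:, lag:].T) / (z.shape[1] - lag)
    c1 = 0.5 * (c1 + c1.T)                            # symmetrized lagged covariance
    _, u = np.linalg.eigh(c1)
    return u.T @ z                                    # rotate to diagonalize c1
```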
2003
Looking at the speaker's face seems to help listeners hear a speech signal better and extract it from competing sources before identification. This suggests that new speech enhancement or extraction techniques could exploit the audio-visual coherence of speech stimuli. In this paper, we present a set of experiments on a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence makes it possible to focus on the speech source to be extracted. It may also be used at the output of a classical source separation algorithm, to select the "best" sensor with reference to a target source.
Speech Communication, 2004
Looking at the speaker's face is useful for hearing a speech signal better and extracting it from competing sources before identification. This suggests that new speech enhancement or extraction techniques could exploit the audio-visual coherence of speech stimuli. In this paper, a novel algorithm that plugs audio-visual coherence, estimated by statistical tools, into classical blind source separation algorithms is presented, and its assessment is described. We show, in the case of additive mixtures, that this algorithm performs better than classical blind tools both when there are as many sensors as sources and when there are fewer sensors than sources. Audio-visual coherence enables a focus on the speech source to extract. It may also be used at the output of a classical source separation algorithm, to select the "best" sensor with reference to a target source.
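One concrete reading of the "best sensor" selection step described in the two abstracts above: among candidate separated outputs, pick the one whose short-time log-energy envelope correlates best with a lip-opening parameter extracted from the video. This is a minimal sketch under that assumption; the statistical audio-visual coherence model of the papers is considerably richer, and the function names and frame size here are illustrative.

```python
# A minimal sketch of coherence-based output selection: pick the separated
# signal whose short-time log-energy envelope correlates best with a
# lip-opening parameter from the video. Function names, the frame size, and
# the use of plain correlation are assumptions; the papers use a richer
# statistical audio-visual coherence model.
import numpy as np

def log_energy_envelope(x, frame=640):  # e.g. 40 ms frames at 16 kHz
    n = len(x) // frame
    return np.log(1e-12 + (x[:n * frame].reshape(n, frame) ** 2).sum(axis=1))

def select_by_av_coherence(candidates, lip_area):
    """candidates: list of 1-D audio arrays; lip_area: per-frame video parameter."""
    scores = []
    for x in candidates:
        env = log_energy_envelope(x)
        m = min(len(env), len(lip_area))
        scores.append(np.corrcoef(env[:m], lip_area[:m])[0, 1])
    return int(np.argmax(scores))  # index of the most speech-coherent output
```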
Two of the principal research areas currently being evaluated for the so-called sound source separation problem are Auditory Scene Analysis and a class of statistical analysis techniques known as Independent Component Analysis. This paper presents a methodology for combining these two techniques. It suggests a framework that first separates sounds by analyzing the incoming audio for patterns and synthesizing or filtering them accordingly. It then measures features of the resulting tracks and separates the sounds statistically by matching feature sets and attempting to make the output streams statistically independent. The proposed system is found to successfully separate artificial and acoustic mixes of sounds. As expected, the degree of separation is inversely related to the amount of reverberation present, the number of sources, and the inter-channel correlation.
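The statistical-independence stage of such a hybrid can be sketched with an off-the-shelf implementation; assuming a two-channel instantaneous mixture, scikit-learn's FastICA recovers the sources up to permutation and scale (the auditory-scene-analysis front end is not shown).

```python
# A minimal sketch of the statistical-independence stage only, assuming a
# two-channel instantaneous mixture; scikit-learn's FastICA recovers the
# sources up to permutation and scale.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 16000)
s1 = np.sin(2 * np.pi * 220 * t)             # stand-ins for two acoustic sources
s2 = np.sign(np.sin(2 * np.pi * 3 * t))
A = np.array([[1.0, 0.6], [0.4, 1.0]])       # instantaneous mixing matrix
x = np.c_[s1, s2] @ A.T + 0.01 * rng.standard_normal((16000, 2))

ica = FastICA(n_components=2, random_state=0)
s_est = ica.fit_transform(x)                 # estimated sources, one per column
```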
Evaluating audio source separation algorithms means rating the quality or intelligibility of the separated source signals. While objective criteria so far fail to account for all auditory phenomena, precise subjective ratings can be obtained by means of listening tests. In practice, the accuracy and reproducibility of these tests depend on several design issues. In this paper, we discuss some of these issues based on ongoing research in other areas of audio signal processing. We propose preliminary guidelines for evaluating the basic audio quality of separated sources and provide an example of their application using a free Matlab graphical interface.
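For reference, one objective criterion that appears throughout these abstracts, the signal-to-distortion ratio, can be computed as follows; this sketch uses the scale-invariant projection onto the reference rather than the full BSS_EVAL decomposition.

```python
# A minimal sketch of the signal-to-distortion ratio (SDR) in its
# scale-invariant form: project the estimate onto the reference and compare
# target energy to residual energy. The full BSS_EVAL decomposition also
# separates interference and artifact terms, which is omitted here.
import numpy as np

def si_sdr(reference, estimate):
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference               # best-scaled reference component
    noise = estimate - target                # everything not explained by it
    return 10 * np.log10(np.dot(target, target) / np.dot(noise, noise))
```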
