Université des Sciences et Technologie Houari Boumediene (USTHB)
Telecommunications
This paper presents a new Double-Talk Detection (DTD) method for acoustic echo cancellation. The main goal is to remove the undesirable acoustic echoes produced by the coupling between the loudspeaker and the microphone of the mobile station. An Acoustic Echo Canceller (AEC) based on adaptive filtering is an attractive solution. In this work, DTD is performed using discriminative speech features extracted from the near-end and microphone speech signals. The purpose is to discriminate between these signals in order to detect Double-Talk (DT) periods. To evaluate performance, we use the NLMS algorithm to update the filter coefficients. Results obtained on the TIMIT database show that the proposed method performs significantly better than the Normalized Cross-Correlation (NCC) and Geigel methods.
- by Mahfoud Hamidia and +1
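The abstract above names the NLMS update and the Geigel detector as the baseline machinery. As a rough illustration only, the sketch below freezes NLMS adaptation whenever a Geigel-style test fires. It does not implement the feature-based detector the paper proposes. The threshold of 0.5, the tap count, the step size and the two-tap toy echo path are all assumptions chosen for the example.

```python
import random


def geigel_double_talk(far_end_window, mic_sample, threshold=0.5):
    """Geigel-style test: flag double talk when the microphone sample is
    large compared with the recent far-end peak (threshold is an assumption)."""
    peak = max((abs(x) for x in far_end_window), default=0.0)
    return abs(mic_sample) > threshold * peak


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-8, threshold=0.5):
    """NLMS echo canceller whose adaptation is frozen during detected double talk."""
    w = [0.0] * taps
    errors, flags = [], []
    for n in range(len(mic)):
        # Most recent far-end samples, newest first, zero-padded at the start.
        x = [far_end[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(wk * xk for wk, xk in zip(w, x))
        e = mic[n] - echo_estimate
        double_talk = geigel_double_talk(x, mic[n], threshold)
        if not double_talk:
            # Normalized LMS update: step scaled by the input energy.
            norm = eps + sum(xk * xk for xk in x)
            w = [wk + mu * e * xk / norm for wk, xk in zip(w, x)]
        errors.append(e)
        flags.append(double_talk)
    return w, errors, flags


# Toy data (hypothetical): a white far-end signal through a 2-tap echo path,
# with no near-end speech, so the detector should stay silent.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.3, 0.1]
mic = [
    sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
weights, errors, flags = nlms_echo_canceller(far_end, mic, taps=4)
```

In this noise-free toy case the filter weights approach the echo path. With real near-end speech, the frozen update is what protects the filter from diverging.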
In this paper, a new structure for acoustic echo cancellation is presented. The role of the acoustic echo canceller (AEC) is to remove undesirable acoustic echoes in communication systems. However, in the double-talk case the performance of the AEC degrades, so a double-talk detector (DTD) must be used to control the AEC. A new AEC structure using an auxiliary adaptive filter is proposed in this paper. Experimental results using speech signals from the TIMIT database show that the proposed structure improves acoustic echo cancellation compared to the standard adaptive filter structure controlled by the DTD.
- by Mahfoud Hamidia and +1
- Adaptive Filters
This paper investigates the contribution of formants and prosodic features such as pitch and energy to Arabic speech recognition under real-life conditions. Our speech recognition system, based on Hidden Markov Models (HMMs), is implemented using the HTK Toolkit. The front-end of the system combines conventional Mel-Frequency Cepstral Coefficient (MFCC) features with prosodic information and formants. The experiments are performed on the ARADIGIT corpus, a database of spoken Arabic words. The results show that, in noisy environments, the resulting multivariate feature vectors lead to a significant improvement in word accuracy, up to 27%, relative to the word accuracy obtained with the state-of-the-art MFCC-based system.
This paper improves on the recently proposed voice activity detection method based on vector quantization and speech enhancement preprocessing (VQ-VAD), and applies it to a speaker verification system in noisy environments. VQ-VAD computes the likelihood ratio on an utterance-by-utterance basis from mel-frequency cepstral coefficients that train speech and non-speech models. However, the notion of speech and non-speech segments in a speech signal is independent of the speaker. For this reason, a modified VQ-VAD technique is proposed in this paper, in which two UBMs, one for the speech model and one for the non-speech model, are trained from a long, utterance-independent model. The UBMs are then adapted to the speaker's short utterance via MAP adaptation, instead of using VQ models. Mel-frequency cepstral coefficients were also extracted using the recently proposed asymmetric tapers instead of traditional Hamming windowing. With GMM-UBM as the baseline speaker verification system, extensive simulations were carried out by adding noise at different levels to the clean TIMIT database, which is characterized by short training and very short testing utterances. The results show the superiority of the proposed GMM-MAP-VAD approach in adverse conditions. Furthermore, a drastic reduction in the EER is observed when using asymmetric tapers.
This paper presents an evaluation of speaker verification (SV) in mobile communication, where SV becomes a challenging task for high-security purposes. Unfortunately, the coupling between the loudspeaker and the microphone of mobile devices produces an acoustic echo of the far-end speaker. An acoustic echo canceller (AEC) must be added to reduce this echo. Furthermore, in the double-talk scenario, when far-end speech is corrupted by near-end speech, the performance of an AEC based on adaptive filtering degrades. In this work, various measures are taken to demonstrate the impact of the AEC, with and without a double-talk detector (DTD), on the SV task using the ARADIGIT corpus.
- by Mahfoud Hamidia and +1
In this paper, subband speech techniques are proposed for robust speaker verification, where the full-band power spectrum is divided into 7 subbands. The cepstral vectors of each subband, represented by MFCC, Delta and Delta-Delta coefficients plus an energy parameter extracted from the TIMIT corpus, are then merged according to their reliability using a majority vote approach. Specifically, we investigate the performance of subband-based speaker verification in noisy conditions using a GMM/SVM model. The results show that subband processing fusion outperforms traditional wideband techniques in both clean and noisy environments.
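The majority vote mentioned above can be pictured with a minimal sketch. The abstract does not say how subband reliability enters the vote, so the function below is a plain unweighted majority over per-subband accept/reject decisions; the function name and the boolean encoding are assumptions made for illustration.

```python
def majority_vote(decisions):
    """Return True when more than half of the subband decisions accept.

    `decisions` is a list of booleans, one per subband (hypothetical encoding).
    """
    accepts = sum(1 for d in decisions if d)
    return accepts > len(decisions) / 2


# Seven subbands, as in the abstract: four accept, three reject.
claim_accepted = majority_vote([True, True, False, True, False, True, False])
```

An odd number of subbands, such as the 7 used here, avoids ties in an unweighted vote.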
Gaussian mixture models (GMMs) have proven extremely successful for text-independent speaker verification. The standard training method for GMMs is MAP adaptation of the means of the mixture components based on speech from a target speaker. In this work we look into two models (GMM-UBM and GMM-SVM) and their application to speaker verification. Feature vectors made up of Mel-Frequency Cepstral Coefficients (MFCC) extracted from the speech signal are used to train the GMM, and the mean vectors obtained from the GMM-UBM are used to train the SVM. To center the data around their average, cepstral mean subtraction (CMS) is applied to the MFCCs. For both the GMM-UBM and GMM-SVM systems, a 2048-mixture UBM is used. The verification phase was tested with the Aurora database at different Signal-to-Noise Ratios (SNR) and under three noisy conditions. The experimental results showed the outperformance of GMM-SVM against GMM-UBM in speaker verification espe...
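Cepstral mean subtraction, as referred to above, removes the per-coefficient average computed over the frames of an utterance. The following is a minimal sketch under the assumption that the features arrive as a list of equal-length frames; the function name and the toy values are illustrative, not taken from the paper.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[d] for f in frames) / n_frames for d in range(n_coeffs)]
    return [[f[d] - means[d] for d in range(n_coeffs)] for f in frames]


# Two toy frames with two coefficients each (hypothetical values).
normalized = cepstral_mean_subtraction([[1.0, 2.0], [3.0, 6.0]])
```

After the subtraction each coefficient has zero mean across the utterance, which removes a constant offset in the cepstral domain.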
Speech recognition systems are gaining importance with the widespread use of mobile and portable devices and other interactive voice response systems. Because of the resource constraints on such devices and the requirements of specific applications, performing speech recognition over a data network becomes inevitable. The requirements of such a system, with a human at one end and a machine at the other, are clearly asymmetric. In this paper, we therefore investigate the use of Perceptual Linear Predictive (PLP) features for speaker recognition over an Internet Protocol (IP) network. To this end, we have implemented a client-server architecture in which the front-end is located on the client side and the recognition system on the server side, for text-independent speaker recognition based on Gaussian Mixture Models (GMM). The ARADIGIT corpus was used in the experiments, and results based on 60 speakers were promising.
- by Mahfoud Hamidia and +2
This paper deals with the effect of speech transcoded over GSM (Global System for Mobile) on an Acoustic Echo Cancellation (AEC) system. Cancellation techniques have become very helpful in mobile communication for reducing unwanted acoustic echo. Acoustic echo is mainly due to the coupling between the loudspeaker and the microphone of the MS (Mobile Station). The AEC system is based on adaptive filtering. In addition, the AMR-WB (Adaptive Multi-Rate Wide Band) speech codec is used to encode and decode the speech. It is standardized in the second generation (2G) and third generation (3G) cellular systems. In our work, the coded speech passes through a transmission channel modeled as a BSC (Binary Symmetric Channel). The simulation results show the degradation of AEC system performance introduced by the AMR-WB speech codec and the transmission channel.
- by Mahfoud Hamidia and +1
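A binary symmetric channel, as used in the abstract above, flips each transmitted bit independently with a fixed crossover probability. The sketch below shows that behaviour only; the function name, the crossover value of 0.1 and the seeded generator are assumptions for the example, and no speech codec is involved.

```python
import random


def binary_symmetric_channel(bits, crossover_probability, rng):
    """Flip each bit independently with the given crossover probability."""
    return [b ^ 1 if rng.random() < crossover_probability else b for b in bits]


# Toy bitstream (hypothetical) sent through a channel with 10% crossover.
rng = random.Random(0)
sent = [0, 1, 1, 0, 1, 0, 0, 1]
received = binary_symmetric_channel(sent, 0.1, rng)
bit_errors = sum(1 for s, r in zip(sent, received) if s != r)
```

Passing the random generator in explicitly keeps a simulation repeatable, which helps when comparing runs at different crossover probabilities.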
This paper presents a new acoustic echo suppressor structure for acoustic echo cancellation in mobile communication. In the double-talk case, the near-end speech is corrupted by the acoustic echo of the far-end speaker, and a classical Acoustic Echo Canceller (AEC) is not sufficient. The performance of the classical AEC is improved by Double-Talk Detection (DTD) and Noise Reduction (NR). The proposed acoustic echo suppressor structure performs better than the AEC controlled by DTD.
- by Mahfoud Hamidia and +1
This paper deals with the use of Automatic Speaker Recognition (ASR) over a Local Area Network (LAN) in the presence of noise. In this work, which focuses on Distributed Speaker Recognition (DSR), we introduce a client/server architecture in which the client is the front-end of the ETSI Aurora standard and the recognition system is located on a remote server. For the speaker recognition task, carried out in text-independent mode, Gaussian Mixture Models (GMM) were used with the ARADIGIT corpus. Experimental results show that a client/server architecture using the User Datagram Protocol (UDP) is an appropriate way to realise DSR.
- by Mahfoud Hamidia and +1
An important step in speaker verification is extracting features that best characterize the speaker's voice. This paper investigates front-end processing that aims to improve the performance of speaker verification based on an SVM classifier in text-independent mode. The approach combines conventional Mel-Frequency Cepstral Coefficients (MFCCs) and Line Spectral Frequencies (LSFs) into robust multivariate feature vectors. To reduce the high dimensionality involved in training with these feature vectors, we use principal component analysis (PCA) as a dimension reduction method. To evaluate the robustness of these systems, different noisy environments were used. The results obtained on the TIMIT database show that combining these spectral cues leads to a significant improvement in verification accuracy, especially with PCA reduction in noisy environments with a low signal-to-noise ratio.
The aim of this study is to build an Arabic word recognition system for a small vocabulary. Various models using the neural network approach have been used in ASR. To increase the efficiency of the classification task, we propose the use of a nonparametric density estimator. Thus, in this paper we present an adaptation scheme for speaker-independent Arabic speech recognition based on the General Regression Neural Network (GRNN). We have also implemented a left-right Hidden Markov Model (DHMM) with five states, and the relative performances of the two proposed applications are compared to the well-known MLP. Experimental results obtained with large corpora show that using a nonparametric density estimator with an appropriate smoothing factor improves the generalization power of the neural network.
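The role of the smoothing factor in a GRNN can be seen in a small sketch. In its usual textbook form, a GRNN predicts a Gaussian-kernel weighted average of the training targets, with the kernel width set by the smoothing factor. The code below follows that form with scalar targets; the function name, the toy data and the sigma values are assumptions, and the paper's actual adaptation scheme is not reproduced.

```python
import math


def grnn_predict(x, train_x, train_y, sigma):
    """Gaussian-kernel weighted average of training targets (textbook GRNN form).

    A very small sigma can make every weight underflow to zero, in which
    case the division below would fail; this sketch does not guard for that.
    """
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, xi)) / (2.0 * sigma ** 2))
        for xi in train_x
    ]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


# Two one-dimensional training points (hypothetical) with targets 0 and 1.
train_x = [[0.0], [1.0]]
train_y = [0.0, 1.0]
midpoint = grnn_predict([0.5], train_x, train_y, sigma=0.5)
near_first = grnn_predict([0.0], train_x, train_y, sigma=0.1)
```

A large smoothing factor pulls every prediction toward the global average of the targets, while a small one makes the output follow the nearest training point.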
This paper provides an overview of low-level features for speaker recognition, with an emphasis on the recently proposed MFCC variant based on asymmetric tapers (MFCC-asymmetric from now on), which has shown high noise robustness in the context of speaker verification. Using the TIMIT corpus, the performance of MFCC-asymmetric is compared with the standard Mel-Frequency Cepstral Coefficients (MFCC) and the Linear Frequency Cepstral Coefficients (LFCC) in clean and noisy environments. To simulate real-world conditions, the verification phase was tested with two noises (babble and factory) from the NOISEX-92 database at different Signal-to-Noise Ratios (SNR). The experimental results showed that MFCC-asymmetric (k=4) outperforms the other features in noisy conditions. Finally, we investigated the impact of consolidating evidence from different features by score-level fusion. Preliminary results show a promising improvement in verification rate with score fusion. Improving the Performance of Speaker Verification Systems under Noisy Conditions using Low Level Features and Score Level Fusion.
Biometric systems have been emerging in various industries over the past few years, and they continue to be deployed to provide stronger security for access control systems. Many types of unimodal biometric systems have been developed. However, these systems can only provide a low to medium level of security. For higher security, the combination of two or more unimodal biometrics (multiple modalities) is required. In this paper, we propose a multimodal biometric system for person recognition using hand images that integrates two different modalities: palmprint and Finger-Knuckle-Print (FKP). To address this problem, we propose an efficient matching algorithm based on the Phase-Correlation Function (PCF) that uses both modalities. The two modalities are combined, and the fusion is applied at the matching-score level. The experimental results show that the designed system achieves an excellent recognition rate and provides more security than a unimodal biometric system.
- by S. Chitroub and +1
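Fusion at the matching-score level, as described above, combines one score per modality for each candidate. The abstract does not name the fusion rule, so the sketch below assumes min-max normalization followed by the sum rule (a plain average), which is one common choice. The function names and the toy scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def sum_rule_fusion(palm_scores, fkp_scores):
    """Average the normalized palmprint and FKP scores per candidate."""
    palm = min_max_normalize(palm_scores)
    fkp = min_max_normalize(fkp_scores)
    return [(p + f) / 2.0 for p, f in zip(palm, fkp)]


# Matching scores for three candidates from each modality (toy values).
fused = sum_rule_fusion([10.0, 20.0, 30.0], [1.0, 3.0, 2.0])
```

Normalizing first matters because the two matchers may produce scores on very different scales.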
Reliability and accuracy in personal identification systems are a dominant concern in the security world. Biometrics has recently gained much attention in this area. Many types of personal identification systems have been developed, and palmprint identification is one of the emerging technologies. This paper presents a novel biometric technique for automatic personal identification using multispectral palmprint technology. In this method, the images of each spectrum are aligned and then used to extract palmprint features with a 1D log-Gabor filter. These features are then examined for their individual and combined performance. Finally, the Hamming distance is used to match palmprint features. The experimental results show that the proposed method achieves an excellent identification rate and provides more security.
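The matching step above relies on the Hamming distance between feature codes. The sketch below assumes the log-Gabor responses have already been encoded as equal-length binary lists, and it normalizes the distance by the code length; both the encoding and the normalization are assumptions, as is the function name.

```python
def normalized_hamming_distance(code_a, code_b):
    """Fraction of positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b) or not code_a:
        raise ValueError("codes must be non-empty and of equal length")
    differing = sum(1 for a, b in zip(code_a, code_b) if a != b)
    return differing / len(code_a)


# Two toy 4-bit feature codes (hypothetical) that differ in two positions.
distance = normalized_hamming_distance([1, 0, 1, 1], [1, 1, 1, 0])
```

A distance near 0 suggests the two codes come from the same palm, and a decision threshold on this value would have to be tuned on real data.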