Accurate marginalization range for missing data recognition
2007, Interspeech 2007
https://doi.org/10.21437/INTERSPEECH.2007-331
Abstract
Missing data recognition has been proposed to increase the noise robustness of automatic speech recognition. This strategy relies on a spectrographic mask that gives information about the true clean speech energy of a corrupted signal. This information is then used to refine how the data is processed during the decoding step. In this work we propose a new mask that provides more information about the clean speech contribution than classical masks based on signal-to-noise ratio (SNR) thresholding. The proposed mask is described and compared to another missing data approach based on SNR thresholding. Experimental results show that the proposed approach significantly reduces the word error rate. Moreover, the proposed mask outperforms the ETSI advanced front-end on the HIWIRE corpus.
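For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a classical SNR-thresholded binary mask and of bounded marginalization under a diagonal Gaussian. It is not the mask proposed in the paper, whose details are not given in this abstract. The function names, the 0 dB threshold, the spectral-subtraction estimate of the clean energy, and the lower integration bound of 0 are all assumptions made for illustration.

```python
import math

def snr_threshold_mask(noisy, noise, threshold_db=0.0):
    """Binary mask: 1 (reliable) where the estimated local SNR exceeds threshold_db.

    `noisy` and `noise` are per-band energies for one frame (hypothetical inputs).
    """
    mask = []
    for y, n in zip(noisy, noise):
        clean = max(y - n, 1e-10)  # crude clean-energy estimate (assumption)
        snr_db = 10.0 * math.log10(clean / max(n, 1e-10))
        mask.append(1 if snr_db > threshold_db else 0)
    return mask

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def gaussian_cdf(x, mean, std):
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def missing_data_likelihood(obs, mask, means, stds, lower=0.0):
    """Frame likelihood under a diagonal Gaussian with bounded marginalization.

    Reliable bands use the usual density. Unreliable bands integrate the
    density over [lower, observed value], on the assumption that the clean
    speech energy cannot exceed the observed noisy energy.
    """
    like = 1.0
    for x, m, mu, sd in zip(obs, mask, means, stds):
        if m:
            like *= gaussian_pdf(x, mu, sd)
        else:
            like *= gaussian_cdf(x, mu, sd) - gaussian_cdf(lower, mu, sd)
    return like
```

In this sketch the marginalization range for an unreliable band is always the full interval from 0 to the observed value. A mask that carries more information about the clean speech contribution, as the abstract describes, could in principle narrow that interval.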