Accurate marginalization range for missing data recognition

2007, Interspeech 2007

https://doi.org/10.21437/INTERSPEECH.2007-331

Abstract

Missing data recognition has been proposed to increase the noise robustness of automatic speech recognition. This strategy relies on a spectrographic mask that gives information about the true clean speech energy of a corrupted signal. This information is then used to refine how the data are processed during the decoding step. In this work we propose a new mask that provides more information about the clean speech contribution than classical masks based on signal-to-noise ratio (SNR) thresholding. The proposed mask is described and compared to another missing data approach based on SNR thresholding. Experimental results show a significant word error rate reduction with the proposed approach. Moreover, the proposed mask outperforms the ETSI advanced front-end on the HIWIRE corpus.
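To make the baseline concrete, the following is a minimal sketch of a classical binary mask obtained by SNR thresholding, the kind of mask the abstract contrasts against. It is not the proposed mask, whose construction the abstract does not detail. The function name `snr_threshold_mask`, the 0 dB default threshold, the oracle access to separate clean-speech and noise energies, and the toy numbers are all assumptions made for illustration.

```python
import math

def snr_threshold_mask(clean_energy, noise_energy, threshold_db=0.0):
    """Binary mask per time-frequency cell: 1 = reliable, 0 = unreliable.

    Assumes oracle knowledge of clean-speech and noise energies
    (linear scale), given as lists of frames, each a list of bands.
    """
    mask = []
    for clean_row, noise_row in zip(clean_energy, noise_energy):
        mask_row = []
        for s, n in zip(clean_row, noise_row):
            if s <= 0.0:
                reliable = False      # no speech energy in this cell
            elif n <= 0.0:
                reliable = True       # speech present, no noise
            else:
                local_snr_db = 10.0 * math.log10(s / n)
                reliable = local_snr_db > threshold_db
            mask_row.append(1 if reliable else 0)
        mask.append(mask_row)
    return mask

# Toy example: 2 frames x 3 frequency bands (made-up energies).
clean = [[4.0, 1.0, 0.5], [9.0, 0.2, 3.0]]
noise = [[1.0, 2.0, 0.5], [1.0, 1.0, 1.0]]
mask = snr_threshold_mask(clean, noise)
print(mask)  # [[1, 0, 0], [1, 0, 1]]
```

Such a mask records only whether each cell is speech-dominated. The abstract's claim is that the proposed mask carries more information about the clean speech contribution than this binary decision.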
