Reconstructing Neutral Speech from Tracheoesophageal Speech

2018, Interspeech 2018

https://doi.org/10.21437/INTERSPEECH.2018-1907

Abstract

In this work, we propose a tracheoesophageal (TE) speech to neutral speech conversion system using data collected from a laryngectomee. In laryngectomees, who lack vocal folds, the vibration of the esophagus gives rise to a low-frequency pitch during speech production. This pitch manifests as impulse-like noise in the recorded speech. We propose a method that first 'whisperizes' the TE speech and then applies linear predictive coding (LPC) based synthesis, which uses pitch derived from the energy contour. To perform 'whisperization', we model the LPC residual signal as the sum of white noise and impulses introduced by the esophageal vibrations. We model the impulses with a Bernoulli-Gaussian distribution and the white noise with a Gaussian distribution. The strength and location of the impulses are estimated using Gibbs sampling, and the impulse-like noise is then removed from the speech to obtain whispered speech. Subjective evaluation via a listening test reveals that the 'whisperization' step in the proposed method aids in synthesizing more natural-sounding neutral speech. A separate listening test shows that listeners prefer the synthesized speech from the proposed method about 93% (absolute) more often than that from the best baseline scheme.
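
The residual model described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it works on synthetic data rather than a real LPC residual, and it assumes the impulse and noise variances are known. The function name `gibbs_bernoulli_gaussian`, the parameters `var_imp` and `var_noise`, and the uniform Beta(1, 1) prior on the impulse probability are all hypothetical choices made for illustration. Each residual sample is modelled as r[n] = q[n] a[n] + w[n], where q[n] is a Bernoulli indicator of an impulse, a[n] is its Gaussian strength, and w[n] is Gaussian white noise.

```python
import numpy as np


def gibbs_bernoulli_gaussian(r, var_imp, var_noise, n_iter=200, burn_in=50, seed=0):
    """Gibbs sampler for r[n] = q[n] * a[n] + w[n] (illustrative sketch only).

    Assumes known variances var_imp (impulse strength) and var_noise (white
    noise), and a Beta(1, 1) prior on the impulse probability p.
    Returns the posterior mean of q[n] and of the impulse term q[n] * a[n].
    """
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    n = r.size

    # Posterior of a[n] given q[n] = 1 and r[n] (Gaussian conjugacy).
    gain = var_imp / (var_imp + var_noise)
    post_mean = gain * r
    post_std = np.sqrt(gain * var_noise)

    # Log-likelihood of r[n] under each hypothesis, with a[n] integrated out.
    def log_norm(x, var):
        return -0.5 * (np.log(2.0 * np.pi * var) + x ** 2 / var)

    ll_impulse = log_norm(r, var_imp + var_noise)
    ll_noise = log_norm(r, var_noise)

    p = 0.5  # initial guess for the impulse probability
    q_sum = np.zeros(n)
    s_sum = np.zeros(n)
    for it in range(n_iter):
        # 1) Sample impulse locations q[n] given r and p.
        log_odds = np.log(p) - np.log1p(-p) + ll_impulse - ll_noise
        prob1 = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -50.0, 50.0)))
        q = rng.random(n) < prob1
        # 2) Sample impulse strengths a[n] where q[n] = 1; zero elsewhere.
        s = np.where(q, rng.normal(post_mean, post_std), 0.0)
        # 3) Sample the impulse probability p given the current locations.
        k = int(q.sum())
        p = rng.beta(1 + k, 1 + n - k)
        if it >= burn_in:
            q_sum += q
            s_sum += s
    kept = n_iter - burn_in
    return q_sum / kept, s_sum / kept


# Synthetic stand-in for an LPC residual: sparse impulses plus white noise.
rng = np.random.default_rng(1)
n = 2000
true_q = rng.random(n) < 0.05
residual = true_q * rng.normal(0.0, 5.0, n) + rng.normal(0.0, 0.5, n)

q_prob, impulse_est = gibbs_bernoulli_gaussian(residual, var_imp=25.0, var_noise=0.25)

# Subtracting the estimated impulses leaves a noise-like, "whisperized" residual.
whisper_like = residual - impulse_est
```

In this sketch, `q_prob` approximates the posterior probability that each sample contains an impulse, and `impulse_est` approximates its strength. A fuller treatment would also sample the two variances and would operate frame by frame on the actual LPC residual.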
