Phonated speech reconstruction using twin mapping models

2015, 2015 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)

https://doi.org/10.1109/ISSPIT.2015.7394247

Abstract

Computational speech reconstruction algorithms have the ultimate aim of returning natural-sounding speech to aphonic and dysphonic individuals. These algorithms can also be used by unimpaired speakers for communicating sensitive or private information. When the glottis loses function due to disease or surgery, aphonic and dysphonic patients retain the power of vocal tract modulation to some degree, but they are unable to produce anything more than hoarse whispers without prosthetic aid. While whispering can be seen as a natural and secondary aspect of speech communication for most people, it becomes the primary mechanism of communication for those who have impaired voice production mechanisms, such as laryngectomees. In this paper, by considering the current limitations of speech reconstruction methods, a novel algorithm for converting whispers to normal speech is proposed and the efficiency of the algorithm is discussed. The proposed algorithm relies upon twin mapping models and makes use of artificially generated whispers (called whisperised speech) to regenerate natural phonated speech from whispers. Through a training-based approach, the mapping models exploit whisperised speech to overcome the frame-to-frame time alignment problem in the speech reconstruction process.
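
The alignment point in the abstract can be illustrated with a small sketch. If whisperised speech is derived frame by frame from a phonated recording, frame i of one signal corresponds to frame i of the other by construction, so paired training data is available without a separate time-alignment step. Everything below is an assumption made for illustration: the abstract does not specify how whisperised speech is generated or what form the twin mapping models take, so `whisperise` (a smoothed spectral envelope excited by random-phase noise), the cepstral features, the single linear least-squares mapping, the frame sizes, and the synthetic input signal are all hypothetical stand-ins rather than the paper's method.

```python
import numpy as np

FRAME_LEN, HOP, N_CEP = 256, 128, 20  # illustrative values, not from the paper


def frame_signal(x):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    return np.stack([x[i * HOP:i * HOP + FRAME_LEN] for i in range(n_frames)])


def cepstrum(frames):
    """Real cepstrum of each frame, truncated to the first N_CEP coefficients."""
    log_mag = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)
    return np.fft.irfft(log_mag, n=FRAME_LEN, axis=1)[:, :N_CEP]


def whisperise(frames, rng):
    """Hypothetical whisperiser: keep a smooth spectral envelope per frame
    and replace the excitation with random-phase noise."""
    cep = cepstrum(frames)
    # Rebuild a symmetric, liftered cepstrum so its spectrum is a smooth envelope.
    full = np.zeros((frames.shape[0], FRAME_LEN))
    full[:, :N_CEP] = cep
    full[:, -(N_CEP - 1):] = cep[:, 1:][:, ::-1]
    envelope = np.exp(np.fft.rfft(full, axis=1).real)
    phase = rng.uniform(-np.pi, np.pi, size=envelope.shape)
    return np.fft.irfft(envelope * np.exp(1j * phase), n=FRAME_LEN, axis=1)


rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
# Synthetic stand-in for phonated speech: a few harmonics of 120 Hz.
phonated = sum(np.sin(2 * np.pi * 120 * k * t) / k for k in range(1, 6))

phon_frames = frame_signal(phonated)
whisp_frames = whisperise(phon_frames, rng)  # same frame index = same instant

# Paired features: row i of X and row i of Y describe the same frame.
X = cepstrum(whisp_frames)
Y = cepstrum(phon_frames)

# One generic linear mapping (with bias) fitted by least squares.
X1 = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
Y_hat = X1 @ W
```

With real whispers recorded separately from normal speech, the two utterances differ in duration and timing, so rows of `X` and `Y` would not correspond without an explicit alignment step; generating the whisper-like signal from the phonated one sidesteps that.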
