Integrated software for analysis and synthesis of voice quality
https://doi.org/10.3758/BRM.42.4.1030
Abstract
Voice quality is an important perceptual cue in many disciplines, but knowledge of its nature is limited by a poor understanding of the relevant psychoacoustics. This article (aimed at researchers studying voice, speech, and vocal behavior) describes the UCLA voice synthesizer, software for voice analysis and synthesis designed to test hypotheses about the relationship between acoustic parameters and voice quality perception. The synthesizer provides experimenters with a useful tool for creating and modeling voice signals. In particular, it offers an integrated approach to voice analysis and synthesis and allows easy, precise, spectral-domain manipulations of the harmonic voice source. The synthesizer operates in near real time, using a parsimonious set of acoustic parameters for the voice source and vocal tract that a user can modify to accurately copy the quality of most normal and pathological voices. The software, user's manual, and audio files may be downloaded from http://brm.psychonomic-journals.org/content/supplemental. Future updates may be downloaded from www.surgery.medsch.ucla.edu/glottalaffairs/.