Modeling the voice source in terms of spectral slopes
2016, The Journal of the Acoustical Society of America
https://doi.org/10.1121/1.4944474
Abstract
A psychoacoustic model of the voice source spectrum is proposed. The model is characterized by four spectral slope parameters: the difference in amplitude between the first two harmonics (H1-H2), the second and fourth harmonics (H2-H4), the fourth harmonic and the harmonic nearest 2 kHz in frequency (H4-2 kHz), and the harmonic nearest 2 kHz and that nearest 5 kHz (2 kHz-5 kHz). As a step toward model validation, experiments were conducted to establish the acoustic and perceptual independence of these parameters. In experiment 1, the model was fit to a large number of voice sources. Results showed that parameters are predictable from one another, but that these relationships are due to overall spectral roll-off. Two additional experiments addressed the perceptual independence of the source parameters. Listener sensitivity to H1-H2, H2-H4, and H4-2 kHz did not change as a function of the slope of an adjacent component, suggesting that sensitivity to these components is robust. Listener sensitivity to changes in spectral slope from 2 kHz to 5 kHz depended on complex interactions between spectral slope, spectral noise levels, and H4-2 kHz. It is concluded that the four parameters represent non-redundant acoustic and perceptual aspects of voice quality.
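The four slope parameters defined in the abstract can be illustrated with a minimal sketch. It assumes the harmonic amplitudes of the source spectrum are already available in dB (for example, after inverse filtering, which is not shown here). The function names `nearest_harmonic` and `slope_parameters`, the dictionary keys, and the synthetic spectrum are hypothetical and are not taken from the paper or its software.

```python
import math


def nearest_harmonic(f0_hz, target_hz):
    """Return the 1-based number of the harmonic closest to target_hz.

    If two harmonics are equally close, Python's round() picks the even one;
    either choice satisfies "nearest".
    """
    if f0_hz <= 0:
        raise ValueError("f0_hz must be positive")
    return max(1, round(target_hz / f0_hz))


def slope_parameters(f0_hz, harmonic_db):
    """Compute the four spectral slope parameters, in dB.

    harmonic_db[k - 1] is assumed to hold the amplitude (dB) of harmonic k
    of the voice source spectrum.
    """
    k2 = nearest_harmonic(f0_hz, 2000.0)  # harmonic nearest 2 kHz
    k5 = nearest_harmonic(f0_hz, 5000.0)  # harmonic nearest 5 kHz
    needed = max(4, k2, k5)
    if len(harmonic_db) < needed:
        raise ValueError(f"need at least {needed} harmonic amplitudes")

    def amp(k):
        return harmonic_db[k - 1]

    return {
        "H1-H2": amp(1) - amp(2),
        "H2-H4": amp(2) - amp(4),
        "H4-2kHz": amp(4) - amp(k2),
        "2kHz-5kHz": amp(k2) - amp(k5),
    }


if __name__ == "__main__":
    # Synthetic source (an assumption for illustration only): f0 = 200 Hz
    # with a uniform roll-off of 12 dB per octave.
    f0 = 200.0
    spectrum = [-12.0 * math.log2(k) for k in range(1, 31)]
    for name, value in slope_parameters(f0, spectrum).items():
        print(f"{name}: {value:.2f} dB")
```

With a uniform roll-off like the synthetic one above, all four values follow from a single number, which is the kind of dependence on overall spectral roll-off that experiment 1 reports. Real voice sources need not roll off uniformly, so the four values can differ independently.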