MODEL-BASED SYNTHESIS AND TRANSFORMATION OF VOICED SOUNDS

Carlo Drioli

Abstract

This work proposes a glottal model loosely based on the Ishizaka and Flanagan model, with a drastically reduced number of parameters. First, the glottal excitation waveform is estimated, together with the vocal tract filter parameters, using inverse filtering techniques. The estimated waveform is then used to identify the nonlinear glottal model, represented by a closed-loop configuration of two blocks: a second-order resonant filter, tuned with respect to the signal pitch, and a regressor-based functional, whose coefficients are estimated via nonlinear identification techniques. The results show that an accurate identification of real data can be achieved with fewer than 10 regressors of the nonlinear functional, and that the physically informed parameters of the model allow intuitive control of fundamental features such as pitch and intensity.
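The abstract does not give the form of the second-order resonant filter, so the following is only a minimal sketch of one common choice: a textbook two-pole resonator whose pole angle is set by the pitch. The function names, the `bandwidth` parameter, and the numeric values are assumptions made for illustration and are not taken from the paper. The nonlinear functional that closes the loop is not modeled here.

```python
import math

def resonator_coeffs(f0, fs, bandwidth):
    """Feedback coefficients (a1, a2) of a two-pole resonator.

    The poles sit at radius r and angle theta, so the resonance follows f0.
    All parameter names are hypothetical, chosen for this sketch.
    """
    theta = 2.0 * math.pi * f0 / fs          # pole angle set by the pitch
    r = math.exp(-math.pi * bandwidth / fs)  # pole radius set by the bandwidth
    return -2.0 * r * math.cos(theta), r * r

def resonator(x, f0, fs, bandwidth):
    """Filter x with y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    a1, a2 = resonator_coeffs(f0, fs, bandwidth)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y0 = sample - a1 * y1 - a2 * y2
        out.append(y0)
        y2, y1 = y1, y0
    return out

# Impulse response for a 110 Hz pitch at 16 kHz (illustrative values only).
impulse = [1.0] + [0.0] * 399
h = resonator(impulse, f0=110.0, fs=16000.0, bandwidth=50.0)
```

Changing `f0` moves the resonance, which is one way a pitch control could act on such a block. In the closed-loop configuration described above, the filter input would come from the regressor-based functional rather than from an impulse.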

REFERENCES
D. G. Childers and C. K. Lee, "Vocal quality factors: analysis, synthesis, and perception," J. Acoust. Soc. Am., vol. 90, no. 5, pp. 2394-2410, November 1991.
M. D. Plumpe, T. F. Quatieri, and D. A. Reynolds, "Modeling of the glottal flow derivative waveform with application to speaker recognition," IEEE Trans. Speech and Audio Process., vol. 7, no. 5, pp. 569-586, September 1999.
D. G. Childers and C. Ahn, "Modeling the Glottal Volume-Velocity Waveform for Three Voice Types," J. Acoust. Soc. Am., vol. 97, no. 1, pp. 505-519, Jan. 1995.
K. Ishizaka and J. L. Flanagan, "Synthesis of Voiced Sounds from a Two-Mass Model of the Vocal Cords," Bell Syst. Tech. J., vol. 51, pp. 1233-1268, 1972.
D. Y. Wong, J. D. Markel, and A. H. Gray, "Least squares glottal inverse filtering from the acoustic speech waveform," IEEE Trans. Acoustics, Speech and Signal Process., vol. ASSP-27, no. 4, pp. 350-355, August 1979.
R. Smits and B. Yegnanarayana, "Determination of instants of significant excitation in speech using group delay function," IEEE Trans. Speech and Audio Process., vol. 3, no. 5, pp. 325-333, September 1995.
M. Kob, N. Alhäuser, and U. Reiter, "Time-domain model of the singing voice," Proc. of DAFx99 Workshop, pp. 143-146, Norway, Dec. 1999.
S. Chen, C. F. N. Cowan, and P. M. Grant, "Orthogonal least squares learning algorithm for radial basis functions networks," IEEE Trans. Neural Networks, vol. 2, no. 2, pp. 302-309, March 1991.
S. Chen and S. A. Billings, "Representation of non-linear systems: NARMAX model," Int. J. of Control, vol. 49, no. 3, pp. 1013-1032, 1989.
X. Rodet, "One and two mass models oscillations for voice and instruments," Proc. Int. Computer Music Conf., Canada, Sept. 1995.
M. E. McIntyre, R. T. Schumacher, and J. Woodhouse, "On the oscillation of musical instruments," J. Acoust. Soc. Am., vol. 74, no. 5, pp. 1325-1345, 1983.
