Spectral Envelope Representation using Sums of Gaussians

Anderson Fraiha Machado

2012

Abstract

This paper presents a new approach to spectral representation based on sums of Gaussian distributions. Sums of Gaussians provide an intuitive representation of the frequency bands of a signal spectrum, as well as of formant regions. Representing spectral envelopes by the Gaussian parameters {a, μ, σ} simplifies the expression of important tasks such as frequency warping and formant manipulation. Marquardt's algorithm is extended to estimate the parameters of the Gaussian model for each frequency band, so that each Gaussian parameter can either be optimized to fit a given spectral sub-band or be held at a fixed value to reduce the number of model parameters. This allows several choices of free and fixed parameter sets and of model sizes. Experimental results show that the proposed models offer an accurate approximation of the spectral envelope and provide good perceptual results when applied to pitch shifting.
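To illustrate why the {a, μ, σ} parameterization makes frequency warping and formant manipulation easy to express, the following is a minimal sketch, not the paper's implementation. It assumes an unnormalized Gaussian of the form a·exp(−(f − μ)² / (2σ²)) and a purely linear warping factor `alpha`. The function names (`envelope`, `warp`, `shift_formant`) and the example parameter values are hypothetical. Parameter estimation with Marquardt's algorithm is not shown.

```python
import math

def envelope(freqs, params):
    """Evaluate a sum of Gaussians at each frequency in `freqs`.

    `params` is a list of (a, mu, sigma) tuples. The unnormalized form
    a * exp(-(f - mu)^2 / (2 sigma^2)) is an assumption of this sketch.
    """
    return [
        sum(a * math.exp(-((f - mu) ** 2) / (2.0 * sigma ** 2))
            for a, mu, sigma in params)
        for f in freqs
    ]

def warp(params, alpha):
    """Linear frequency warping: scale every centre and width by `alpha`,
    leaving the amplitudes unchanged."""
    return [(a, alpha * mu, alpha * sigma) for a, mu, sigma in params]

def shift_formant(params, index, delta_hz):
    """Move a single Gaussian (e.g. one formant region) by `delta_hz`."""
    shifted = list(params)
    a, mu, sigma = shifted[index]
    shifted[index] = (a, mu + delta_hz, sigma)
    return shifted

# Illustrative values only: three Gaussians standing in for formant regions.
model = [(1.0, 700.0, 120.0), (0.6, 1200.0, 150.0), (0.3, 2600.0, 200.0)]

warped = warp(model, 1.2)               # stretch the envelope by 20 %
moved = shift_formant(model, 0, 50.0)   # raise the first region by 50 Hz
```

Under these assumptions, the warped envelope evaluated at `alpha * f` equals the original envelope at `f`, so the whole operation reduces to rescaling two of the three parameters of each Gaussian.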

Figures (4)

Fig. 1. Dynamic Evolution of Gaussian Parameters.

Table 1. Comparison of the proposed algorithms.

Table 2. Average MSE and average number of iterations for each method.

4.2 Subjective Evaluation

Table 3. MOS subjective evaluation.

As can be seen in Table 3, these methods achieve better results with signals that contain fewer harmonics (female voices), whereas MOS values are lower for signals with many harmonics (male voices). These lower values may be due to problems in estimating the phase of each frame during the reconstruction step, and will be addressed in future work.

References (10)

P. Zolfaghari and T. Robinson, "Formant analysis using mixtures of gaussians," in Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, vol. 2. IEEE, 1996, pp. 1229-1232.
B. Nguyen and M. Akagi, "Spectral modification for voice gender conversion using temporal decomposition," Journal of Signal Processing, 2007.
E. Godoy, O. Rosec, and T. Chonavel, "Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 4, pp. 1313-1323, 2012.
H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited," in Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, vol. 2. IEEE, 1997, pp. 1303-1306.
P. Zolfaghari, S. Watanabe, A. Nakamura, and S. Katagiri, "Bayesian modelling of the speech spectrum using mixture of gaussians," in Acoustics, Speech, and Signal Processing, 2004. Proceedings. (ICASSP'04). IEEE International Conference on, vol. 1. IEEE, 2004, pp. I-553.
E. Godoy, O. Rosec, and T. Chonavel, "Speech spectral envelope estimation through explicit control of peak evolution in time," in Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on. IEEE, 2010, pp. 209-212.
A. Goshtasby and W. D. O'Neill, "Curve fitting by a sum of gaussians," CVGIP: Graphical Model and Image Processing, vol. 56, no. 4, pp. 281-288, 1994.
D. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, pp. 431-441, 1963.
D. Erro, A. Moreno, and A. Bonafonte, "Flexible harmonic/stochastic speech synthesis," in 6th ISCA Workshop on Speech Synthesis, 2007.
D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Advances in neural information processing systems, vol. 13, pp. 556-562, 2001.
