Spectral Envelope Representation using Sums of Gaussians

2012

Abstract

This paper presents a new approach to spectral envelope representation using sums of Gaussian distributions. Sums of Gaussians provide an intuitive representation of the frequency bands of a signal spectrum as well as of formant regions. Representing spectral envelopes with the Gaussian parameters {a, μ, σ} simplifies the expression of important tasks such as frequency warping and formant manipulation. Marquardt’s algorithm is extended to estimate the parameters of the Gaussian model for each frequency band, so that each Gaussian parameter can either be optimized to fit a given spectral sub-band or be held at a fixed value to reduce the number of model parameters. This allows several choices of free and fixed parameter sets and of model sizes. Experimental results show that the proposed models approximate the spectral envelope accurately and give good perceptual results when applied to pitch shifting.
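
As a rough illustration of why the {a, μ, σ} parameterization makes frequency warping easy to express, the sketch below evaluates an envelope as a sum of unnormalized Gaussians and warps it by scaling each centre and width. The functional form a·exp(−(f − μ)² / 2σ²), the linear warping factor `alpha`, the function names `envelope` and `warp`, and the three-component model values are all assumptions made for this example; the abstract does not specify them, and the paper's own formulation may differ.

```python
import math

def envelope(freq_hz, gaussians):
    """Evaluate a sum-of-Gaussians envelope at a single frequency.

    `gaussians` is a list of (a, mu, sigma) tuples: amplitude,
    centre frequency in Hz, and width in Hz (assumed, unnormalized form).
    """
    return sum(a * math.exp(-0.5 * ((freq_hz - mu) / sigma) ** 2)
               for a, mu, sigma in gaussians)

def warp(gaussians, alpha):
    """Linearly warp the frequency axis by `alpha`.

    Scaling every centre and width by the same factor stretches the
    envelope along the frequency axis while leaving amplitudes unchanged.
    """
    return [(a, alpha * mu, alpha * sigma) for a, mu, sigma in gaussians]

# Hypothetical three-component model; values are illustrative only.
model = [(1.0, 500.0, 80.0), (0.6, 1500.0, 120.0), (0.3, 2500.0, 200.0)]
shifted = warp(model, 1.2)
```

Under these assumptions the warped envelope at frequency 1.2·f equals the original envelope at f, so the whole operation reduces to arithmetic on the parameter list rather than resampling of a spectrum.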

References (10)

  1. P. Zolfaghari and T. Robinson, "Formant analysis using mixtures of gaussians," in Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on, vol. 2. IEEE, 1996, pp. 1229-1232.
  2. B. Nguyen and M. Akagi, "Spectral modification for voice gender conversion using temporal decomposition," Journal of Signal Processing, 2007.
  3. E. Godoy, O. Rosec, and T. Chonavel, "Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 4, pp. 1313-1323, 2012.
  4. H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited," in Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, vol. 2. IEEE, 1997, pp. 1303-1306.
  5. P. Zolfaghari, S. Watanabe, A. Nakamura, and S. Katagiri, "Bayesian modelling of the speech spectrum using mixture of gaussians," in Acoustics, Speech, and Signal Processing, 2004. Proceedings.(ICASSP'04). IEEE International Conference on, vol. 1. IEEE, 2004, pp. I-553.
  6. E. Godoy, O. Rosec, and T. Chonavel, "Speech spectral envelope estimation through explicit control of peak evolution in time," in Information Sciences Signal Processing and their Applications (ISSPA), 2010 10th International Conference on. IEEE, 2010, pp. 209-212.
  7. A. Goshtasby and W. D. O'Neill, "Curve fitting by a sum of gaussians," CVGIP: Graphical Model and Image Processing, vol. 56, no. 4, pp. 281-288, 1994.
  8. D. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," Journal of the Society for Industrial and Applied Mathematics, vol. 11, no. 2, pp. 431-441, 1963.
  9. D. Erro, A. Moreno, and A. Bonafonte, "Flexible harmonic/stochastic speech synthesis," in 6th ISCA Workshop on Speech Synthesis, 2007.
  10. D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Advances in Neural Information Processing Systems, vol. 13, pp. 556-562, 2001.