Techniques for Crosslingual Voice Conversion
2010, International Symposium on Multimedia
https://doi.org/10.1109/ISM.2010.62
Abstract
The cross-lingual voice conversion problem refers to the replacement of a speaker's timbre or vocal identity in a recorded sentence, assuming that the source speaker and target speaker use different languages. This problem differs from typical voice conversion in that the mapping of acoustical features cannot depend on time-aligned recordings of source and target speakers uttering the