Segment selection in the L&H RealSpeak Laboratory TTS system

2000

Abstract

The L&H RealSpeak Laboratory TTS (RSLab) system is a corpus-based speech synthesis system whose components handle linguistic processing, prosody prediction, segment selection, concatenation, and modification. This paper focuses on the segment selection process. During segment selection, each unit in a large speech database is assigned a cost according to its prosodic/phonetic mismatch with the target description of the utterance to be synthesized. This prosodic/phonetic cost is computed from a combination of symbolic and numeric features. The candidate units are then evaluated for the ease with which they can be concatenated. A dynamic programming algorithm, using additive costs, finds the optimal path of candidates to represent the spoken utterance. The chosen segments are then concatenated in the time domain to yield a smooth-sounding speech signal with natural-sounding prosody. One of the keys to the success ...
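The search described in the abstract (a per-unit mismatch cost, a cost for joining neighbouring candidates, and dynamic programming over additive costs) can be illustrated with a generic Viterbi-style sketch. This is not the RSLab implementation: the function `select_units`, the toy cost functions, the `(database_index, pitch)` unit representation, and all numeric values are assumptions made up for illustration, and the real system combines many symbolic and numeric features rather than a single pitch value.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")  # target description for one segment (hypothetical)
U = TypeVar("U")  # candidate unit from the speech database (hypothetical)


def select_units(
    targets: Sequence[T],
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[T, U], float],
    join_cost: Callable[[U, U], float],
) -> Tuple[List[U], float]:
    """Return the candidate sequence with the lowest summed cost, and that cost."""
    if not targets or len(targets) != len(candidates):
        raise ValueError("need one non-empty candidate list per target")
    if any(len(c) == 0 for c in candidates):
        raise ValueError("every target needs at least one candidate")

    # best[j]: lowest cumulative cost of any path ending in candidates[i][j]
    best = [target_cost(targets[0], u) for u in candidates[0]]
    back: List[List[int]] = []  # back[i-1][j]: best predecessor of candidates[i][j]

    for i in range(1, len(targets)):
        prev = candidates[i - 1]
        new_best, pointers = [], []
        for u in candidates[i]:
            # Cheapest way to reach u from any candidate at the previous position.
            k = min(range(len(prev)), key=lambda p: best[p] + join_cost(prev[p], u))
            new_best.append(best[k] + join_cost(prev[k], u) + target_cost(targets[i], u))
            pointers.append(k)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total


# Toy costs (assumptions): a unit is (database_index, pitch_hz), a target is a pitch.
def toy_target_cost(target_pitch: float, unit: Tuple[int, float]) -> float:
    return abs(target_pitch - unit[1])


def toy_join_cost(left: Tuple[int, float], right: Tuple[int, float]) -> float:
    # Units that were adjacent in the database join for free; others pay a penalty.
    return 0.0 if right[0] == left[0] + 1 else 5.0


toy_targets = [100.0, 110.0]
toy_candidates = [[(1, 100.0), (5, 102.0)], [(6, 111.0), (9, 110.0)]]
chosen, total_cost = select_units(
    toy_targets, toy_candidates, toy_target_cost, toy_join_cost
)
```

In this toy case the search picks units 5 and 6: each is a slightly worse prosodic match than the alternative, but they were adjacent in the database, so the free join outweighs the mismatch. Because the costs are additive, the search only needs to keep the best cumulative cost per candidate at each position, which keeps the work proportional to the number of candidate pairs between neighbouring positions.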
