A Review of Concatenative Text-to-Speech Synthesis

2014, International Standards Publication (ISP)

Abstract

Speech is used to convey information, emotions, and feelings. Speech synthesis is the technique of converting given input text into synthetic speech. It can be used to read text aloud, such as SMS messages, newspapers, and website content, and can assist blind users. Speech synthesis has been widely researched over the last four decades, and the quality and intelligibility of the synthetic speech produced is remarkably good for most applications. This paper reviews four widely researched methods of speech synthesis, namely articulatory, concatenative, formant, and quasi-articulatory synthesis, with the main focus on the concatenative method and some of its issues. Articulatory synthesis is based on a model of human speech production; the synthetic speech it produces is the most natural, but it is also the most difficult method. Concatenative synthesis uses prerecorded speech words and phrases and concatenates them to produce sound. It is the simplest method and yields high-quality speech, but it is limited by the memory required to store in advance all possible words and phrases to be produced. Formant synthesis is based on an acoustic model of the human speech production system: it models the sound source and the resonances of the vocal tract, and is the most commonly used model. Quasi-articulatory synthesis is a hybrid of the articulatory and acoustic models of speech production; the synthetic speech it produces sounds more natural and can easily be customized to meet the requirements of different applications and individual users.
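
To make the concatenative approach concrete, the sketch below (not taken from the paper reviewed here) joins prerecorded word recordings and smooths each boundary with a short crossfade. The unit directory, file naming scheme, crossfade length, and helper names are illustrative assumptions; the code is written in Python with NumPy.

    # Minimal sketch of word-level concatenative synthesis (illustrative only).
    # Assumptions: a directory of prerecorded 16-bit mono WAV files, one per
    # word (e.g. units/hello.wav), all sharing one sample rate; a short linear
    # crossfade blends adjacent units at each join.
    import wave
    from pathlib import Path

    import numpy as np

    UNIT_DIR = Path("units")      # hypothetical folder of prerecorded word units
    CROSSFADE_MS = 20             # overlap length used to blend adjacent units


    def load_unit(word: str) -> tuple[np.ndarray, int]:
        """Read one prerecorded unit as a float32 array plus its sample rate."""
        with wave.open(str(UNIT_DIR / f"{word}.wav"), "rb") as wf:
            rate = wf.getframerate()
            pcm = np.frombuffer(wf.readframes(wf.getnframes()), dtype=np.int16)
        return pcm.astype(np.float32), rate


    def concatenate(words: list[str]) -> tuple[np.ndarray, int]:
        """Join the stored units for each word with a linear crossfade."""
        output, rate = load_unit(words[0])
        for word in words[1:]:
            unit, unit_rate = load_unit(word)
            assert unit_rate == rate, "all units must share one sample rate"
            n = min(int(rate * CROSSFADE_MS / 1000), len(output), len(unit))
            if n == 0:
                output = np.concatenate([output, unit])
                continue
            fade = np.linspace(1.0, 0.0, n, dtype=np.float32)
            blended = output[-n:] * fade + unit[:n] * (1.0 - fade)
            output = np.concatenate([output[:-n], blended, unit[n:]])
        return output, rate


    def save(samples: np.ndarray, rate: int, path: str) -> None:
        """Write the synthesized waveform back out as 16-bit mono WAV."""
        with wave.open(path, "wb") as wf:
            wf.setnchannels(1)
            wf.setsampwidth(2)
            wf.setframerate(rate)
            wf.writeframes(np.clip(samples, -32768, 32767).astype(np.int16).tobytes())


    if __name__ == "__main__":
        audio, sr = concatenate(["hello", "world"])
        save(audio, sr, "synthesized.wav")

In practice, concatenative systems usually operate on subword units such as diphones and select among several stored candidates per unit, but the basic operation is the same: retrieve prerecorded waveforms and splice them smoothly.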
