Developing a Child Friendly Text-to-Speech System
2008, Advances in Human-Computer Interaction
https://doi.org/10.1155/2008/597971
Abstract
This paper discusses the implementation details of a child-friendly, good-quality English text-to-speech (TTS) system that is phoneme-based and concatenative, is easy to set up and use, and has a small memory footprint. Direct waveform concatenation and linear predictive coding (LPC) are used. Most existing TTS systems are unit-selection based and rely on standard speech databases recorded in neutral adult voices. Here, memory use is reduced by concatenating phonemes and by replacing the phoneme wave files with their LPC coefficients. Linguistic analysis, rather than signal processing techniques, is used to reduce algorithmic complexity. The system can be customized to the needs of the child user through vocabulary and voice selection, and prosody is also incorporated. This inexpensive TTS system was implemented in MATLAB, with synthesis presented through a graphical user interface (GUI) to make it child friendly. It can serve both as an engaging language-learning aid for children in general and as a speech aid for children with vocal disabilities. The quality of the synthesized speech was evaluated using the mean opinion score (MOS).
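To make the memory argument concrete, the following is a minimal Python sketch of how one frame of a phoneme recording could be summarized by a handful of LPC coefficients. It is an illustration only, not the authors' MATLAB implementation: the autocorrelation method with the Levinson-Durbin recursion is one common way to compute LPC, and the function names `autocorrelation` and `lpc`, as well as any frame length or predictor order chosen by the caller, are assumptions that do not come from the paper.

```python
def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1.0 and sample s[n] is predicted as
    -sum(a[k] * s[n - k] for k in 1..order); err is the residual energy.
    """
    r = autocorrelation(frame, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        # Reflection coefficient for the step from order i-1 to order i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        # Update the lower-order coefficients, then append the new one.
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, err
```

Under these assumptions, a frame of a few hundred samples is stored as `order` coefficients plus one residual-energy (gain) value, which is the kind of reduction the abstract attributes to replacing wave files with LPC coefficients. Resynthesis would additionally need an excitation signal, which this sketch does not cover.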