Text to speech synthesis system for mobile applications

Proc. Workshop in Image and Signal Processing (WISP-2007)

https://doi.org/10.13140/RG.2.1.4560.7528

Abstract

This paper discusses a Text-To-Speech (TTS) synthesis system embedded in a mobile phone. The TTS system is a unit-selection-based concatenative speech synthesizer, in which a speech unit is selected from the database based on its phonetic and prosodic context. The speech unit considered in the synthesis is larger than a phone, diphone, or syllable; usually it is a word or a phrase. While corpus-based TTS technology has significantly improved the quality of synthesized speech, there is a practical trade-off between database size and the quality of the synthetic speech, especially in a mobile environment. Several speech compression schemes currently used in mobile phones are applied to the database. Speech is synthesized from the input text using the compressed speech in the database, and the intelligibility and naturalness of the synthesized speech are studied. Mobile phones already contain a speech codec as one of the modules in baseband processing. The idea of this paper is to propose a methodology that uses this already available speech codec so that a phone with embedded TTS can read an SMS aloud to the listener. Experimental results show that the idea is feasible.
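
The trade-off between database size and speech quality can be made concrete with a little arithmetic. The sketch below is only illustrative: it assumes the compression schemes are GSM-family codecs (Full Rate, Enhanced Full Rate, AMR) at their nominal bit rates, and it uses a hypothetical 60-minute corpus, since the abstract states neither the codecs' settings nor the database duration. The function name `database_size_mb` is likewise an illustration, not something taken from the paper.

```python
# Nominal codec bit rates in kbit/s (standard figures, not taken from the paper).
PCM = "PCM 16-bit, 8 kHz (uncompressed)"
CODEC_KBPS = {
    PCM: 128.0,                                   # 8000 samples/s * 16 bits
    "GSM Full Rate (GSM 06.10)": 13.0,
    "GSM Enhanced Full Rate (GSM 06.60)": 12.2,
    "AMR, 4.75 kbit/s mode": 4.75,                # lowest AMR narrowband mode
}


def database_size_mb(duration_minutes: float, kbps: float) -> float:
    """Return storage in megabytes (10**6 bytes) for speech of the given length."""
    total_bits = duration_minutes * 60 * kbps * 1000
    return total_bits / 8 / 1e6


# Hypothetical corpus length; the abstract does not give the real value.
DURATION_MIN = 60

if __name__ == "__main__":
    for name, kbps in CODEC_KBPS.items():
        size = database_size_mb(DURATION_MIN, kbps)
        ratio = CODEC_KBPS[PCM] / kbps
        print(f"{name:38s} {size:7.2f} MB  ({ratio:4.1f}x smaller than PCM)")
```

Under these assumptions, an hour of speech shrinks from roughly 57.6 MB uncompressed to about 5.9 MB with GSM Full Rate and about 2.1 MB with the lowest AMR mode. The cost of that reduction is the loss in intelligibility and naturalness that the paper sets out to measure.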
