Thirukkural - A Text-to-Speech Synthesis System
https://doi.org/10.13140/RG.2.1.1185.6484…
6 pages
Abstract
A novel Text-to-Speech (TTS) synthesis system for the Tamil language is proposed. It comprises an offline phase for pre-processing, pitch marking, and database construction, and an online phase for text analysis and synthesis. Various speech synthesis methods are discussed, highlighting concatenation without waveform modification, which strikes a balance between database size and speech quality. The system demonstrates good speech intelligibility; future enhancements target emotional speech, online capabilities, and better handling of non-native words.
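The abstract's central design choice, concatenation without waveform modification, amounts to joining stored unit waveforms exactly as recorded. A minimal Python sketch, assuming a hypothetical in-memory database that maps syllable labels to float32 waveforms (the labels, durations, and the function name `concatenate_units` are illustrative, not taken from the paper):

```python
import numpy as np


def concatenate_units(unit_labels, database):
    """Join stored unit waveforms end to end, with no pitch, duration,
    or amplitude modification (plain concatenation)."""
    missing = [u for u in unit_labels if u not in database]
    if missing:
        raise KeyError(f"units not in database: {missing}")
    if not unit_labels:
        return np.zeros(0, dtype=np.float32)
    return np.concatenate(
        [np.asarray(database[u], dtype=np.float32) for u in unit_labels]
    )


# Hypothetical database: syllable label -> recorded waveform at 16 kHz.
# Constant values stand in for real recordings so the output order is easy to check.
db = {
    "ti": np.full(1600, 0.1, dtype=np.float32),   # 100 ms
    "ruk": np.full(2400, 0.2, dtype=np.float32),  # 150 ms
    "ku": np.full(1600, 0.3, dtype=np.float32),   # 100 ms
    "ral": np.full(2000, 0.4, dtype=np.float32),  # 125 ms
}
speech = concatenate_units(["ti", "ruk", "ku", "ral"], db)
```

Because nothing is resampled or pitch-shifted, output quality depends entirely on how well the recorded units already fit together, which is why database size and quality pull against each other in this approach.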
Related papers
International Journal of Advanced Computer Research, 2019
Among them, the concatenative synthesis approach is used in our system because it produces natural-sounding speech from pre-recorded sound. There is a tradeoff between speech quality and system size that depends on the speech unit chosen for concatenation. Commonly used speech units include the word, syllable, phoneme, diphone, and triphone. Many TTS systems [2-6] have been implemented with the concatenative method using different speech units, and they can generate high-quality synthesized speech. [7] proposes a numerical TTS synthesis system for three languages: Marathi, Hindi, and English. It combines a rule-based approach with a concatenation-based approach, and all utterances of the sound units were used for concatenation and generation of the speech signal. [8] compares two Arabic text-to-speech systems, both screen readers: non-visual desktop access (NVDA) and integrated bilingual solution for the blind or visually impaired in the Arab (IBSAR). They tested the quality of the two systems in terms of
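The size side of this tradeoff can be made concrete with a simple upper bound. Assuming an inventory of $P$ distinct phones (silence included) and ignoring phonotactic constraints, which in practice rule out many combinations, the number of possible units grows as follows; $P = 40$ is an illustrative figure, not one from the cited work:

```latex
\begin{aligned}
N_{\text{phone}}    &\le P, \\
N_{\text{diphone}}  &\le P^{2}, \\
N_{\text{triphone}} &\le P^{3}, \\
P = 40 \;&\Rightarrow\; N_{\text{diphone}} \le 1600, \quad N_{\text{triphone}} \le 64\,000.
\end{aligned}
```

Larger units capture more coarticulation and need fewer joins per utterance, which favors quality, but their inventory grows quickly, which is the size cost.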
This paper addresses the problem of improving the intelligibility of synthesized speech in a Marathi TTS synthesis system. Speech synthesis is the artificial generation of human speech, and a text-to-speech system automatically converts ordinary language text into speech. The paper presents a corpus-driven Marathi TTS system based on the concatenative synthesis approach, in which basic units are concatenated to synthesize intelligible, natural-sounding speech. Syllables are the basic units of the speech synthesis database, and syllable pitch is modified by time-scale modification. Each speech unit is annotated with its prosodic information, either manually or automatically by an algorithm. The annotated speech corpus is clustered, which provides a way to select a suitable unit for concatenation based on the minimum total join cost. The input text is first analyzed and syllabified according to linguistic rules, and the syllables are stored separately. The speech files corresponding to the syllables are then concatenated, and the silence present in the concatenated speech is removed. Finally, discontinuities at syllable boundaries are minimized without degrading quality, by smoothing at the concatenated syllable boundaries and changing syllable pitch through time-scale modification.
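The post-concatenation steps described here (silence removal, then smoothing at syllable joins) can be sketched as follows. This is a simplified stand-in, not the paper's method: silence is trimmed from each unit's edges with a fixed amplitude threshold, and joins are smoothed with a linear cross-fade, whereas the paper also changes syllable pitch by time-scale modification. The names `trim_silence` and `crossfade_concat`, the threshold, and the overlap length are assumptions.

```python
import numpy as np


def trim_silence(x, threshold=0.01):
    """Drop leading and trailing samples whose magnitude is below `threshold`."""
    x = np.asarray(x, dtype=float)
    voiced = np.flatnonzero(np.abs(x) >= threshold)
    if voiced.size == 0:
        return x[:0]                  # the whole unit is silence
    return x[voiced[0]:voiced[-1] + 1]


def crossfade_concat(units, overlap):
    """Concatenate units, linearly cross-fading up to `overlap` samples at each joint."""
    out = np.asarray(units[0], dtype=float)
    for u in units[1:]:
        u = np.asarray(u, dtype=float)
        n = min(overlap, len(out), len(u))
        if n == 0:
            out = np.concatenate([out, u])
            continue
        ramp = np.linspace(0.0, 1.0, n)   # weight of the incoming unit
        joint = out[-n:] * (1.0 - ramp) + u[:n] * ramp
        out = np.concatenate([out[:-n], joint, u[n:]])
    return out


# Hypothetical syllable waveforms; real ones would come from the speech database.
syl = trim_silence([0.0, 0.0, 0.5, -0.3, 0.0, 0.2, 0.0, 0.0])
joined = crossfade_concat([np.ones(100), np.ones(100)], overlap=10)
```

A linear cross-fade keeps the level of two matching signals constant across the joint (the example output stays at 1.0), but it cannot hide pitch or spectral mismatches; that is what join-cost unit selection and time-scale modification are for.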
International Journal of Computer Applications, 2013
This paper discusses the tools and methodology used to develop a Nepali Text-to-Speech synthesis system based on the concatenative approach, employing the Epoch Synchronous Non Overlap Add (ESNOLA) method, which uses a signal dictionary of raw sound signals representing parts of phonemes as its speech database. The developed system is an unintonated (flat) TTS system: the pitch of the pre-recorded speech signal stays the same throughout, while aspects such as naturalness, personality, platform independence, and quality assessment are taken into account. Some applications of TTS systems and problems encountered with them are also discussed.
2002
We report the design and development of Thirukkural, the first text-to-speech converter in Tamil. Syllables of different lengths have been selected as units, since Tamil is a syllabic language. An automatic segmentation algorithm has been devised to segment syllables into consonant and vowel. The units are pitch marked using the Discrete Cosine Transform-Spectral Autocorrelation Function (DCT-SAF) [6]. Prosodic information is captured in tables based on extensive observation of spoken Tamil. During synthesis, DCT-based pitch modification [3][7][11] is applied both for waveform interpolation and for modifying the pitch contour for different sentence modalities. Thirukkural is implemented in VC++ and runs on Windows 95/98/NT. Perceptual evaluation by native speakers shows that the synthesized speech is intelligible and fairly natural. 0-7803-7395-2/02/$17.00 ©2002 IEEE
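The paper pitch-marks units with DCT-SAF, whose details are not given here. As a much simpler and clearly different stand-in, a plain time-domain autocorrelation estimator shows the underlying idea: find the lag at which a voiced frame best matches a shifted copy of itself, then convert that lag to a fundamental frequency. The function name `estimate_f0`, the 60 to 400 Hz search range, and the synthetic test signal are assumptions.

```python
import numpy as np


def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame from the
    peak of its time-domain autocorrelation within [fmin, fmax]."""
    frame = np.asarray(frame, dtype=float)
    lag_min = int(sr / fmax)          # shortest period considered, in samples
    lag_max = int(sr / fmin)          # longest period considered, in samples
    if len(frame) <= lag_max:
        raise ValueError("frame must be longer than the longest period searched")
    frame = frame - frame.mean()      # remove DC so it does not dominate the peak
    # In 'full' mode, lag 0 sits at index len(frame) - 1; keep non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return sr / lag


# Synthetic check: an impulse train with a period of 80 samples at 8 kHz (100 Hz),
# a crude stand-in for glottal pulses in a voiced segment.
sr = 8000
pulses = np.zeros(400)
pulses[::80] = 1.0
f0 = estimate_f0(pulses, sr)
```

Pitch marking proper goes further, placing a mark at each glottal epoch rather than returning one F0 per frame; a frame-level estimate like this can serve to constrain where those marks may fall.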
This paper describes the design and development of a TTS system and gives an overview of different types of synthesis systems. One approach to generating natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. The system uses a syllabication procedure together with phones and diphones.
I. Introduction
The speech synthesizer, or text-to-speech synthesizer, is the most widely used system in speech technology. Various text-to-speech synthesizer systems are available, such as Festival, Multilingual, and Flite. A Text-To-Speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether it was entered directly into the computer by an operator or scanned and submitted to an Optical Character Recognition (OCR) system. As such, TTS conversion transforms a string of phonetic and prosodic symbols into a synthetic speech signal. The quality of the result produced by a TTS sy...
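The syllabication step mentioned here (and in the Marathi paper above) is rule-based. A deliberately simplified rule set, assumed for illustration only and not the rules of any cited system, places one vowel nucleus in each syllable and gives each syllable at most one onset consonant; the function name `syllabify` and the phone labels are hypothetical.

```python
def syllabify(phones, vowels):
    """Split a phone list into syllables, one vowel nucleus per syllable.

    Assumed, simplified rule: of the consonants between two vowels, only the
    last becomes the onset of the next syllable; the rest close the previous
    syllable. Leading consonants join the first syllable, trailing ones the last.
    """
    nuclei = [i for i, p in enumerate(phones) if p in vowels]
    if not nuclei:
        return [list(phones)]
    bounds = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        gap = nxt - prev - 1          # consonants between the two nuclei
        bounds.append(nxt - 1 if gap >= 1 else nxt)
    bounds.append(len(phones))
    return [list(phones[a:b]) for a, b in zip(bounds, bounds[1:])]


# Hypothetical phone sequence and vowel set.
print(syllabify(["k", "a", "m", "p", "a", "n"], {"a"}))
```

Real Indic syllabification rules also account for consonant clusters, inherent vowels, and script-specific conventions, which this sketch ignores.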
2010
The goal of this paper is to provide a short but comprehensive overview of text-to-speech synthesis by highlighting its natural language processing (NLP) and digital signal processing (DSP) components. First, the front end, or NLP component, comprising text analysis, phonetic analysis, and prosodic analysis, is introduced; then two rule-based synthesis techniques (formant synthesis and articulatory synthesis) are explained. After that, concatenative synthesis is explored. Compared to rule-based synthesis, concatenative synthesis is simpler, since there is no need to determine speech production rules. However, concatenative synthesis introduces the challenges of modifying the prosody of speech units and resolving discontinuities at unit boundaries. Prosodic modification introduces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, solves this problem by storing numerous instances of each unit with varying prosodies; the unit that best matches the target prosody is selected and concatenated. Finally, hidden Markov model (HMM) synthesis is introduced.
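Unit selection as summarized here picks, for each target, one stored instance so that the combined mismatch to the target prosody (target cost) and the discontinuity at joins (join cost) is smallest. A textbook way to find that minimum is a Viterbi-style dynamic program. The sketch below is that generic search, not the method of any cited paper; it assumes hypothetical cost functions and represents each unit by a single pitch value in Hz, which real systems replace with richer prosodic and spectral features.

```python
def select_units(targets, candidates, target_cost, join_cost):
    """Return the candidate sequence (one per target) minimizing the sum of
    target costs and join costs, plus that minimal total cost."""
    if not targets or len(targets) != len(candidates) or any(not c for c in candidates):
        raise ValueError("need one non-empty candidate list per target")
    # best[i][j]: lowest total cost of any sequence ending in candidates[i][j]
    best = [[target_cost(targets[0], u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            # cost of reaching u from each candidate at position i - 1
            via = [best[i - 1][j] + join_cost(p, u) for j, p in enumerate(candidates[i - 1])]
            j = min(range(len(via)), key=via.__getitem__)
            row.append(via[j] + target_cost(targets[i], u))
            ptr.append(j)
        best.append(row)
        back.append(ptr)
    # trace back from the cheapest final candidate
    k = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][k]
    path = [k]
    for i in range(len(targets) - 1, 0, -1):
        k = back[i][k]
        path.append(k)
    path.reverse()
    return [candidates[i][j] for i, j in enumerate(path)], total


# Hypothetical example: target pitches and stored instances, all in Hz.
units, cost = select_units(
    targets=[100.0, 120.0],
    candidates=[[100.0, 130.0], [120.0, 135.0]],
    target_cost=lambda t, u: abs(t - u),
    join_cost=lambda a, b: abs(a - b),
)
```

Because the search minimizes the total over the whole sequence, a unit with a small target cost can still lose if it joins badly with its neighbors.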
2004
Abstract: This paper discusses a Malay speech synthesizer. It covers the available Malay speech synthesis systems, the underlying structure of our system, brief descriptions of its crucial modules, a general evaluation of the system, and the proposed enhancements and future work for the Malay text-to-speech system at the Computer-Aided Translation Unit (UTMK). The objective is to show how our system works and how its performance can be improved. We also outline our future direction for text-to-speech research at UTMK.
Multimedia Systems, 2020
Text-to-speech technology has made significant progress during the past decade and is an active area of research and development for human-computer interactive systems. Although a number of speech synthesis models are available for different languages, targeting specific domain requirements and applications, no source of information on current trends in Indian-language speech synthesis has been available to date, making it difficult for beginners to start developing TTS systems for low-resourced languages. This paper reviews the contributions of different researchers to Indian-language speech synthesis, along with a study of Indian language characteristics and the associated challenges in designing TTS systems. Available applications and tools resulting from projects undertaken by various organizations, together with possible future developments, are also discussed, providing a single reference to an important strand of speech synthesis research that may benefit anyone interested in starting research in this area.
The main objective of this paper is to compare two di-phone-based concatenative speech synthesis systems for the Marathi language. In concatenative speech synthesis systems, speech is generated by joining small pre-recorded speech units stored in the speech unit register. A di-phone is a speech unit that begins at the middle of one phoneme and extends to the middle of the following one. Di-phones are commonly used in concatenative text-to-speech (TTS) systems because they model co-articulation by including the transition to the next phone inside the unit itself. The first synthesizer in this comparison is implemented with the Festival TTS system and the second with the MARY TTS system. The comparison highlights how the two systems handle some of the challenges of the Marathi language and how the DSP modules of Festival and MARY TTS differ. Results of applying the diagnostic rhyme test (DRT) to both synthesizers are also presented.
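Following the definition above, a phone sequence maps to a diphone sequence by pairing each phone with its successor, with a silence unit at both ends so the first and last half-phones are covered. A minimal sketch; the `sil` boundary label, the `a-b` naming scheme, and the function name `to_diphones` are assumptions, since Festival and MARY each use their own conventions.

```python
def to_diphones(phones, boundary="sil"):
    """Return diphone labels for a phone sequence padded with a silence phone.

    Each diphone spans from the middle of one phone to the middle of the next,
    so N phones plus the two boundary silences yield N + 1 diphones.
    """
    padded = [boundary, *phones, boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


# Hypothetical phone labels.
print(to_diphones(["k", "a", "m"]))  # ['sil-k', 'k-a', 'a-m', 'm-sil']
```

Each join in the synthesized output then falls in the middle of a phone, where the signal is usually more stable than at a phone boundary.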
