Thirukkural - A Text-to-Speech Synthesis System

Ramakrishnan Angarai Ganesan

doi:10.13140/RG.2.1.1185.6484

6 pages

Abstract (AI-generated summary)

A novel text-to-speech (TTS) synthesis system for the Tamil language is proposed. It comprises an offline phase, covering pre-processing, pitch marking, and database construction, and an online phase, covering text analysis and synthesis. Several speech synthesis methods are discussed, and the system uses concatenation without waveform modification, which strikes a balance between database size and speech quality. The synthesized speech shows good intelligibility; planned enhancements include emotional speech, online capabilities, and better handling of non-native words.
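
The online phase described above, text analysis followed by concatenation of stored units without waveform modification, can be sketched in Python. This is an illustrative sketch, not the system's actual implementation: the romanized unit labels, the greedy longest-match segmentation, and the fixed-length pause at word boundaries are all assumptions.

```python
from typing import Dict, List

# Hypothetical unit database built in the offline phase: maps a romanized
# unit label (e.g. a syllable) to its recorded samples.
UnitDB = Dict[str, List[float]]


def segment(text: str, db: UnitDB) -> List[str]:
    """Split text into database units by greedy longest match ("text analysis")."""
    max_len = max((len(label) for label in db), default=0)
    units: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == " ":
            units.append(" ")  # word boundary, rendered later as a pause
            i += 1
            continue
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in db:
                units.append(piece)
                i += length
                break
        else:
            raise KeyError(f"no unit in the database covers {text[i:]!r}")
    return units


def synthesize(text: str, db: UnitDB, pause_samples: int = 160) -> List[float]:
    """Concatenate stored units verbatim: no pitch, duration, or spectral change."""
    out: List[float] = []
    for unit in segment(text, db):
        if unit == " ":
            out.extend([0.0] * pause_samples)
        else:
            out.extend(db[unit])
    return out


db = {"va": [0.1, 0.2], "nak": [0.3], "kam": [0.4, 0.5], "na": [0.9]}
print(segment("vanakkam", db))  # ['va', 'nak', 'kam']
```

Because each unit is copied unchanged, output quality depends on how well the recorded units happen to match at their boundaries; covering more contexts with longer units improves this at the cost of a larger database.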

Aye Thida

International Journal of Advanced Computer Research, 2019

Among them, the concatenative synthesis approach is used in our system because it produces natural-sounding speech from pre-recorded sound. Speech quality and system size trade off against each other depending on the speech unit chosen for concatenation; common units are the word, syllable, phoneme, diphone, triphone, and so on. Many TTS systems proposed in [2-6] have been implemented with the concatenative method using different speech units, and they generate high-quality synthesized speech. A numerical TTS synthesis system for three languages, Marathi, Hindi, and English, is proposed in [7]; it combines a rule-based approach with a concatenation-based approach, and all utterances of the sound units are used for concatenation and generation of the speech signal. The authors of [8] compare two Arabic text-to-speech systems, both screen readers: non-visual desktop access (NVDA) and integrated bilingual solution for the blind or visually impaired in the Arab (IBSAR). They tested the quality of the two systems in terms of …
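
The combination of rules and concatenation attributed to [7] can be illustrated for English numbers: a rule expands the number into a word sequence, and the recording of each word is then concatenated. This Python sketch is only an illustration under assumptions: it is limited to English and to 0-999, it omits the "and" of British usage, and the word recordings in `word_db` are hypothetical placeholders.

```python
from typing import Dict, List

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> List[str]:
    """Rule-based expansion of 0..999 into word units."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return [ONES[n]]
    if n < 100:
        tens, rest = divmod(n, 10)
        return [TENS[tens]] + ([ONES[rest]] if rest else [])
    hundreds, rest = divmod(n, 100)
    return [ONES[hundreds], "hundred"] + (number_to_words(rest) if rest else [])


def speak_number(n: int, word_db: Dict[str, List[float]]) -> List[float]:
    """Concatenation step: join the stored recording of each word in order."""
    samples: List[float] = []
    for word in number_to_words(n):
        samples.extend(word_db[word])
    return samples


print(number_to_words(345))  # ['three', 'hundred', 'forty', 'five']
```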

Artificially Generated of Concatenative Syllable based Text to Speech Synthesis System for Marathi

IOSR Journals

This research paper addresses the problem of improving the intelligibility of synthesized speech in a Marathi TTS synthesis system. Speech synthesis is the artificial generation of human speech, and a text-to-speech system automatically converts normal language text into speech. This paper describes a corpus-driven Marathi TTS system based on the concatenative synthesis approach, in which basic units are concatenated to synthesize intelligible, natural-sounding speech. Syllables are the basic unit of the speech synthesis database, and syllable pitch is modified by time-scale modification. The speech units are annotated with associated prosodic information, either manually or automatically by an algorithm. The annotated speech corpus uses a clustering technique to select the most suitable unit for concatenation, based on the minimum total join cost of the speech units. The input text is first analyzed and syllabified according to linguistic rules, and the syllables are stored separately. The speech files corresponding to the syllables are then concatenated, and silence in the concatenated speech is removed. Discontinuities at syllable boundaries are then minimized without degrading quality by smoothing at the concatenated syllable boundaries, and syllable pitches are changed by time-scale modification.
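
Selecting units by minimum total join cost, as described above, is commonly done with dynamic programming (a Viterbi search). The Python sketch below assumes each candidate recording is summarized by hypothetical pitch values at its left and right edges and uses their absolute mismatch as the join cost; the paper's clustering step and its actual cost function are not reproduced.

```python
from typing import List, Tuple

# Hypothetical summary of one recorded candidate: pitch (Hz) at its left and right edges.
Candidate = Tuple[float, float]


def join_cost(left: Candidate, right: Candidate) -> float:
    """Pitch mismatch across the boundary between two consecutive units."""
    return abs(left[1] - right[0])


def select_units(candidates: List[List[Candidate]]) -> Tuple[List[int], float]:
    """Viterbi search: one candidate index per target syllable, minimizing total join cost."""
    best = [0.0] * len(candidates[0])  # cheapest path cost ending at each candidate
    backpointers: List[List[int]] = []
    for prev_slot, slot in zip(candidates, candidates[1:]):
        new_best, pointers = [], []
        for cand in slot:
            costs = [best[i] + join_cost(p, cand) for i, p in enumerate(prev_slot)]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min])
            pointers.append(i_min)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the cheapest candidate in the last slot.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return path, total
```

For three target syllables with two candidates each, `select_units` returns the index of the chosen recording for every syllable plus the summed join cost of that path.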

Nepali Text to Speech Synthesis System using ESNOLA Method of Concatenation

Bhusan Chettri

International Journal of Computer Applications, 2013

This paper presents the tools and methodology used to develop a Nepali text-to-speech synthesis system based on a concatenative approach employing the Epoch Synchronous Non-Overlap Add (ESNOLA) method, which uses as its speech database a signal dictionary of raw sound signals representing parts of phonemes. The developed system is an unintonated (flat) TTS system in which the pitch of the pre-recorded speech signal remains the same throughout, while aspects such as naturalness, personality, platform independence, and quality assessment are taken into account. Some applications of TTS systems and problems encountered with them are also discussed.

A complete text-to-speech synthesis system in Tamil

Muralishankar Rangarao

2002

We report the design and development of Thirukkural, the first text-to-speech converter in Tamil. Because Tamil is a syllabic language, syllables of different lengths have been selected as units. An automatic segmentation algorithm has been devised to segment syllables into consonant and vowel. The units are pitch marked using the Discrete Cosine Transform-Spectral Autocorrelation Function (DCT-SAF) [6]. Prosodic information is captured in tables based on extensive observation of spoken Tamil. During synthesis, DCT-based pitch modification [3][7][11] is applied both for waveform interpolation and for modifying the pitch contour for different sentence modalities. Thirukkural is written in VC++ and runs on Windows 95/98/NT. Perceptual evaluation by native speakers shows that the synthesized speech is intelligible and fairly natural.
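
DCT-based pitch modification changes the length of a pitch period, and hence the pitch, in the transform domain. The Python sketch below shows only the core idea on a single period: take an orthonormal DCT-II, truncate or zero-pad the coefficients to the new period length, and invert. It is a simplified assumption, not the published method of [3][7][11]; for example, the cited work may apply the transform to a residual signal rather than to the waveform, and pitch marking and interpolation are omitted here.

```python
import math
from typing import List


def dct_ii(x: List[float]) -> List[float]:
    """Orthonormal DCT-II (direct O(N^2) form, adequate for one pitch period)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        coeffs.append(s * math.sqrt((1 if k == 0 else 2) / n))
    return coeffs


def idct_ii(c: List[float]) -> List[float]:
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    m = len(c)
    return [
        sum(c[k] * math.sqrt((1 if k == 0 else 2) / m)
            * math.cos(math.pi * (i + 0.5) * k / m) for k in range(m))
        for i in range(m)
    ]


def resample_period(period: List[float], new_len: int) -> List[float]:
    """Stretch or shrink one pitch period to new_len samples.

    Coefficients are truncated (shorter period, higher pitch) or zero-padded
    (longer period, lower pitch); the sqrt factor keeps the amplitude level.
    """
    c = dct_ii(period)[:new_len]
    c += [0.0] * (new_len - len(c))
    scale = math.sqrt(new_len / len(period))
    return [scale * v for v in idct_ii(c)]
```

For example, a 100 Hz voice sampled at 8 kHz has 80-sample periods; resampling each period to 64 samples and concatenating them raises the pitch to 125 Hz.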

Design and Development of a Text-To-Speech Synthesizer System

Vineet Chauhan

This paper describes the design and development of a TTS system and gives an overview of different types of synthesis systems. One approach to generating natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. The system uses a syllabication procedure together with phones and diphones.

I. Introduction

A speech synthesizer, or text-to-speech synthesizer, is among the most widely used systems in speech technology. Various text-to-speech synthesizers are available, such as Festival, Multilingual, and Flite. A text-to-speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud, whether it was entered directly into the computer by an operator or scanned and submitted to an optical character recognition (OCR) system. The TTS conversion process thus transforms a string of phonetic and prosodic symbols into a synthetic speech signal. The quality of the result produced by a TTS sy...

A Simplified Overview of Text-To-Speech Synthesis

Francis Idachaba

An overview of text-to-speech synthesis techniques

Hazem El-bakry

2010

The goal of this paper is to provide a short but comprehensive overview of text-to-speech synthesis by highlighting its natural language processing (NLP) and digital signal processing (DSP) components. First, the front end, or NLP component, comprising text analysis, phonetic analysis, and prosodic analysis, is introduced; then two rule-based synthesis techniques (formant synthesis and articulatory synthesis) are explained. After that, concatenative synthesis is explored. Compared with rule-based synthesis, concatenative synthesis is simpler because there is no need to determine speech production rules. However, concatenative synthesis introduces the challenges of prosodic modification of speech units and of resolving discontinuities at unit boundaries. Prosodic modification introduces artifacts that make the speech sound unnatural. Unit selection synthesis, a kind of concatenative synthesis, solves this problem by storing numerous instances of each unit with varying prosody; the unit that best matches the target prosody is selected and concatenated. Finally, hidden Markov model (HMM) synthesis is introduced.
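
The unit-selection idea above, picking the stored instance that best matches the target prosody, can be sketched as a minimum over a target cost. This Python sketch assumes each instance is summarized by a single pitch value and a duration, and uses a weighted absolute difference with hypothetical weights; real systems use richer features and search jointly with a join cost.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UnitInstance:
    name: str
    pitch_hz: float      # e.g. mean F0 of the stored instance
    duration_ms: float


def target_cost(u: UnitInstance, pitch_hz: float, duration_ms: float,
                w_pitch: float = 1.0, w_dur: float = 1.0) -> float:
    """Weighted distance between a stored instance and the target prosody."""
    return w_pitch * abs(u.pitch_hz - pitch_hz) + w_dur * abs(u.duration_ms - duration_ms)


def pick_instance(instances: List[UnitInstance], pitch_hz: float,
                  duration_ms: float) -> UnitInstance:
    """Choose the stored instance whose prosody best matches the target."""
    return min(instances, key=lambda u: target_cost(u, pitch_hz, duration_ms))


instances = [UnitInstance("a1", 120, 80), UnitInstance("a2", 180, 90),
             UnitInstance("a3", 150, 120)]
print(pick_instance(instances, 170, 95).name)  # a2
```

Because the chosen instance already has roughly the right prosody, little or no prosodic modification is needed, which is how unit selection avoids the modification artifacts described above.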

A Simple Malay Speech Synthesizer Using Syllable Concatenation Approach

Nur Hana Samsudin

2004

Abstract: This paper discusses a Malay speech synthesizer. It covers the available Malay speech synthesis systems, the underlying structure of our system, brief descriptions of its crucial modules, a general evaluation of the system, and the proposed enhancements and future work on Malay text-to-speech at the Computer-Aided Translation Unit (UTMK). The objective is to show how our system works and how its performance can be improved. We also outline our future direction for text-to-speech research at UTMK.

A survey on speech synthesis techniques in Indian languages

Satyananda Champati Rai

Multimedia Systems, 2020

Text-to-speech technology has made significant progress during the past decade and is an active area of research and development for human-computer interaction systems. Although a number of speech synthesis models are available for different languages, targeting various domain requirements and applications, no single source of information on current trends in Indian language speech synthesis has been available to date, making it difficult for beginners to start research on TTS systems for low-resource languages. This paper reviews the contributions of different researchers to Indian language speech synthesis, along with a study of Indian language characteristics and the associated challenges in designing TTS systems. Available applications and tools resulting from projects undertaken by various organizations, and possible future developments, are also discussed, to provide a single reference for this important strand of speech synthesis research for anyone interested in starting work in the area.

Di-phone-Based Concatenative Speech Synthesis Systems for Marathi Language

IOSR Journals

The main objective of this paper is to provide a comparison between two di-phone-based concatenative speech synthesis systems for Marathi language. In concatenative speech synthesis systems, speech is generated by joining small prerecorded speech units which are stored in the speech unit register. A di-phone is a speech unit that begins at the middle of one phoneme and extends to the middle of the following one. Di-phones are commonly used in concatenative text to speech (TTS) systems as they have the advantage of modeling co-articulation by including the transition to the next phone inside the unit itself. The first synthesizer in this comparison was implemented using the Festival TTS system and the other synthesizer uses the MARY TTS system. In this comparison, the differences between the two systems in handling some of the challenges of the Marathi language and the differences between the Festival TTS system and the MARY TTS system in the DSP modules are highlighted. Also, the results of applying the diagnostic rhyme test (DRT) on both of the synthesizers are illustrated.
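
The definition above, a unit running from the middle of one phoneme to the middle of the next, translates directly into code. This Python sketch assumes a hypothetical phone-level segmentation given as (phone, start_ms, end_ms) tuples and a pause symbol `pau` at the utterance edges; it does not reproduce Festival's or MARY's actual diphone-naming conventions.

```python
from typing import List, Tuple

Segment = Tuple[str, float, float]  # (phone, start_ms, end_ms)


def diphone_names(phones: List[str], pause: str = "pau") -> List[str]:
    """Diphone labels needed to synthesize a phone string, padded with pauses."""
    seq = [pause] + phones + [pause]
    return [f"{a}-{b}" for a, b in zip(seq, seq[1:])]


def diphone_spans(segments: List[Segment]) -> List[Tuple[str, float, float]]:
    """Cut a labelled recording into diphones: mid-point of one phone to
    mid-point of the next, so each unit contains the phone-to-phone transition."""
    spans = []
    for (p1, s1, e1), (p2, s2, e2) in zip(segments, segments[1:]):
        spans.append((f"{p1}-{p2}", (s1 + e1) / 2, (s2 + e2) / 2))
    return spans


print(diphone_names(["k", "a", "m"]))  # ['pau-k', 'k-a', 'a-m', 'm-pau']
```

Because every cut falls in the stable middle of a phone, joins happen where the spectrum changes least, while the co-articulated transition stays inside the unit.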

International Journal of Latest Technology in Engineering, Management & Applied Science (IJLTEMAS), www.ijltemas.in

International Standards Publication (ISP), 2014

Speech is used to convey information, emotions, and feelings. Speech synthesis is the technique of converting input text into synthetic speech. It can be used to read text such as SMS messages, newspapers, and website information, and it can assist blind people. Speech synthesis has been widely researched over the last four decades, and the quality and intelligibility of the synthetic speech produced are remarkably good for most applications. This report reviews four widely researched methods of speech synthesis: articulatory, concatenative, formant, and quasi-articulatory synthesis. The paper focuses mainly on the concatenative synthesis method and discusses some of its issues. Articulatory synthesis is based on a model of human speech production; the synthetic speech it produces is the most natural, but it is also the most difficult method. Concatenative synthesis uses prerecorded speech words and phrases and concatenates them to produce sound. It is the simplest method and yields high-quality speech, but it is limited by the memory required to store all possible words and phrases beforehand. Formant synthesis is based on an acoustic model of the human speech production system; it models the sound source and the resonances of the vocal tract, and it is the most commonly used model. Quasi-articulatory synthesis is a hybrid of articulatory and acoustic models of speech production. Synthetic speech produced by this model sounds more natural and can easily be customized to meet the requirements of different applications and individual users.
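
Formant synthesis, as summarized above, passes a source signal through resonators that model the vocal-tract resonances. The Python sketch below uses the standard second-order digital resonator (the form popularized by Klatt's synthesizer) driven by an impulse train; the sampling rate, fundamental frequency, and formant frequencies and bandwidths are illustrative assumptions, not values from the paper.

```python
import math
from typing import List, Sequence, Tuple


def resonator(x: List[float], freq_hz: float, bw_hz: float, fs_hz: float) -> List[float]:
    """Second-order resonator y[n] = A*x[n] + B*y[n-1] + C*y[n-2], unity gain at DC."""
    t = 1.0 / fs_hz
    c = -math.exp(-2 * math.pi * bw_hz * t)
    b = 2 * math.exp(-math.pi * bw_hz * t) * math.cos(2 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = a * xn + b * y1 + c * y2
        out.append(yn)
        y2, y1 = y1, yn
    return out


def impulse_train(f0_hz: float, fs_hz: float, n: int) -> List[float]:
    """Crude voiced source: one unit impulse per pitch period."""
    period = round(fs_hz / f0_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def vowel(formants: Sequence[Tuple[float, float]], fs_hz: float = 16000.0,
          f0_hz: float = 120.0, n: int = 16000) -> List[float]:
    """Cascade one resonator per (frequency, bandwidth) pair on the source."""
    signal = impulse_train(f0_hz, fs_hz, n)
    for freq, bw in formants:
        signal = resonator(signal, freq, bw, fs_hz)
    return signal


# Illustrative (frequency, bandwidth) pairs in Hz for an open vowel.
samples = vowel([(700, 90), (1200, 110), (2600, 160)])
```

Changing the (frequency, bandwidth) pairs changes the perceived vowel without any stored recordings, in contrast to the concatenative approach described above.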

Concatenative Speech Synthesis: A Review

Rubeena Khan

International Journal of Computer Applications, 2016

The primary objective of this paper is to provide an overview of existing concatenative text-to-speech synthesis techniques. Concatenative speech synthesis can be broadly divided into three categories: diphone-based, corpus-based, and hybrid. Diphone-based speech synthesis relies on signal processing techniques such as PSOLA and FD-PSOLA, which introduce unwanted artifacts into the synthesized speech. The most popular method is unit selection synthesis, a corpus-based method, which produces the most natural-sounding synthetic speech.

A Hybrid Text-to-Speech System That Combines Concatenative and Statistical Synthesis Units

Slava Shechtman

IEEE Transactions on Audio, Speech, and Language Processing, 2011

Concatenative synthesis and statistical synthesis are the two main approaches to text-to-speech (TTS) synthesis. Concatenative TTS (CTTS) stores segments of natural speech features selected from a recorded speech database; consequently, CTTS systems can synthesize speech of natural quality. However, as the footprint of the stored data is reduced, desired segments are not always available, and audible discontinuities may result. Statistical TTS (STTS) systems, on the other hand, have a smaller footprint than CTTS and synthesize speech free of such discontinuities. In general, however, STTS produces less natural speech than CTTS, often sounding muffled; the muffling is due to over-smoothing of model-generated speech features. To gain the advantages of both approaches, we propose combining CTTS and STTS into a hybrid TTS (HTTS) system. Each utterance representation in HTTS is constructed from natural segments and model-generated segments in an interwoven fashion via a hybrid dynamic path algorithm. Reported listening tests demonstrate the validity of the proposed approach.

Index Terms: concatenative text-to-speech (CTTS), dynamic path, hybrid TTS, statistical TTS, TTS synthesis.

There are two main approaches to the text-to-speech (TTS) problem. The first uses recorded speech feature segments, which may be words, phonemes, or even sub-phonemes. This speech generation method is called concatenative TTS (CTTS). In this approach, speech is generated by concatenating the most compatible segments according to certain concatenation rules. Speech generated this way inherently possesses natural quality; however, its quality depends on the size of the recorded database, since high-quality CTTS needs an extensive database.
The main disadvantage of CTTS is the possible appearance of discontinuities at segment boundaries due to imperfect concatenation. The smaller the stored database, the more discontinuities typically appear in the generated speech. Thus, in applications where storage and computational resources are limited, such as in mobile …

Text-to-Speech Synthesis Using Concatenative Approach

Oloko-Oba Mustapha O., Samuel Osagie

A text-to-speech (TTS) synthesizer is a computer-based system that should be able to read any text aloud. Most text-to-speech synthesizers lack naturalness and intelligibility. This study aims to improve intelligibility (how easily the output is understood) and naturalness (how closely the output sounds like human speech). In this research, an algorithm is developed in the C programming language that recognizes 50 isolated English words and produces the corresponding sound output using a concatenative technique, which employs a natural human voice and produces the most natural-sounding speech. The words used as inputs were pre-recorded in .wav format and stored in the database. The user enters text, which is searched for in the database; if a match is found, the corresponding sound is played, otherwise an error message asks the user to check the spelling and try again.
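
The lookup-and-play behaviour described above reduces to a dictionary search. The original work was written in C; this Python sketch only mirrors the described logic, and both the in-memory `vocabulary` mapping (standing in for the stored .wav files) and the case and whitespace normalization are hypothetical choices.

```python
from typing import Dict, List, Optional


def lookup_word(text: str, vocabulary: Dict[str, List[float]]) -> Optional[List[float]]:
    """Return the stored recording for an isolated word, or None after an error message."""
    key = text.strip().lower()  # normalization is an assumption of this sketch
    if key in vocabulary:
        return vocabulary[key]
    print("Word not found: check your spelling and try again.")
    return None


vocabulary = {"hello": [0.0, 0.3, -0.2], "world": [0.1, -0.1]}
lookup_word("Hello ", vocabulary)  # returns [0.0, 0.3, -0.2]
lookup_word("helo", vocabulary)    # prints the error message and returns None
```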

Di-phone-Based Concatenative Speech Synthesis System for Hindi

Sangramsing Kayte

This research paper describes the first text-to-speech (TTS) system for the Hindi language, which uses the general speech synthesis architecture of Festival. The TTS is based on diphone concatenative synthesis using the TD-PSOLA technique. Input text is converted into an acoustic waveform in a number of steps, each consisting of functional components; the procedures and functions for these steps and their components are discussed in detail. Finally, the quality of the synthesized speech is assessed in terms of satisfactoriness and articulacy.
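
TD-PSOLA, named above as the technique applied to the diphones, can be sketched in Python. This is a textbook-style simplification under stated assumptions, not the Festival implementation used in the paper: pitch marks are assumed to be given by a separate pitch-marking step, each synthesis mark reuses the nearest analysis segment, and the output keeps the input's length (no duration change).

```python
import math
from typing import List


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n (odd here, so it peaks at the centre)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * j / (n - 1)) for j in range(n)]


def local_period(marks: List[int], k: int) -> int:
    """Pitch period around analysis mark k, estimated from neighbouring marks."""
    if k == 0:
        return marks[1] - marks[0]
    if k == len(marks) - 1:
        return marks[-1] - marks[-2]
    return (marks[k + 1] - marks[k - 1]) // 2


def td_psola(x: List[float], marks: List[int], factor: float) -> List[float]:
    """Scale pitch by `factor` (>1 raises it) while keeping the signal length.

    x: speech samples; marks: strictly increasing analysis pitch marks.
    """
    if len(marks) < 2 or factor <= 0:
        raise ValueError("need at least two pitch marks and a positive factor")
    out = [0.0] * len(x)
    t = float(marks[0])  # current synthesis pitch mark
    while t <= marks[-1]:
        # Reuse the analysis segment whose mark is closest in time.
        k = min(range(len(marks)), key=lambda i: abs(marks[i] - t))
        p = local_period(marks, k)
        window = hann(2 * p + 1)
        centre = round(t)
        for j, w in enumerate(window):
            src = marks[k] - p + j  # two-period segment around the analysis mark
            dst = centre - p + j    # overlap-added around the synthesis mark
            if 0 <= src < len(x) and 0 <= dst < len(out):
                out[dst] += w * x[src]
        t += p / factor  # synthesis marks are spaced by period / factor
    return out
```

With `factor=1.0` and evenly spaced marks, the Hann windows sum to one between the first and last marks, so the signal is reproduced there; a larger factor packs the same segments closer together, which raises the pitch.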

A Comparative Study of Different Text-to-Speech Synthesis Techniques

Helal Uddin Mullah

Speech synthesis is the artificial production of human speech. Attempts to control the voice quality of synthesized speech have existed for more than a decade, and several prototypes and fully operational systems have been built using different synthesis techniques. This article reviews recent advances in speech synthesis research and development, focusing on one key approach, statistical parametric speech synthesis based on HMMs, to provide a technological perspective. In this approach, the spectrum, excitation, and duration of speech are modeled simultaneously by context-dependent HMMs, and speech waveforms are generated from the HMMs themselves. The paper gives an overview of work in this field and summarizes and compares the characteristics of the various speech synthesis techniques used.

Towards a Better Construction of Sentence Level Output for Malay Speech Synthesizer

Nur Hana Samsudin

While improving Malay Speech Synthesizer version 2 (MSS ver2), we focused on how the target syllable utterances to be concatenated are selected. Selection is based on the best match in phonetic context between the target utterance and the recorded utterance. However, this approach improves the quality of synthesized speech only at the word level. We therefore propose additional attributes for the speech corpus which, we believe, provide better information and can thus enhance the quality of synthesized speech at the sentence level. The additional information attached to each speech sound is: the pitch contour of the speech segment, the sentence type, the position and adjacency of the syllable in the word (phonetic context), the position of the word in the sentence, and the mean pitch. We also annotate breaks, pauses, and sound transitions in the corpus. All this information enriches the speech corpus and can be manipulated to construct templates for longer synthesized output. We show what information is used for annotation and how it is used during selection to produce utterances, using Praat as the synthesis tool.
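
The annotation attributes listed above can be held in a simple record, and selection can score each recorded instance by how many attributes match the target. This Python sketch is an illustration only: the field names, the equal weighting of attributes, and the example syllables are assumptions, and break/pause annotation is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SyllableRecord:
    """One annotated syllable in the corpus (field names are hypothetical)."""
    syllable: str
    prev_syllable: Optional[str]    # left phonetic context within the word
    next_syllable: Optional[str]    # right phonetic context within the word
    position_in_word: str           # "initial", "medial" or "final"
    word_index_in_sentence: int
    sentence_type: str              # e.g. "declarative", "interrogative"
    pitch_contour_hz: List[float] = field(default_factory=list)

    @property
    def mean_pitch_hz(self) -> float:
        c = self.pitch_contour_hz
        return sum(c) / len(c) if c else 0.0


def match_score(r: SyllableRecord, prev: Optional[str], nxt: Optional[str],
                position: str, sentence_type: str) -> int:
    """Count matching annotation attributes (equal weights, an assumption)."""
    return sum([r.prev_syllable == prev, r.next_syllable == nxt,
                r.position_in_word == position, r.sentence_type == sentence_type])


def select_record(records: List[SyllableRecord], syllable: str, prev: Optional[str],
                  nxt: Optional[str], position: str, sentence_type: str) -> SyllableRecord:
    """Pick the recording of `syllable` whose annotations best match the target."""
    pool = [r for r in records if r.syllable == syllable]
    if not pool:
        raise KeyError(f"no recording of {syllable!r}")
    return max(pool, key=lambda r: match_score(r, prev, nxt, position, sentence_type))
```

Because sentence type and word position are stored per syllable, the same scoring can prefer, say, a rising-pitch instance when the target sentence is a question.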

Integration of Rule-Based Formant Synthesis and Waveform Concatenation: A Hybrid Approach to Text-to-Speech Synthesis

Sue Hertz

This paper describes an approach to speech synthesis in which waveform fragments dynamically produced with a set of formant-based synthesis rules are concatenated with pre-stored natural speech waveform fragments to produce a synthetic utterance. Although this hybrid approach was originally implemented as a research tool for improving voice quality in formant-based synthesis, it has produced such good results that we now view it as a potentially viable and advantageous approach for a text-to-speech product. Possible advantages of the approach include smaller speech databases for waveform concatenation, enhancement of certain speech cues for sub-optimal listening environments, and improved, more efficient unit selection and production. In addition, the approach has already proven useful as a tool for research and development in both concatenative and formant-based synthesis.

The Main Principles of Text-to-Speech Synthesis System

K. Aida-zade

2010

Abstract: This paper presents the main principles of text-to-speech synthesis systems. Problems that arise when developing speech synthesis systems are described, and the approaches used and their application in speech synthesis systems for the Azerbaijani language are shown. Keywords: synthesis of Azerbaijani language, morphemes, phonemes, sounds, sentence, speech synthesizer, intonation, accent, pronunciation.

Speech synthesis systems: disadvantages and limitations

Karolina Kuligowska

International Journal of Engineering & Technology

Present speech synthesis systems can be used successfully for a wide range of purposes. However, using various synthesizers involves serious and important limitations, many of which can be identified and resolved. The aim of this paper is to present the current state of development of speech synthesis systems and to examine their drawbacks and limitations. The paper discusses the current classification, construction, and functioning of speech synthesis systems, giving an insight into the synthesizers implemented so far. The analysis of disadvantages and limitations focuses on identifying the weak points of these systems, namely: the impact of emotions and prosody, spontaneous speech in terms of naturalness and intelligibility, preprocessing and text analysis, the problem of ambiguity, natural sounding, adaptation to the situation, the variety of systems, sparsely spoken languages, speech synthesis for older people, and some other minor...
