Speaker Independent Urdu Speech Recognition Using HMM
2010, Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-642-13881-2_14…
Abstract
Automatic Speech Recognition (ASR) is one of the advanced fields of Natural Language Processing (NLP). The recent past has witnessed valuable research in ASR for English, European, and East Asian languages. Unfortunately, South Asian languages in general, and Urdu in particular, have received very little attention. In this paper we present an approach to developing an ASR system for the Urdu language. The proposed system is based on Sphinx4, an open-source speech recognition framework that uses a statistical approach (HMM: Hidden Markov Model). We present a speaker-independent ASR system for a small vocabulary, i.e. the fifty-two most frequently spoken isolated Urdu words, and suggest that this work will form the basis for developing medium- and large-vocabulary Urdu speech recognition systems.
Related papers
2009 Oriental COCOSDA International Conference on Speech Database and Assessments, 2009
The Center for Research in Urdu Language Processing (CRULP; www.crulp.org) at NUCES is currently working on a project entitled Telephone-based Speech Interfaces for Access to Information by Non-literate Users, in collaboration with Carnegie Mellon University. The goal of this project is to investigate the use of speech interfaces that let users access online health-related information in Pakistan. This will be achieved by developing a telephone-based dialogue system, consisting of an Urdu speech recognition system and a text-to-speech system, that can interact with health workers to answer their queries. One key component of this system is a Large Vocabulary Automatic Speech Recognition (LVASR) system for Urdu. This system requires the construction of a phonetically rich and balanced corpus for recognition of continuous and spontaneous speech in Urdu. Once the training corpus is recorded, it has to be labeled. The system will be based on Hidden Markov Models, using the Sphinx 3 [3] trainer and the Sphinx 4 ([4], [5]) decoder. This paper describes the process employed in the design and development of the phonetically rich Urdu speech corpus, the initial step in the development of the Urdu LVASR. The next section briefly reviews similar work done for other languages and the phonetic characteristics of Urdu. Sections 3 and 4 and 6 describe
Communications in Computer and Information Science, 2009
This paper presents a speech processing and recognition system for individually spoken Urdu words. Speech feature extraction was based on a dataset of 150 different samples collected from 15 different speakers. The data was pre-processed by normalization and by transformation into the frequency domain using the discrete Fourier transform. Feed-forward neural models for speech recognition were developed in MATLAB. The models exhibited reasonably high training and testing accuracies. Details of the MATLAB implementation are included in the paper for use by other researchers in this field. Our ongoing work involves the use of linear predictive coding and cepstrum analysis for alternative neural models. Potential applications of the proposed system include telecommunications, multimedia, and voice-activated tele-customer services.
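The pre-processing step described above (normalization followed by a discrete Fourier transform) can be illustrated with a minimal sketch. This is not the paper's MATLAB code: the function names `peak_normalize` and `dft_magnitudes`, the choice of peak normalization, and the toy sinusoid are all assumptions made for illustration, and Python is used instead of MATLAB.

```python
import cmath
import math


def peak_normalize(samples):
    """Scale samples so the largest absolute value is 1.0.

    Peak normalization is an assumption; the abstract does not say
    which normalization was used. All-zero input is returned unchanged.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    return [s / peak for s in samples]


def dft_magnitudes(samples):
    """Naive O(N^2) discrete Fourier transform.

    Returns the magnitudes |X[k]| for k = 0 .. N//2 (the non-redundant
    half of the spectrum for a real-valued signal).
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(samples)
        )
        magnitudes.append(abs(acc))
    return magnitudes


# Toy signal (hypothetical): 2 full cycles over 16 samples, amplitude 3.
signal = [3.0 * math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
normalized = peak_normalize(signal)
features = dft_magnitudes(normalized)

# Index of the strongest frequency bin.
peak_bin = max(range(len(features)), key=features.__getitem__)
```

For this toy signal the energy concentrates in bin 2, matching the two cycles in the window. A real system would use an FFT on short overlapping frames rather than a naive transform over the whole recording.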
Hindi is a very complex language with a large number of phonemes, and it is spoken with various accents in different regions of India. In this manuscript, speaker-dependent and speaker-independent isolated Hindi word recognizers using the Hidden Markov Model (HMM) are implemented under a noisy environment. For this study, a set of 10 Hindi names was chosen as the test set on which training and testing are performed. The scheme uses Mel Frequency Cepstral Coefficients (MFCC) to compute the acoustic features of the speech signal. The K-means algorithm is then used to generate the codebook by clustering the obtained feature space. The Baum-Welch algorithm is used to re-estimate the parameters, and finally the Viterbi algorithm is applied to each HMM so that the Hindi word whose model likelihood is highest is chosen as the recognized word. This work achieved a recognition rate of 98.6% for speaker-dependent recognition with a total of 10 speakers (6 male, 4 female) and 97.5% for the speaker-independent isolated word recognizer with 10 speakers (male).
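The codebook step mentioned above can be sketched as follows: K-means groups feature vectors into clusters, and each vector is then replaced by the index of its nearest centroid, giving the discrete symbols an HMM is trained on. This is a minimal sketch, not the authors' implementation. The helper names, the deterministic initialization from the first `k` vectors, and the 2-dimensional toy vectors (standing in for MFCC vectors) are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, vec):
    """Index of the codeword closest to vec."""
    return min(
        range(len(codebook)),
        key=lambda i: squared_distance(codebook[i], vec),
    )


def kmeans_codebook(vectors, k, iterations=10):
    """Plain Lloyd's K-means.

    Initialization uses the first k vectors (an assumption made to keep
    the example deterministic; random initialization is more common).
    """
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(codebook, v)].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                codebook[i] = [sum(col) / len(members) for col in zip(*members)]
    return codebook


# Hypothetical 2-D "feature vectors" forming two obvious groups.
training_vectors = [
    [0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
    [10.0, 11.0], [1.0, 0.0], [11.0, 10.0],
]
codebook = kmeans_codebook(training_vectors, k=2)

# Vector quantization: map new vectors to codebook indices.
symbols = [nearest(codebook, v) for v in [[0.5, 0.5], [9.0, 9.0]]]
```

The resulting `symbols` list is the kind of discrete observation sequence that Baum-Welch training and Viterbi decoding then operate on.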
2012 2nd International Conference on Power Control and Embedded Systems, 2012
In this paper, three schemes based on the Hidden Markov Model for recognition of isolated words in Hindi speech are discussed, namely speaker dependent, multi speaker, and speaker independent. For the study, a set of 10 Hindi words is chosen, on which training followed by testing is performed. The recogniser is built on three basic building blocks: feature extraction, training, and recognition (testing). The proposed scheme uses Mel Frequency Cepstral Coefficients (MFCC) to compute the spectral features of the speech signal. The K-means algorithm is then used to form the codebook by clustering the obtained feature vectors. Recognition of a spoken Hindi word is carried out by first deriving its features and then applying the Viterbi algorithm to each HMM, deciding in favour of the Hindi word whose model likelihood is highest. The recognition rate for the speaker-dependent isolated word recogniser with a total of 10 speakers (7 male, 3 female) is 99%, whereas for multi speaker it is 98% (10 male) and for speaker independent (10 male) it is 97.5%. Experiments are carried out to develop an approach towards advancement in this field, specifically for Hindi.
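The decision rule described above (score the utterance against every word's HMM with the Viterbi algorithm and pick the highest-scoring word) can be sketched for a discrete-observation HMM. This is an illustrative sketch only: the two-state left-to-right toy models, the labels `word_a` and `word_b`, and the two-symbol codebook are hypothetical and do not come from the paper.

```python
import math


def safe_log(p):
    """Natural log that maps probability 0 to -infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(obs, start, trans, emit):
    """Log-probability of the single best state path.

    obs   : non-empty list of codebook indices
    start : start[s]    = P(first state is s)
    trans : trans[p][s] = P(next state is s | current state is p)
    emit  : emit[s][o]  = P(symbol o | state s)
    """
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n_states = len(start)
    delta = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n_states)]
    for o in obs[1:]:
        delta = [
            max(delta[p] + safe_log(trans[p][s]) for p in range(n_states))
            + safe_log(emit[s][o])
            for s in range(n_states)
        ]
    return max(delta)


# Hypothetical left-to-right models sharing one topology.
start = [1.0, 0.0]
trans = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "word_a": {"emit": [[0.9, 0.1], [0.1, 0.9]]},  # tends to emit 0s then 1s
    "word_b": {"emit": [[0.1, 0.9], [0.9, 0.1]]},  # tends to emit 1s then 0s
}

observation = [0, 0, 1, 1]  # quantized feature sequence of an utterance
scores = {
    word: viterbi_log_score(observation, start, trans, m["emit"])
    for word, m in models.items()
}
recognized = max(scores, key=scores.get)
```

Working in the log domain avoids numerical underflow on long utterances, which is why the scores are sums of logs rather than products of probabilities.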
This work presents Hidden Markov Model (HMM) based speech synthesis for the Urdu language, which is widely spoken across different regions of Asia. For example, Urdu is the official language of Pakistan and one of the national languages of India. Unfortunately, to our knowledge there is currently no publicly available Urdu corpus that is appropriate for HMM-based speech synthesis. We overcame this problem by recording an Urdu speech database with word and phone labels obtained using manual and semi-automatic annotation approaches. In summary, the objective of this work is to develop an HMM-based Urdu speech synthesiser from scratch by trying to use publicly available text processing tools for this language and by developing the necessary processing components.
2017
Natural language processing enables computers and machines to understand and speak human languages. Speech recognition is a process in which a computer understands human language and carries out further instructions according to what it has recognized. Because human languages differ in various aspects, such as sounds, phonemes, words, meanings, and much more, the machine or computer needs entirely different algorithms for different languages. Understanding human language is a challenging job, and Hidden Markov Models are commonly used for this purpose because they show promising results in understanding human language. A survey of various studies employing Hidden Markov Models is presented to highlight the importance of HMMs in the process of speech recognition.
International Journal of Innovative Technology and Exploring Engineering, 2019
The present manuscript focuses on building an automatic speech recognition (ASR) system for the Marathi language (M-ASR) using the Hidden Markov Model Toolkit (HTK). It gives details of the experimentation and implementation of the M-ASR system using HTK. In this work, a total of 106 isolated Marathi words were recognized in a speaker-independent setting. These unique Marathi words are used to train and evaluate the M-ASR system. The speech corpus (database) was created by us from isolated Marathi words uttered by speakers of mixed gender. The system uses Mel Frequency Cepstral Coefficients (MFCC) for feature extraction and a Gaussian mixture model (GMM). A Viterbi algorithm based on token passing is used for decoding to recognize unknown utterances. The proposed M-ASR system is speaker independent and reports 96.23% word-level recognition accuracy.
2017
Speech recognition is the understanding, by a computer, of words spoken by a human. These words may come from any human language, and each language poses different challenges, which means that algorithms designed for English speech recognition cannot be employed to recognize another language such as Sindhi. Understanding spoken words in Sindhi requires entirely new and separate algorithms. In this regard, every language and script poses different script-related challenges. This paper introduces a study of the speech recognition systems available in various languages, especially those related to Sindhi. Emphasis is given to the architecture of automatic speech recognition systems and to the various challenges posed by scripts, with special attention to Sindhi and its related languages.
Int. J. Speech Technol., 2020
In this paper, we present our Amazigh automatic speech recognition system. It is built with context-independent phonetic Hidden Markov Models. Several design choices are made in this system, such as the number of states of the models, the type of emission probability densities associated with the states, and the representation of the signal by cepstral coefficients. The recognition results of our system place it at a high level of performance, comparable to that achieved by Markovian automatic speech recognition systems. Our system is designed to recognize 43 distinct isolated Amazigh words (33 letters and 10 digits). The recognition rate is then calculated for each digit and letter. The overall accuracy and word recognition rate for the whole database reached 91.31% after extensive testing and changes to the recognition parameters. The results obtained in this work are improved in association with our previous work concerning Amazigh spoken digits and letters automatic...
The availability of a standard speech database is of paramount importance in automatic speech recognition (ASR) research, because it provides a baseline for comparing the performance of ASR approaches. This paper presents the development of a medium-vocabulary speech corpus for the Pashto language and of a Pashto ASR system built using the corpus. The vocabulary encompasses 161 isolated Pashto words, consisting of the most frequently used words of the language, the names of the days of the week, and digits from 0 to 25. The words were uttered by 50 speakers of different ages and genders, including both native and non-native speakers of Pashto. The corpus was recorded in a noise-free office environment. The corpus developed is then used to build an automatic speech recognition system for the Pashto language.
