Speech Recognition Arabic

About this topic

Speech Recognition Arabic is a subfield of computational linguistics and artificial intelligence focused on the automatic identification and processing of spoken Arabic language. It involves the development of algorithms and models that enable machines to convert spoken Arabic into text, facilitating human-computer interaction and enhancing accessibility in various applications.

Key research themes

1. What acoustic features and modeling approaches improve vowel and phoneme recognition accuracy in Arabic speech recognition?

This theme explores acoustic feature extraction methods and modeling techniques targeting the unique phonetic characteristics of Arabic vowels and phonemes, including their length, dialectal variations, and diacritic ambiguity. Improving the representation and classification of these units is crucial to enhancing overall Arabic ASR system accuracy.

Comparative Analysis of Arabic Vowels using Formants and an Automatic Speech Recognition System

by Amir Hussain

2021

Key finding: The study demonstrates that analyzing first and second formants alongside a Hidden Markov Model (HMM)-based recognizer for Modern Standard Arabic vowels facilitates understanding vowel similarities and differences. It reveals...

Efficient Feature Extraction Algorithms to Develop an Arabic Speech Recognition System

by Professor (Dr.) Ratnadeep R . Deshmukh

2022, Engineering, Technology & Applied Science Research

Key finding: This paper shows that Power-Normalized Cepstral Coefficients (PNCC) and Modified Group Delay Function (ModGDF) outperform the widely used Mel-Frequency Cepstral Coefficients (MFCC) in Arabic speech recognition tasks. Using...

Constructing accurate and robust HMM/GMM models for an Arabic speech recognition system

by Mohamed ELHADJ

2024, International Journal of Speech Technology

Key finding: Experimental results show that integrating complementary acoustic features such as voiced formants and pitch with conventional MFCCs in HMM/GMM systems significantly reduces error rates for Arabic ASR. The study emphasizes...

On Developing an Automatic Speech Recognition System for Standard Arabic

by Fadoua Drira

2015

Key finding: Through systematic experiments varying frame windowing, acoustic parameter numbers from MFCC and PLP, acoustic modeling units, Gaussian mixtures, and Baum-Welch reestimations, this study achieves 94.02% phoneme recognition...

Study of acoustic parameters and models for Automatic Standard Arabic Speech Recognition

by Adel M. Alimi

2016

Key finding: The research empirically validates the suitability of MFCC and PLP for feature extraction in Arabic ASR, and employs statistical HMM-based modeling with appropriate acoustic units and grammar for Standard Arabic. It...

2. How can acoustic and language model integration, dialect variability, and corpus development improve multi-dialect Arabic ASR performance?

This research area focuses on addressing challenges posed by Arabic's multiple dialects, dialectal phonetic and orthographic variations, the scarcity of large annotated corpora, and morphological richness. It investigates corpus gathering, normalization of dialectal variants, deep learning architectures, and language modeling strategies to build robust multi-dialect Arabic ASR systems.

Multi-Dialect Arabic Speech Recognition

by Abbas Ali

2024, 2020 International Joint Conference on Neural Networks (IJCNN)

Key finding: Developed a large multi-dialect annotated speech corpus and used a combined convolutional and recurrent deep neural network architecture trained end-to-end with a beam search decoder coupled with a tetra-gram language model...

Investigation Amazigh speech recognition using CMU tools

by Hassan Satori

2015

Key finding: This study shows that the CMU Sphinx HMM-based toolkit can be effectively adapted to resource-poor languages similar to Arabic in phonetic complexity, such as Amazigh, achieving 92.89% recognition accuracy on digits speech...

Arabic Speech Recognition: Advancement and Challenges

by Haifa Alhasson

2024, IEEE Access

Key finding: The review highlights that multi-dialect Arabic ASR development requires addressing language dependency and complex morphology through tailored architectures and extensive datasets. It analyzes recent advances in ML and deep...

Arabic Dialect Processing

by Mona Diab

2025, Similar Languages, Varieties, and Dialects

Key finding: This work underscores that dialectal Arabic is the primary form used in spontaneous speech and new media, necessitating dialect-specific resources and lexicons for effective ASR. It emphasizes the significant phonological,...

Speech Recognition System of Arabic Digits Based on A Telephony Arabic Corpus

by Yousef Alotaibi

2016, The 2008 International …

Key finding: By utilizing a telephony Arabic corpus of digits and applying HMM-based recognition techniques, this study demonstrates the importance of handling Arabic phoneme classes, syllable structures, and phoneme variations across...

3. How can phoneme duration modeling and visual speech features enhance recognition of Quranic Arabic and improve robustness in noisy or challenging environments?

This area investigates specialized phonetic phenomena such as phoneme lengthening (Medd) in Quranic recitation and the use of visual lip movement features to aid recognition, especially where audio input is noisy or limited. These methodological advances aim at improving phoneme classification accuracy in religious Arabic recitations and general speech recognition robustness leveraging visual cues.

Rule-Based Embedded HMMs Phoneme Classification to Improve Qur’anic Recitation Recognition

by Mohammed A. H. Ali

2024, Electronics

Key finding: Introduces a Rule-Based Phoneme Duration Algorithm integrated with HMMs that models the phoneme lengthening (Medd) specific to Quranic recitations, capturing phoneme duration features governed by Tajweed rules. This approach...

Lips Reading Spoken Arabic Word Based on The Geometric Shape Features of The Lip

by International Journal of Scientific Research in Science and Technology IJSRST

2023, International Journal of Scientific Research in Science and Technology

Key finding: Proposes a visual speech recognition method utilizing geometric features extracted from 20 lip landmarks to classify spoken Arabic words based on lip shape and movement without audio input. Experimental methodology emphasizes...

Arabic language learning assistance based on automatic speech recognition system

by Ayoub Maatallaoui

2012, world-comp.org

Key finding: Develops a speaker-independent Arabic ASR system using CMU Sphinx with specific focus on pronunciation error detection for language learning applications, particularly in assessing phonetically challenging features such as...

All papers in Speech Recognition Arabic

The Baseline Speech Recognition System 2 . 1 Speech and

by Solomon Teferra

2024

This paper presents the application of morpheme-based and factored language models in an Amharic speech recognition task. Since using morphemes in both acoustic and language models results, mostly, in performance degradation due to...

Cross Language Information Transfer Between Modern Standard Arabic and Its Dialects – a Framework for Automatic Speech Recognition System Language Model

by طيبة زكي

2024

Arabic Dialects System using Hidden Markov Models (HMMs)

by Eman Jibril

2023, WSEAS TRANSACTIONS ON COMPUTERS

The Arabic language has many different dialects, and the dialect must be recognized before automatic speech recognition (ASR) is applied. On the other hand, it is observed in all Arab countries that the standard Arabic language is widely written and...

Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages

by Solomon Teferra

2023, Interspeech 2020

Development of multilingual Automatic Speech Recognition (ASR) systems makes it possible to share existing speech and text corpora among languages. We have conducted experiments on the development of multilingual Acoustic Models (AM) and Language...

The Baseline Speech Recognition System 2 . 1 Speech and

by Solomon Teferra Abate

2023

Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon

by Mohammad Mohammadamini

2023, Cornell University - arXiv

In this paper, we introduce the first large vocabulary speech recognition system (LVSR) for the Central Kurdish language, named Jira. The Kurdish language is an Indo-European language spoken by more than 30 million people in several countries, but due to the lack of speech and text resources, there is no speech recognition system for this language. To fill this gap, we introduce the first speech corpus and pronunciation lexicon for the Kurdish language. Regarding the speech corpus, we designed a sentence collection in which the ratio of di-phones resembles the real data of the Central Kurdish language. The designed sentences were uttered by 576 speakers in a controlled environment with noise-free microphones (called AsoSoft Speech-Office) and in the Telegram social network environment using mobile phones (denoted as AsoSoft Speech-Crowdsourcing), resulting in 43.68 hours of speech. In addition, a test set covering 11 different document topics was designed and recorded in the two corresponding speech conditions (i.e., Office and Crowdsourcing). Furthermore, a 60K pronunciation lexicon was prepared, for which we faced several challenges and proposed solutions. The Kurdish language has several dialects and sub-dialects, which results in many lexical variations. Our methods for script standardization of lexical variations and automatic pronunciation of the lexicon tokens are presented in detail. To set up the recognition engine, we used the Kaldi toolkit. A statistical tri-gram language model extracted from the AsoSoft text corpus is used in the system. Several standard recipes, including HMM-based models (i.e., mono, tri1, tr2, tri2, tri3), SGMM, and DNN methods, are used to generate the acoustic model. These methods are trained with AsoSoft Speech-Office, AsoSoft Speech-Crowdsourcing, and a combination of the two. The best performance is achieved by the SGMM acoustic model, which yields an average word error rate of 13.9% across the different document topics and 4.9% for the general topic.
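The word error rate (WER) figures quoted above are conventionally computed as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. A minimal Python sketch of that standard definition follows; the function name and the example sentences are illustrative assumptions and are not taken from the Jira paper or from the Kaldi scoring scripts.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative helper (hypothetical name), using a word-level
    Levenshtein distance computed one row at a time.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.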

Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks

by Mietta Lennes

2023, Language Resources and Evaluation

The Donate Speech campaign has so far succeeded in gathering approximately 3600 h of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the...

Fig. 1 The number of recordings received in each month during the campaign

Fig. 2 The distribution of the speaker metadata in the corpus. The “training set” includes both the “train transcribed” and “train untranscribed” described in Table 1. “N/A” means the user has not answered to the question about his or her background, or has given multiple contradicting answers

Fig. 3 The recording length distribution. The recording durations are pooled to 1-s bins to generate this figure

Fig. 4 The distribution of word-level (top) and character-level (bottom) error rates per annotator on the transcribed dataset. Note: utterances with more than 100% errors were pooled together for this visualisation. Note also that the transcribers’ ids of the second phase do not match those of the first phase.

Fig. 5 The distribution of WERs in the test set w.r.t. the age and gender of the speaker

Fig. 6 The distribution of WERs in the test set w.r.t. the dialect and gender of the speaker

Fig. 7 Metadata accuracy per class on the test set

... the models using the original transcripts. The results for the topic classification task are given in Table 11. From the table, we can observe that the model that uses the original transcripts achieves slightly better results than the one using the ASR-generated transcripts on the multi-transcriber test set, whereas on the test set, they perform identically. Additionally, the models using only the transcripts achieve significantly better results than the model using the whole audio, even though the audio-only model was trained on far more data. When jointly using the audio and the transcript information, we can see that there is a small degradation in comparison to using only the transcripts. This could indicate that the audio does not provide any additional information that would help the model. Another thing to consider is that the audio encoder that we are using is quite small, so a bigger model might be necessary if we want to benefit more from the acoustic information. When we combined the audio and the ASR-generated transcripts, we observed only a small degradation in the performance, in comparison to using the audio with the original transcripts. This could indicate that certain keywords affect the topic classification and the ASR system is good at detecting them. Using this knowledge, in future experiments we can generate transcripts for the untranscribed part of the data and use them in addition to the audio, to train a big model that utilises audio and transcript information. From the results obtained on the model trained on the subset of the audio, we can see that there is a significant degradation in the results in comparison to the model that uses only the transcripts. This confirms that the textual information content is sufficiently dense for this task.

Fig. 8 Metadata class distribution for the test sets

Generally, the models were able to learn the task relatively well, while still leaving some space for improvement, especially on the audio side.

Table 1 The sizes of the corpus and its subsets

... preference order from the perspective of one of the annotators. Repeating this process for all transcriber companies gave us multiple rankings, and we tried to identify outliers by aggregating these preference rankings. In case of an outlier, we could verify that its transcription is of lower quality than the others by manually checking the transcripts with the most differences to the other transcribers. During these analyses, we ignored the non-word symbols, as they were annotated with considerable discrepancies by different annotators.

Table 4 Sizes of the language models and their training corpora

The number of n-grams refers to the numbers of unigrams, bigrams, trigrams and 4-grams summed together.

Larger LMs were trained on external LM data, namely the WEBCON corpus and the DSPCON transcriptions, in addition to the AM training set (either 100 h or 1600 h) transcriptions. All HMM/DNN system LMs are subword-based 4-gram models. The Wav2Vec2 + CTC system uses a word-based 4-gram language model trained on the 1600 h LP transcripts and the external data.

Table 5 Error rates of various ASR systems

The classes for age and gender are those specified in Fig. 2, including the N/A classes. For the themes, the “Media Skills” classes were combined as one class. For the dialects, we used the 21 original classes for these calculations.

Table 8 Accuracy of the models on the gender classification task

... baseline models for gender, age, dialect, and topic classification. The models are built using a 5-layer TDNN with dilated connections, followed by statistical pooling and two linear layers. This is similar to the x-vector models (Snyder et al., 2018). We will call this part the audio encoder. For the dialect and topic classification tasks, besides the models trained on audio only, we additionally trained models that utilise the available transcripts. We did that using an additional text encoder. In the text encoder, word embeddings are extracted using the FinBERT model (Virtanen et al., 2019) and processed through a bi-directional long short-term memory (BLSTM) network (Hochreiter & Schmidhuber, 1997). In the last stage, the outputs of the audio and text encoders are concatenated and passed through a softmax function which produces class probabilities.
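The fusion described above (statistical pooling of frame-level audio features, concatenation with a text representation, and a softmax over classes) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the single linear layer, and the toy weights are hypothetical, and the TDNN, FinBERT and BLSTM components from the excerpt are stood in for by precomputed vectors.

```python
import math


def statistical_pooling(frames):
    """Collapse a variable-length list of frame vectors into one fixed-size
    vector: per-dimension means followed by per-dimension standard deviations."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dims)
    ]
    return means + stds


def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to one."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(audio_frames, text_embedding, weights, bias):
    """Concatenate pooled audio and text vectors, apply one linear layer
    (an assumption of this sketch), and return class probabilities."""
    fused = statistical_pooling(audio_frames) + list(text_embedding)
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


if __name__ == "__main__":
    frames = [[1.0, 2.0], [3.0, 4.0]]      # two frames, two features each
    text = [0.5]                           # stand-in for a text encoder output
    weights = [[0.0] * 5, [0.0] * 5]       # two classes, five fused inputs
    print(classify(frames, text, weights, [0.0, 0.0]))
```

Pooling is what lets recordings of different lengths share one classifier, since the pooled vector has the same size regardless of the number of frames.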

The accuracy of the models is given in Table 10. Looking at the results, we can observe that the model trained on all the audio performs better than the one trained on the audio and the available transcripts. This could indicate that the dialect information is predominant in the audio since the transcripts are not able to capture information such as pronunciation and accent. Additionally, we can observe that using the ASR transcripts degrades the performance on the test set, but it improves slightly on the multi-transcriber test set, in comparison to using the original transcripts. This could mean that the words affected by the dialect are also difficult for the ASR model, resulting in incorrect transcriptions. Further, the audio subset model performs better than its counterpart that additionally uses the transcripts. This could indicate that instead of providing additional information, the transcripts introduce noise to the model.

Deep Investigation of the Recent Advances in Dialectal Arabic Speech Recognition

by Hamzah Alsayadi

2022, IEEE Access

Speech recognition systems play an important role in human-machine interactions. Many systems exist for Arabic speech; however, there are limited systems for dialectal Arabic speech. The Arabic language comprises many properties, some of...

Activity insecticide of etanolic extract of plants on Spodoptera frugiperda (JE Smith)(Lepidoptera: Noctuidae)

by Fernanda Rodrigues Garcez

2022

Improved Speech Recognition Processes using Hybrid Genetic Vector Quantizer

by Sakshi Choudhary

2022

Speech recognition basically means talking to a computer and having it recognize what speakers are saying. Speech is a common and efficient method of communication for people to interact with each other. The person would like to interact...

Multilingual Graphemic Hybrid ASR with Massive Data Augmentation

by Kritika Singh

2022

Towards developing high-performing ASR for low-resource languages, approaches to address the lack of resources are to make use of data from multiple languages, and to augment the training data by creating acoustic variations. In this work...

Deep Investigation of the Recent Advances in Dialectal Arabic Speech Recognition

by Abdelaziz Abdelhamid

2022, IEEE Access

Domain Adaptation of End-to-end Speech Recognition in Low-Resource Settings

by Brian Mak

2022, 2018 IEEE Spoken Language Technology Workshop (SLT)

End-to-end automatic speech recognition (ASR) has simplified the traditional ASR system building pipeline by eliminating the need to have multiple components and also the requirement for expert linguistic knowledge for creating...

Deep Investigation of the Recent Advances in Dialectal Arabic Speech Recognition

by Islam Hegazy

2022, IEEE Access

Investigation on N-Gram Approximated RNNLMs for Recognition of Morphologically Rich Speech

by Tibor Fegyó

2022, Statistical Language and Speech Processing

Recognition of Hungarian conversational telephone speech is challenging due to the informal style and morphological richness of the language. Recurrent Neural Network Language Model (RNNLM) can provide remedy for the high perplexity of...

Using ASR Methods for OCR

by Paola Garcia

2022, 2019 International Conference on Document Analysis and Recognition (ICDAR)

Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, the recent approaches using DNN-HMM models are not explored much for text...

Indigenuous Vocabulary Reformulation for Continuous Yorùbá Speech Recognition In M-Commerce Using Acoustic Nudging-Based Gaussian Mixture Model

by Felix Chidozie

2022

One current research area is the recognition of speech signals through computer applications. In this research paper, an Acoustic Nudging (AN) model is used in re-formulating the persistence automatic...

Foreword.: CCE (University of Sussex) and its three Sussex River Ouse Projects: teaching, learning and research

by David Rudling

2022

We propose a method to extend a phoneme set by using a large amount of broadcast data to improve the performance of Korean spontaneous speech recognition. In the proposed method, we first extract variable-length phoneme-level segments...

Indigenuous Vocabulary Reformulation for Continuous Yorùbá Speech Recognition In M-Commerce Using Acoustic Nudging-Based Gaussian Mixture Model

by Isaac Odun-Ayo

2022

Enhancing Arabic Phoneme Recognizer using Duration Modeling Techniques

by Mostafa Belkasmi

2022

In some languages like Classical Arabic (the language of the Holy Quran), phoneme duration is considered a distinguishing cue in Quranic phonology. Phonological variation of phoneme occurrences contributes to an inaccurate...

Morpheme-based language modeling for amharic speech recognition

by Solomon Teferra Abate

2022

Multilingual Acoustic and Language Modeling for Ethio-Semitic Languages

by Solomon Teferra Abate

2022, Interspeech 2020

Morpheme-Based and Factored Language Modeling for Amharic Speech Recognition

by Solomon Teferra Abate

2022, Human Language Technology. Challenges for Computer Science and Linguistics

Indigenuous Vocabulary Reformulation for Continuous Yorùbá Speech Recognition In M-Commerce Using Acoustic Nudging-Based Gaussian Mixture Model

by Prof Ambrose Azeta

2022

Online Incremental Learning for Speaker-Adaptive Language Models

by Chi-Chih Hu

2021, Interspeech 2018

Voice control is a prominent interaction method on personal computing devices. While automatic speech recognition (ASR) systems are readily applicable for large audiences, there is room for further adaptation at the edge, i.e., locally on...

Implementation of a Radiology Speech Recognition System for Estonian Using Open Source Software

by Einar Meister

2021, Interspeech 2017

Speech recognition has become increasingly popular in radiology reporting in the last decade. However, developing a speech recognition system for a new language in a highly specific domain requires a lot of resources, expert knowledge and...

Figure 1: Word error rates corresponding to individual radiologists in different system development stages (marked with dots) and the corresponding average WERs (marked with bars). System IDs correspond to those in Table 4.

Table 4: Word error rates after different system development stages.

Table 5: Word error rates of the final system on different radiology modalities. The second and third columns show the number of test reports and the average number of words per report.

Speech Recognition in Human-Computer Interactive Control

by An Tran Phu

2021, Journal of Automation and Control Engineering

This paper gives an introduction of speech recognition systems for human-computer interaction using Vietnamese language. First, the paper investigates the two most common speech recognition toolkits currently used, HTK and Sphinx, and...

Endoscopic procedures control using speech recognition

by Victor Alves

2021

This paper presents a solution for replacing the current control mechanisms of endoscopic exams. This kind of exam requires the gastroenterologist to perform a complex procedure, using both hands simultaneously, to manipulate the...

Generation of Arabic phonetic dictionaries for speech recognition

by Moustafa Elshafei

2021

Phonetic dictionaries are essential components of large-vocabulary natural language speaker-independent speech recognition systems. This paper presents a rule-based technique to generate Arabic phonetic dictionaries for a large vocabulary...

Speech Recognition in Human-Computer Interactive Control

by Trương Vũ

2021, Journal of Automation and Control Engineering

Endoscopic procedures control using speech recognition

by José Neves

2021

Fig. 1 Workflow for a gastroenterology medical appointment

Fig. 2 MiVcontrol global architecture

Fig. 3 MiVcontrol model training procedure

Table 1 Effect of the number of tied states on the WER

The effect of the number of Gaussian mixture distributions on the error rate is shown in Table 2. The effect of the number of tied states in the HMM is shown in Table 1.

Lightly supervised alignment of subtitles on multi-genre broadcasts

by Bilal Khaliq

2021, Multimedia Tools and Applications

This paper describes a system for performing alignment of subtitles to audio on multigenre broadcasts using a lightly supervised approach. Accurate alignment of subtitles plays a substantial role in the daily work of media companies and...

Development of Speech Recognition Systems in Emergency Call Centers

by Samir Rustamov

2021, Symmetry

In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because of the...

Sphinx-4: A flexible open source framework for speech recognition

by Joe K Wölfel

2021, … , Inc. Mountain View, …

Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the...

Generation of arabic phonetic dictionaries for speech recognition

by Mohamed Fadeel

2021, 2008 International Conference on Innovations in Information Technology

Recurrent Neural Network based Language Modeling for Punjabi ASR

by Vaibhav Kumar

2020, SSRG International Journal of Computer Science and Engineering

Deep Learning approaches have been widely known to perform better than statistical approaches. This is the first effort to investigate Recurrent Neural Network-based modeling for Punjabi speech corpus. We propose the Lattice Rescoring...

Arabic Speech Recognition: Challenges and State of the Art

by Abdullah Moussa

2019

The Arabic language has many features such as the phonology and the syntax that make it an easy language for developing automatic speech recognition systems. Many standard techniques for acoustic and language modeling such as context...

A Noble Approach for Recognizing Bangla Real Number Automatically Using CMU Sphinx4

by Md Saiful Islam

2016

Speech recognition is a widely researched topic around the world. It is the process of converting speech to text. Many scientists and researchers are working to increase the performance of speech recognition systems. Most...

Developing a hybrid language model for open vocabulary automatic speech recognition in a lecture speech task

by Richard Rose

2016, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA)

This paper addresses the problem of open vocabulary automatic speech recognition (ASR) using hybrid statistical language models (LMs). Hybrid LMs differ from closed vocabulary LMs in that the word level lexicon is augmented with an...

Sphinx 4 Speech Recognition in ATC

by IJAERS Journal

2016

Speech recognition plays a very important role in day-to-day life. It is widely used because it allows users to communicate with computers by recognizing their spoken language. Communication with...

Fig.3: Corpus collection for Speaker Dependent SR

Fig.4: Corpus collection for Speaker Independent SR

Sphinx-4: A flexible open source framework for speech recognition

by Peter Wolf

2015, … , Inc. Mountain View, …

Arabic Speech Recognition System Based on CMUSphinx

by Hassan Satori

2015

In this paper we present the creation of an Arabic version of an Automated Speech Recognition (ASR) system. This system is based on the open source Sphinx-4 from Carnegie Mellon University, which is a speech recognition system based on...

Fig. 1: Sphinx-4 Architecture, the main blocks are the FrontEnd, the Decoder, and the Linguist.

Fig, 2: Waveforms for digits 4 (44:)!) speaker 2 trial 2. generated by the open source waveform editor wavesurfer [22].

Hence, the corpus consists of 5 repetitions of every digit produced by each speaker. Depending on this, the corpus consists of 300 tokens. During the recording session, each utterance was played back to ensure that the entire digit was included in the recorded signal. All the 300 (10 digits - 5 repetitions - 6 speakers) tokens were used for training and testing phases. Table.1 shows some of the recording system parameters.

Table 2: Phoneme symbols used in the training of HMMs.

... collection of feature vectors. The circularity in this training process is resolved using the iterative Baum-Welch, or forward-backward, training algorithm [23].

Table 3: Excerpt from the dictionary of the Arabic digits application. The following table shows an excerpt from the dictionary used for the training and recognition phases.

To evaluate the performance of the application, we performed experiments with different individuals, each of whom was asked to utter the 10 Arabic digits (as described in Sec. 3.1). We recorded the number of words that were correctly recognized and then calculated a mean recognition ratio for each tester (see Tables 4 and 5). Table 4: Test results for individual speakers, where M means man and W means woman.
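The per-tester mean recognition ratio described above can be illustrated with a minimal sketch. The function names, the digit labels, and the sample trials are hypothetical and are not taken from the paper; the sketch only assumes that each trial pairs the 10 prompted digits with the 10 recognized words.

```python
def recognition_ratio(reference, recognized):
    """Return the percentage of prompted words that were recognized correctly."""
    if len(reference) != len(recognized):
        raise ValueError("reference and recognized must have the same length")
    correct = sum(ref == hyp for ref, hyp in zip(reference, recognized))
    return 100.0 * correct / len(reference)


def mean_recognition_ratio(trials):
    """Average the recognition ratio over all trials of one tester."""
    ratios = [recognition_ratio(ref, hyp) for ref, hyp in trials]
    return sum(ratios) / len(ratios)


# Hypothetical data: one tester, two trials of the digits 0-9.
prompted = [str(d) for d in range(10)]
trial_1 = prompted.copy()
trial_1[4] = "7"            # one digit misrecognized in the first trial
trial_2 = prompted.copy()   # all digits recognized in the second trial

tester_mean = mean_recognition_ratio([(prompted, trial_1), (prompted, trial_2)])
```

Comparing word by word is only adequate here because each utterance is a single isolated digit; continuous speech would normally call for an alignment-based measure such as word error rate.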


Comparison of automatic speech recognizer SPHINX 3.6 and SPHINX 4.0 for creating systems in Slovak language

by Gregor Rozinaj

2014

The goal of this article is to present the SphinxTrain training procedure and suitable modifications to it that yield accurate and robust speech recognition in a mobile GSM environment. Some modifications are based on effective preprocessing of input data in combination with the optimal setting of the number of states per model, through the adjustment of the number of tied states or the number of Gaussian mixtures. Another source of increased recognition rate is the 'optimal' setting of the speech decoder. As this is a non-linear, mathematically poorly tractable task containing both real and integer values, evolution strategies can be used successfully (an 18.6% improvement in WER was observed compared to the original setting). All experiments and results were obtained for the Slovak speech database Mobildat, which contains recordings of 1100 speakers. The Sphinx4 recognition system was used for evaluation of the trained model.

Biographical notes: Juraj Vojtko, born in 1981, received his MSc in telecommunications from the Slovak University of Technology in Bratislava, Slovakia in 2005. Since 2009 he has occupied an assistant professor position at the Institute of Telecommunications at the Faculty of Electrical Engineering and Information Technology of the Slovak University of Technology in Bratislava. He has also worked as a developer in the commercial segment in the area of communication and information systems since 2002. His research focuses on speech processing, specifically speech recognition and speaker identification and verification.

Juraj Kačur was born in Bratislava in 1976. He obtained his Master of Science degree in 2000 and his PhD in 2005 at the Faculty of Electrical Engineering and Information Technology of the Slovak University of Technology (FEI STU), Bratislava. Since 2001, he has occupied an assistant professor position at the Institute of Telecommunications at FEI STU Bratislava. Between 2000 and 2001, he was with the Slovak Academy of Sciences, department of speech analysis and synthesis, where he participated in several projects. His research activities include signal processing, speech processing, speech recognition, speech detection, speaker identification, higher-order statistics, the wavelet transform, machine learning, ANN and HMM.

Gregor Rozinaj (M'97) received his MSc and PhD in telecommunications from the Slovak University of Technology, Bratislava, Slovakia in 1981 and 1990, respectively. He has been a lecturer at the Institute of Telecommunications of the Slovak University of Technology since 1981. From 1992 to 1994, he worked on a research project devoted to speech recognition at the Alcatel Research Center in Stuttgart, Germany. From 1994 to 1996, he was employed as a researcher at the University of Stuttgart, Germany, working on a research project for automatic ship control. Since 1997, he has been Head of the DSP group at the Institute of Telecommunications of the Slovak University of Technology, Bratislava. Since 1998, he has been an Associate Professor at the same institute. He is an author of 3 US and European patents on digital speech recognition and 1 Czechoslovak patent on fast algorithms for DSP.


Generation of Arabic phonetic dictionaries for speech recognition

by Husni Al-Muhtaseb

2013

Table 2. Summary of the performance of the AASR system for different phone/rules test cases.


Novel speech recognition models for Arabic

by JI Svo

2013, Johns-Hopkins …


Design of the CMU Sphinx-4 decoder

by Bhiksha Raj

2013

The decoder of the Sphinx-4 speech recognition system incorporates several new design strategies that have not been used before in conventional decoders of HMM-based large-vocabulary speech recognition systems.


Sphinx-4: A flexible open source framework for speech recognition

by Bhiksha Raj

2013

Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged... more

Fig. 1. Sphinx-4 Decoder Framework. The main blocks are the FrontEnd, the Decoder, and the Linguist. Supporting blocks include the ConfigurationManager and the Tools blocks. The communication between the blocks, as well as communication with an application, is depicted.

Because they were focused on such fundamental core theories, the creators of these systems tended to hardwire their implementations to a high degree. For example, the predecessor Sphinx systems restrict the order of the HMMs to a constant value and also fix the unit context to a single left and right context. Sphinx-3 eliminated support for context free grammars (CFGs) due to the specialization on large N-Gram models. Furthermore, the decoding strategy of these systems tended to be deeply entangled with the rest of the system. As a result of these constraints, the systems were difficult to modify for experiments in other areas.

Fig. 2. Sphinx-4 FrontEnd. The FrontEnd comprises one or more parallel chains of communicating DataProcessors

Fig. 3. Example SearchGraph. The SearchGraph is a directed graph composed of optionally emitting SearchStates and SearchStateArcs with transition probabilities. Each state in the graph can represent components from the LanguageModel (words in rectangles), Dictionary (sub-word units in dark circles) or AcousticModel (HMMs).
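The graph structure described in this caption can be sketched roughly as follows. This is a minimal Python illustration, not the Sphinx-4 Java API: the class names echo the caption, but the fields, the `connect` and `path_probability` helpers, the labels, and the probabilities are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SearchState:
    label: str
    kind: str                 # "word", "unit", or "hmm" (assumed categories)
    emitting: bool = False    # states are only optionally emitting
    arcs: list = field(default_factory=list)


@dataclass
class SearchStateArc:
    target: SearchState
    probability: float        # transition probability to the target state


def connect(src, dst, probability):
    """Add a directed arc from src to dst."""
    src.arcs.append(SearchStateArc(dst, probability))


def path_probability(path):
    """Multiply transition probabilities along a path; 0.0 if an arc is missing."""
    prob = 1.0
    for src, dst in zip(path, path[1:]):
        arc = next((a for a in src.arcs if a.target is dst), None)
        if arc is None:
            return 0.0
        prob *= arc.probability
    return prob


# Hypothetical fragment: a word expands to a sub-word unit, then to an HMM state.
word = SearchState("one", "word")
unit = SearchState("W", "unit")
hmm = SearchState("W-state-0", "hmm", emitting=True)
connect(word, unit, 1.0)
connect(unit, hmm, 0.5)
```

Keeping the arcs on the source state makes successor lookup a simple list scan, which is enough for a small illustration; a decoder working on large vocabularies would need a more compact representation.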


Automatic learning of language model structure

by Kevin Duh

2013

Statistical language modeling remains a challenging task, in particular for morphologically rich languages. Recently, new approaches based on factored language models have been developed to address this problem. These models... more

