Academia.edu

Speech Recognition Arabic

7 papers
41 followers
About this topic
Speech Recognition Arabic is a subfield of computational linguistics and artificial intelligence focused on the automatic identification and processing of spoken Arabic language. It involves the development of algorithms and models that enable machines to convert spoken Arabic into text, facilitating human-computer interaction and enhancing accessibility in various applications.

Key research themes

1. What acoustic features and modeling approaches improve vowel and phoneme recognition accuracy in Arabic speech recognition?

This theme explores acoustic feature extraction methods and modeling techniques targeting the unique phonetic characteristics of Arabic vowels and phonemes, including their length, dialectal variations, and diacritic ambiguity. Improving the representation and classification of these units is crucial to enhancing overall Arabic ASR system accuracy.

Key finding: The study demonstrates that analyzing first and second formants alongside a Hidden Markov Model (HMM)-based recognizer for Modern Standard Arabic vowels facilitates understanding vowel similarities and differences. It reveals... Read more
Key finding: This paper shows that Power-Normalized Cepstral Coefficients (PNCC) and Modified Group Delay Function (ModGDF) outperform the widely used Mel-Frequency Cepstral Coefficients (MFCC) in Arabic speech recognition tasks. Using... Read more
Key finding: Experimental results show that integrating complementary acoustic features such as voiced formants and pitch with conventional MFCCs in HMM/GMM systems significantly reduces error rates for Arabic ASR. The study emphasizes... Read more
by Fadoua Drira and 1 more
Key finding: Through systematic experiments varying frame windowing, acoustic parameter numbers from MFCC and PLP, acoustic modeling units, Gaussian mixtures, and Baum-Welch reestimations, this study achieves 94.02% phoneme recognition... Read more
Key finding: The research empirically validates the suitability of MFCC and PLP for feature extraction in Arabic ASR, and employs statistical HMM-based modeling with appropriate acoustic units and grammar for Standard Arabic. It... Read more
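Several of the findings above report phoneme recognition accuracy. As a minimal sketch, assuming the common alignment-based definition Acc = (N - S - D - I) / N (the listed papers do not state which scoring convention they used), accuracy can be computed from a Levenshtein alignment of the reference and recognized phoneme sequences. The function names `edit_counts` and `phoneme_accuracy` are illustrative, not taken from any of the papers.

```python
from typing import Sequence, Tuple


def edit_counts(ref: Sequence[str], hyp: Sequence[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) for a minimum-edit alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # Each cell holds (total_errors, substitutions, deletions, insertions).
    table = [[(0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, i, 0)  # only deletions when the hypothesis is empty
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, j)  # only insertions when the reference is empty
    for i in range(1, rows):
        for j in range(1, cols):
            if ref[i - 1] == hyp[j - 1]:
                table[i][j] = table[i - 1][j - 1]  # match: no new error
                continue
            e, s, d, n = table[i - 1][j - 1]
            substitution = (e + 1, s + 1, d, n)
            e, s, d, n = table[i - 1][j]
            deletion = (e + 1, s, d + 1, n)
            e, s, d, n = table[i][j - 1]
            insertion = (e + 1, s, d, n + 1)
            table[i][j] = min(substitution, deletion, insertion)
    _, subs, dels, ins = table[-1][-1]
    return subs, dels, ins


def phoneme_accuracy(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Accuracy in percent, assuming Acc = (N - S - D - I) / N."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    subs, dels, ins = edit_counts(ref, hyp)
    return 100.0 * (len(ref) - subs - dels - ins) / len(ref)


if __name__ == "__main__":
    reference = ["k", "i", "t", "a", "b"]
    recognized = ["k", "a", "t", "a", "b", "u"]
    print(edit_counts(reference, recognized))
    print(phoneme_accuracy(reference, recognized))
```

Because insertions are penalized, this measure can fall below the plain percentage of correctly recognized phonemes, so figures from different papers are comparable only when they use the same convention.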

2. How can acoustic and language model integration, dialect variability, and corpus development improve multi-dialect Arabic ASR performance?

This research area focuses on addressing challenges posed by Arabic's multiple dialects, dialectal phonetic and orthographic variations, the scarcity of large annotated corpora, and morphological richness. It investigates corpus gathering, normalization of dialectal variants, deep learning architectures, and language modeling strategies to build robust multi-dialect Arabic ASR systems.

Key finding: Developed a large multi-dialect annotated speech corpus and used a combined convolutional and recurrent deep neural network architecture trained end-to-end with a beam search decoder coupled with a tetra-gram language model.... Read more
Key finding: This study shows that the CMU Sphinx HMM-based toolkit can be effectively adapted to resource-poor languages similar to Arabic in phonetic complexity, such as Amazigh, achieving 92.89% recognition accuracy on digits speech.... Read more
Key finding: The review highlights that multi-dialect Arabic ASR development requires addressing language dependency and complex morphology through tailored architectures and extensive datasets. It analyzes recent advances in ML and deep... Read more
Key finding: This work underscores that dialectal Arabic is the primary form used in spontaneous speech and new media, necessitating dialect-specific resources and lexicons for effective ASR. It emphasizes the significant phonological,... Read more
Key finding: By utilizing a telephony Arabic corpus of digits and applying HMM-based recognition techniques, this study demonstrates the importance of handling Arabic phoneme classes, syllable structures, and phoneme variations across... Read more
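The theme above mentions normalization of orthographic variants as one step in building multi-dialect systems. The sketch below shows one widely used form of Arabic text normalization for transcripts: removing short-vowel diacritics and tatweel, and collapsing alef, alef maqsura, and ta marbuta variants. It is an illustration under stated assumptions, not the procedure of any listed paper; the function name `normalize_arabic` and the exact mapping are choices made here, and real systems often keep some of these distinctions.

```python
import re

# Short-vowel and related marks: fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")
TATWEEL = "\u0640"  # elongation character used only for visual stretching

# Assumed letter mapping (a common choice, not a standard):
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0649": "\u064A",  # alef maqsura          -> yeh
    "\u0629": "\u0647",  # ta marbuta            -> heh
})


def normalize_arabic(text: str) -> str:
    """Return text with diacritics and tatweel removed and letter variants merged."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return text.translate(LETTER_MAP)


if __name__ == "__main__":
    # A fully diacritized word, written with escapes so the code points are explicit.
    word = "\u0623\u064E\u062D\u0652\u0645\u064E\u062F"
    print(normalize_arabic(word))
```

Merging variants reduces vocabulary size and spelling inconsistency in training transcripts, at the cost of discarding information (for example vowel length cues carried by diacritics) that some acoustic or pronunciation models may need.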

3. How can phoneme duration modeling and visual speech features enhance recognition of Quranic Arabic and improve robustness in noisy or challenging environments?

This area investigates specialized phonetic phenomena such as phoneme lengthening (Medd) in Quranic recitation and the use of visual lip movement features to aid recognition, especially where audio input is noisy or limited. These methodological advances aim at improving phoneme classification accuracy in religious Arabic recitations and general speech recognition robustness leveraging visual cues.

Key finding: Introduces a Rule-Based Phoneme Duration Algorithm integrated with HMMs that models the phoneme lengthening (Medd) specific to Quranic recitations, capturing phoneme duration features governed by Tajweed rules. This approach... Read more
Key finding: Proposes a visual speech recognition method utilizing geometric features extracted from 20 lip landmarks to classify spoken Arabic words based on lip shape and movement without audio input. Experimental methodology emphasizes... Read more
Key finding: Develops a speaker-independent Arabic ASR system using CMU Sphinx with specific focus on pronunciation error detection for language learning applications, particularly in assessing phonetically challenging features such as... Read more
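The visual speech finding above describes geometric features computed from lip landmarks. As a rough sketch of what such features can look like, the code below derives width, height, aspect ratio, area, and perimeter from an ordered lip contour in a single video frame. The feature set, the function name `lip_geometry`, and the assumption that landmarks arrive as ordered (x, y) points are all hypothetical; the summary does not say which features or landmark layout the paper used.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def lip_geometry(contour: Sequence[Point]) -> Dict[str, float]:
    """Compute simple shape descriptors for one frame's ordered lip contour."""
    if len(contour) < 3:
        raise ValueError("a contour needs at least three points")
    points = list(contour)
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)

    twice_area = 0.0
    perimeter = 0.0
    # Walk each edge of the closed polygon (last point connects back to the first).
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1          # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)

    return {
        "width": width,
        "height": height,
        "aspect_ratio": height / width if width else 0.0,
        "area": abs(twice_area) / 2.0,
        "perimeter": perimeter,
    }


if __name__ == "__main__":
    # Toy contour: a 4 x 2 rectangle standing in for a mouth outline.
    frame = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
    print(lip_geometry(frame))
```

A word-level classifier would typically consume one such feature vector per frame, so that both lip shape and its change over time are available to the model.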

All papers in Speech Recognition Arabic

This paper presents the application of morpheme-based and factored language models in an Amharic speech recognition task. Since using morphemes in both acoustic and language models results, mostly, in performance degradation due to... more
The Arabic language has many different dialects, and the dialect must be identified before automatic speech recognition (ASR) is applied. On the other hand, it is observed in all Arab countries that the standard Arabic language is widely written and... more
Development of Multilingual Automatic Speech Recognition (ASR) systems enables the sharing of existing speech and text corpora among languages. We have conducted experiments on the development of multilingual Acoustic Models (AM) and Language... more
In this paper, we introduce the first large vocabulary speech recognition system (LVSR) for the Central Kurdish language, named Jira. The Kurdish language is an Indo-European language spoken by more than 30 million people in several... more
The Donate Speech campaign has so far succeeded in gathering approximately 3600 h of ordinary, colloquial Finnish speech into the Lahjoita puhetta (Donate Speech) corpus. The corpus includes over twenty thousand speakers from all the... more
Speech recognition systems play an important role in human-machine interactions. Many systems exist for Arabic speech, however, there are limited systems for dialectal Arabic speech. The Arabic language comprises many properties, some of... more
Speech recognition basically means talking to a computer and having it recognize what speakers are saying. Speech is a common and efficient method of communication for people to interact with each other. The person would like to interact... more
Towards developing high-performing ASR for low-resource languages, approaches to address the lack of resources are to make use of data from multiple languages, and to augment the training data by creating acoustic variations. In this work... more
End-to-end automatic speech recognition (ASR) has simplified the traditional ASR system building pipeline by eliminating the need to have multiple components and also the requirement for expert linguistic knowledge for creating... more
Recognition of Hungarian conversational telephone speech is challenging due to the informal style and morphological richness of the language. Recurrent Neural Network Language Model (RNNLM) can provide remedy for the high perplexity of... more
Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, the recent approaches using DNN-HMM models are not explored much for text... more
Speech recognition, which aids the recognition of speech signals through computer applications, is a current research area. In this research paper, the Acoustic Nudging (AN) model is used in re-formulating the persistence automatic... more
We propose a method to extend a phoneme set by using a large amount of broadcast data to improve the performance of Korean spontaneous speech recognition. In the proposed method, we first extract variable-length phoneme-level segments... more
In some languages like Classical Arabic (the language of the Holy Quran), phoneme duration is considered a distinguishing cue in Quranic phonology. Phonological variation of phonemes occurrences contributes to an inaccurate... more
Voice control is a prominent interaction method on personal computing devices. While automatic speech recognition (ASR) systems are readily applicable for large audiences, there is room for further adaptation at the edge, i.e., locally on... more
Speech recognition has become increasingly popular in radiology reporting in the last decade. However, developing a speech recognition system for a new language in a highly specific domain requires a lot of resources, expert knowledge and... more
This paper gives an introduction of speech recognition systems for human-computer interaction using Vietnamese language. First, the paper investigates the two most common speech recognition toolkits currently used, HTK and Sphinx, and... more
This paper presents a solution for replacing the current endoscopic exam control mechanisms. This kind of exam requires the gastroenterologist to perform a complex procedure, using both hands simultaneously, to manipulate the... more
Phonetic dictionaries are essential components of large-vocabulary natural language speaker-independent speech recognition systems. This paper presents a rule-based technique to generate Arabic phonetic dictionaries for a large vocabulary... more
This paper describes a system for performing alignment of subtitles to audio on multigenre broadcasts using a lightly supervised approach. Accurate alignment of subtitles plays a substantial role in the daily work of media companies and... more
In this paper, various methodologies of acoustic and language models, as well as labeling methods for automatic speech recognition for spoken dialogues in emergency call centers were investigated and comparatively analyzed. Because of the... more
Sphinx-4 is a flexible, modular and pluggable framework to help foster new innovations in the core research of hidden Markov model (HMM) speech recognition systems. The design of Sphinx-4 is based on patterns that have emerged from the... more
Deep Learning approaches have been widely known to perform better than statistical approaches. This is the first effort to investigate Recurrent Neural Network-based modeling for Punjabi speech corpus. We propose the Lattice Rescoring... more
The Arabic language has many features such as the phonology and the syntax that make it an easy language for developing automatic speech recognition systems. Many standard techniques for acoustic and language modeling such as context... more
Speech recognition is a widely researched topic around the world. It is the process of converting speech to text. Many scientists and researchers are working to increase the performance of speech recognition systems. Most... more
This paper addresses the problem of open vocabulary automatic speech recognition (ASR) using hybrid statistical language models (LMs). Hybrid LMs differ from closed vocabulary LMs in that the word level lexicon is augmented with an... more
Speech recognition plays a very important role in day-to-day life. It is widely used around the world, as it allows users to communicate with computers by recognizing their spoken language. Communication with... more
In this paper we present the creation of an Arabic version of Automated Speech Recognition System (ASR). This system is based on the open source Sphinx-4, from the Carnegie Mellon University. Which is a speech recognition system based on... more
The goal of this article is to present information about the SphinxTrain training procedure and the modifications that can be made to it to get accurate and robust speech recognition in a mobile GSM environment. Some modifications are based... more
The decoder of the Sphinx-4 speech recognition system incorporates several new design strategies which have not been used earlier in conventional decoders of HMM-based large vocabulary speech recognition systems.
Statistical language modeling remains a challenging task, in particular for morphologically rich languages. Recently, new approaches based on factored language models have been developed to address this problem. These models... more