Ahmed Ali

Papers by Ahmed Ali

Automatic Speech Recognition Of Arabic Multi-genre Broadcast Media

by Maryam Najafian, Wei-Ning Hsu, and Ahmed Ali

ASRU, 2017

This paper describes an Arabic Automatic Speech Recognition system developed on 15 hours of Multi-Genre Broadcast (MGB-3) data from YouTube, plus 1,200 hours of Multi-Dialect and Multi-Genre MGB-2 data recorded from the Aljazeera Arabic TV channel. We report our investigations of a range of signal pre-processing, data augmentation, topic-specific language model adaptation, accent-specific retraining, and deep learning based acoustic modeling topologies, such as feed-forward Deep Neural Networks (DNNs), Time-delay Neural Networks (TDNNs), Long Short-term Memory (LSTM) networks, Bidirectional LSTMs (BLSTMs), and a Bidirectional version of the Prioritized Grid LSTM (BPGLSTM) model. We propose a system combination for three purely sequence-trained recognition systems based on lattice-free maximum mutual information, 4-gram language model re-scoring, and system combination using the minimum Bayes risk decoding criterion. The best word error rate we obtained on the MGB-3 Arabic development set using a 4-gram re-scoring strategy is 42.25% for a chain BLSTM system, compared to a 65.44% baseline for a DNN system.

QMDIS: QCRI-MIT Advanced Dialect Identification System

by Maryam Najafian and Ahmed Ali

Interspeech, 2017

As a continuation of our efforts towards tackling the problem of spoken Dialect Identification (DID) for Arabic languages, we present the QCRI-MIT Advanced Dialect Identification System (QMDIS). QMDIS is an automatic spoken DID system for Dialectal Arabic (DA). In this paper, we report a comprehensive study of the three main components used in the spoken DID task: phonotactic, lexical and acoustic. We use Support Vector Machines (SVMs), Logistic Regression (LR) and Convolutional Neural Networks (CNNs) as backend classifiers throughout the study. We perform all our experiments on a publicly available dataset and present new state-of-the-art results. QMDIS discriminates between the five most widely used dialects of Arabic, namely Egyptian, Gulf, Levantine, North African, and Modern Standard Arabic (MSA). We report ≈ 73% accuracy for system combination. All the data and the code used in our experiments are publicly available for research.

Non-negative Factor Analysis of Gaussian Mixture Model Weight Adaptation for Language and Dialect Recognition

by Jim Glass, Mohamad Hasan Bahari, and Ahmed Ali

IEEE/ACM Transactions on Audio, Speech, and Language Processing, Jul 2014

Recent studies show that, for language and dialect recognition, Gaussian mixture model (GMM) weights carry less information than GMM means, but that this information is complementary. However, state-of-the-art language recognition systems usually do not use this information. In this research, a non-negative factor analysis (NFA) approach is developed for GMM weight decomposition and adaptation. This modeling, which is conceptually simple and computationally inexpensive, suggests a new low-dimensional utterance representation method using a factor analysis similar to that of the i-vector framework. The obtained subspace vectors are then applied in conjunction with i-vectors to the language/dialect recognition problem. The suggested approach is evaluated on the NIST 2011 and RATS language recognition evaluation (LRE) corpora and on the QCRI Arabic dialect recognition evaluation (DRE) corpus. The assessment results show that the proposed adaptation method yields more accurate recognition results than three conventional weight adaptation approaches, namely maximum likelihood re-estimation, non-negative matrix factorization, and a subspace multinomial model. Experimental results also show that the intermediate-level fusion of i-vectors and NFA subspace vectors improves the performance of the state-of-the-art i-vector framework, especially for short utterances.
