Multi-Dialect Arabic Speech Recognition
2020 International Joint Conference on Neural Networks (IJCNN)
https://doi.org/10.1109/IJCNN48605.2020.9206658
Abstract
This paper presents the design and development of a multi-dialect automatic speech recognition (ASR) system for Arabic. Deep neural networks have become an effective tool for solving sequential data problems, particularly when the system is trained end to end. Arabic speech recognition is a complex task because of the existence of multiple dialects, the lack of large corpora, and missing vocalization (diacritics) in written text. The first contribution of this work is therefore the development of a large multi-dialect corpus with fully or at least partially vocalized transcriptions. Because the open-source corpus was gathered from multiple sources, its transcriptions contain non-standard Arabic letters; these are normalized by defining a common character set. The second contribution is a framework for training an acoustic model that achieves state-of-the-art performance. The network architecture combines convolutional and recurrent layers. Spectrogram features are extracted from the audio in the time-frequency domain and fed into the network. The output frames produced by the recurrent layers are further trained to align the audio features with their corresponding transcription sequences. Decoding of the aligned output sequence is performed with a beam search decoder and a tetra-gram (4-gram) language model. The proposed system achieved a 14% error rate, which outperforms previous systems.
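The character-set normalization step can be sketched as follows. This is a minimal Python illustration, not the paper's implementation: the mapping table `CHAR_MAP`, the allowed set `ALLOWED`, and the function name `normalize_transcript` are hypothetical. It folds Unicode presentation forms with NFKC, maps a few Persian/Urdu letter variants onto their standard Arabic counterparts, drops tatweel, and keeps the short-vowel diacritics so that vocalized transcriptions survive normalization.

```python
import re
import unicodedata

# Hypothetical mapping of non-standard letters onto standard Arabic letters.
# The paper's actual character set is not reproduced here.
CHAR_MAP = {
    "\u06CC": "\u064A",  # ARABIC LETTER FARSI YEH -> ARABIC LETTER YEH
    "\u06A9": "\u0643",  # ARABIC LETTER KEHEH -> ARABIC LETTER KAF
    "\u06C1": "\u0647",  # ARABIC LETTER HEH GOAL -> ARABIC LETTER HEH
    "\u0640": "",        # ARABIC TATWEEL (elongation only) -> removed
}
TABLE = str.maketrans(CHAR_MAP)

# Allowed symbols: base letters U+0621-U+063A and U+0641-U+064A, the
# diacritics U+064B-U+0652 (tanwin, short vowels, shadda, sukun), and space.
ALLOWED = (
    {chr(c) for c in range(0x0621, 0x063B)}
    | {chr(c) for c in range(0x0641, 0x0653)}
    | {" "}
)


def normalize_transcript(text: str) -> str:
    # NFKC folds presentation forms (e.g. ligatures) into base letters.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(TABLE)
    # Replace anything outside the character set with a space,
    # then collapse runs of spaces.
    text = "".join(ch if ch in ALLOWED else " " for ch in text)
    return re.sub(r" +", " ", text).strip()


print(normalize_transcript("\u06A9\u062A\u0627\u0628"))  # Keheh becomes Kaf
```

Replacing unknown characters with a space rather than deleting them keeps punctuation from gluing adjacent words together; a real character set would be derived from the corpus itself.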
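Spectrogram extraction can be illustrated with a short NumPy sketch. The 20 ms Hann window, 10 ms hop, and 16 kHz sample rate are common defaults assumed here, not values taken from the paper, and `log_spectrogram` is a hypothetical name. Each row of the result is one time frame and each column one frequency bin, i.e. the time-frequency representation a convolutional front end would consume.

```python
import numpy as np


def log_spectrogram(signal, sample_rate=16000, win_ms=20.0, hop_ms=10.0, eps=1e-10):
    """Log-magnitude STFT with shape (num_frames, win_len // 2 + 1).

    Window and hop sizes are assumed defaults, not the paper's settings.
    """
    win_len = int(sample_rate * win_ms / 1000)  # 320 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    if len(signal) < win_len:
        signal = np.pad(signal, (0, win_len - len(signal)))
    num_frames = 1 + (len(signal) - win_len) // hop_len
    # frames[i] = signal[i * hop_len : i * hop_len + win_len]
    idx = np.arange(win_len)[None, :] + hop_len * np.arange(num_frames)[:, None]
    frames = signal[idx] * np.hanning(win_len)[None, :]
    # Magnitude of the real FFT of each frame: frequency bins along axis 1.
    mag = np.abs(np.fft.rfft(frames, axis=1))
    return np.log(mag + eps)


# One second of a 1 kHz tone at 16 kHz gives 99 frames x 161 bins.
tone = np.sin(2 * np.pi * 1000 * np.arange(16000) / 16000)
print(log_spectrogram(tone).shape)
```

With a 320-sample window the bin spacing is 50 Hz, so the 1 kHz tone peaks in bin 20.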
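The abstract does not name the objective used to align output frames with transcriptions. Assuming it is Connectionist Temporal Classification (CTC), as in the Deep Speech family of end-to-end systems, the loss for one utterance can be computed with the forward (alpha) recursion below. This NumPy sketch works in probability space for readability, so it will underflow on long utterances; practical implementations work in log space. The function name `ctc_neg_log_likelihood` and the use of index 0 as the blank label are assumptions.

```python
import numpy as np


def ctc_neg_log_likelihood(probs, target, blank=0):
    """-log p(target | probs) via the CTC forward recursion.

    probs: (T, V) array of per-frame softmax outputs.
    target: list of label ids, without blanks.
    """
    T = probs.shape[0]
    # Extended sequence with blanks around every label: [b, y1, b, y2, ..., b]
    ext = [blank]
    for y in target:
        ext += [y, blank]
    S = len(ext)
    alpha = np.zeros((T, S))
    # A valid path starts with either the leading blank or the first label.
    alpha[0, 0] = probs[0, blank]
    if S > 1:
        alpha[0, 1] = probs[0, ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1, s]  # stay on the same symbol
            if s >= 1:
                a += alpha[t - 1, s - 1]  # advance by one
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1, s - 2]
            alpha[t, s] = a * probs[t, ext[s]]
    # A valid path ends on the last label or the trailing blank.
    total = alpha[T - 1, S - 1] + (alpha[T - 1, S - 2] if S > 1 else 0.0)
    return -np.log(total)


# Two frames, uniform over {blank, 1}: alignments "1-", "-1", "11" give p = 0.75.
print(ctc_neg_log_likelihood(np.full((2, 2), 0.5), [1]))
```

The skip rule is what lets CTC emit a repeated character (as in a doubled letter) only when a blank separates the two copies.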
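Beam search decoding with a language model can be sketched with CTC prefix beam search, again assuming a CTC-trained acoustic model. The `lm` argument is a hypothetical hook returning the probability of the next label given the current prefix, scaled by `lm_weight`; the paper's tetra-gram model is more likely a word-level model applied at word boundaries, which this character-level hook only approximates. The beam width, blank index, and all names are illustrative assumptions.

```python
from collections import defaultdict


def ctc_prefix_beam_search(probs, beam_width=8, blank=0, lm=None, lm_weight=0.5):
    """Return (best_label_sequence, its probability) for per-frame probs (T x V).

    lm(prefix, label) is a hypothetical hook giving P(label | prefix); it is
    raised to the power lm_weight whenever a new label extends a prefix.
    """
    # Each prefix keeps [P(paths ending in blank), P(paths ending in a label)].
    beams = {(): [1.0, 0.0]}
    for frame in probs:
        nxt = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            last = prefix[-1] if prefix else None
            for c, p in enumerate(frame):
                if p == 0.0:
                    continue
                if c == blank:
                    # A blank never changes the prefix.
                    nxt[prefix][0] += (p_b + p_nb) * p
                    continue
                new = prefix + (c,)
                w = lm(prefix, c) ** lm_weight if lm else 1.0
                if c == last:
                    # A repeated label starts a new symbol only after a blank;
                    # otherwise it merges into the existing last symbol.
                    nxt[new][1] += p_b * p * w
                    nxt[prefix][1] += p_nb * p
                else:
                    nxt[new][1] += (p_b + p_nb) * p * w
        # Keep the beam_width most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = dict(ranked[:beam_width])
    best, (p_b, p_nb) = max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])
    return list(best), p_b + p_nb


# Greedy best-path decoding picks blank in both frames (empty output, p = 0.36);
# summing over alignments instead favours the single label 1 (p of about 0.64).
print(ctc_prefix_beam_search([[0.6, 0.4], [0.6, 0.4]]))
```

Tracking separate blank and non-blank scores per prefix is what allows the beam to merge different alignments of the same transcription instead of ranking raw frame paths.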