Arabic Speech Recognition: Advancement and Challenges
2024, IEEE Access
https://doi.org/10.1109/ACCESS.2024.3376237

Abstract
Speech recognition has transformed human-computer interaction, allowing users to interact with and control machines through spoken commands. Effective speech recognition rests on understanding a given language's linguistic and textual characteristics. Although automatic speech recognition (ASR) systems convert speech to text with high accuracy for many widely spoken languages, their performance for Arabic remains inadequate. In this research, we examine the current state of Arabic ASR systems and the challenges encountered during their development, categorizing these challenges into two groups: those specific to the Arabic language and those common to ASR in general. We also propose strategies to overcome these obstacles and argue for ASR architectures tailored to Arabic's distinctive grammatical and phonetic structure. Finally, we provide a comprehensive and explicit description of the feature extraction methods, language models, and acoustic models used in Arabic ASR systems.