Emotion Recognition from Spontaneous Tunisian Dialect Speech
2025
https://doi.org/10.1145/3708340

Abstract
Emotional expressions are a fundamental aspect of human communication, with speech being one of the most natural modes of interaction. Speech Emotion Recognition (SER) is a significant research topic in Natural Language Processing (NLP), aimed at identifying emotions such as satisfaction, frustration, and anger from speech audio using multiple classifiers. This paper presents a method for emotion recognition from spontaneous Tunisian Dialect (TD) speech, marking the first work in the SER field to utilize spontaneous speech in this dialect. The dataset was created from freely available YouTube videos across multiple domains and labeled with four perceived emotions: anger, satisfaction, frustration, and neutral. To address the data scarcity issue, we implemented data augmentation techniques, specifically Vocal Tract Length Perturbation (VTLP). Preprocessing of the speech signals involved removing ambient and other unwanted noise. We extracted and selected various spectral features, including Mel-Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC). Subsequently, we applied several classification methods: Support Vector Machine (SVM), Bidirectional Long Short-Term Memory (BiLSTM), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), and Random Forest. Our experiments demonstrated that the Random Forest classifier achieved the highest F-score, 58.75%. The results were thoroughly discussed, analyzed, and compared across the five models using different feature extractions. This study provides valuable insights and advancements in the SER field, particularly for TD, and outlines future research directions for improving emotion recognition systems.
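The pipeline described above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the piecewise-linear VTLP warp follows Jaitly and Hinton (2013), but the warp boundary `f_hi=4800.0` and sample rate `sr=16000.0` are assumed defaults, and the feature vectors fed to the Random Forest are synthetic placeholders standing in for the MFCC/LPCC features the paper extracts.

```python
# Sketch: VTLP-style frequency warping for augmentation, then Random Forest
# classification of fixed-length feature vectors (placeholders for MFCC/LPCC).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def vtlp_warp(freqs, alpha, f_hi=4800.0, sr=16000.0):
    """Piecewise-linear VTLP warp of a frequency axis (Jaitly & Hinton, 2013).

    Frequencies below the boundary are scaled by alpha; the rest of the axis
    is mapped linearly so that the Nyquist frequency stays fixed.
    """
    nyq = sr / 2.0
    boundary = f_hi * min(alpha, 1.0) / alpha
    return np.where(
        freqs <= boundary,
        freqs * alpha,
        nyq - (nyq - f_hi * min(alpha, 1.0)) * (nyq - freqs) / (nyq - boundary),
    )

LABELS = ["anger", "satisfaction", "frustration", "neutral"]
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 13))            # synthetic 13-dim feature vectors
y = rng.integers(0, len(LABELS), 200)     # perceived-emotion labels
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(LABELS[clf.predict(X[:1])[0]])      # predicted emotion for one clip
```

With alpha = 1 the warp is the identity, and for any alpha the Nyquist frequency is unchanged, so warped spectrograms remain valid inputs for feature extraction.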
References
- Prasetya, M. R., Harjoko, A., Supriyanto, C. (2019, December). Speech emotion recognition of Indonesian movie audio tracks based on MFCC and SVM. In 2019 international conference on contemporary computing and informatics (IC3I) (pp. 22-25). IEEE.
- Likitha, M. S., Gupta, S. R. R., Hasitha, K., Raju, A. U. (2017, March). Speech based human emotion recognition using MFCC. In 2017 international conference on wireless communications, signal processing and networking (WiSPNET) (pp. 2257-2260). IEEE.
- El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern recognition, 44(3), 572-587.
- Mekki, A., Zribi, I., Ellouze, M., & Belguith, L. H. (2023). Tokenization of Tunisian Arabic: a comparison between three Machine Learning models. ACM Transactions on Asian and Low-Resource Language Information Processing, 22(7), 1-19.
- AL-Sarayreh, S., Mohamed, A., Shaalan, K. (2023). Challenges and solutions for Arabic natural language processing in social media. In International conference on Variability of the Sun and sun-like stars: from asteroseismology to space weather (pp. 293-302). Springer, Singapore.
- Fishman, J. A. (2020). Bilingualism with and without diglossia; diglossia with and without bilingualism. In The bilingualism reader (pp. 47-54). Routledge.
- Akçay, M. B., Oğuz, K. (2020). Speech emotion recognition: Emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Communication, 116, 56-76.
- Xu, Y., Xu, H., Zou, J. (2020, May). HGFM: A hierarchical grained and feature model for acoustic emotion recognition. In ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6499-6503). IEEE.
- Lotfian, R., Busso, C. (2018). Predicting categorical emotions by jointly learning primary and secondary emotions through multitask learning. Interspeech 2018.
- Schmitt, M., Cummins, N., Schuller, B. (2019). Continuous emotion recognition in speech: do we need recurrence? Interspeech 2019.
- Kim, E., Shin, J. W. (2019, May). Dnn-based emotion recognition based on bottleneck acoustic features and lexical features. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6720-6724). IEEE.
- Xu, M., Zhang, F., Cui, X., Zhang, W. (2021, June). Speech emotion recognition with multiscale area attention and data augmentation. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6319-6323). IEEE.
- Mishra, S., Bhatnagar, N., P, P., T. R, S. (2024). Speech emotion recognition and classification using hybrid deep CNN and BiLSTM model. Multimedia Tools and Applications, 83(13), 37603-37620.
- Islam, A., Foysal, M., Ahmed, M. I. (2024, April). Emotion Recognition from Speech Audio Signals using CNN-BiLSTM Hybrid Model. In 2024 3rd International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE) (pp. 1-6). IEEE.
- Macary, M., Tahon, M., Estève, Y., Rousseau, A. (2020, May). AlloSat: A new call center french corpus for satisfaction and frustration analysis. In Language Resources and Evaluation Conference, LREC 2020.
- Sun, L., Zou, B., Fu, S., Chen, J., Wang, F. (2019). Speech emotion recognition based on DNN-decision tree SVM model. Speech Communication, 115, 29-37.
- Singh, Y. B., Goel, S. (2022). A systematic literature review of speech emotion recognition approaches. Neurocomputing, 492, 245-263.
- Iben Nasr, L., Masmoudi, A., Hadrich Belguith, L. (2024). Survey on Arabic speech emotion recognition. International Journal of Speech Technology, 27(1), 53-68.
- Abdel-Hamid, L. (2020). Egyptian Arabic speech emotion recognition using prosodic, spectral and wavelet features. Speech Communication, 122, 19-30.
- Aljuhani, R. H., Alshutayri, A., Alahdal, S. (2021). Arabic speech emotion recognition from Saudi dialect corpus. IEEE Access, 9, 127081-127085.
- Cherif, R. Y., Moussaoui, A., Frahta, N., Berrimi, M. (2021). Effective speech emotion recognition using deep learning approaches for Algerian dialect. In 2021 international conference of women in data science at Taif University (WiDSTaif), 2021 (pp. 1-6). IEEE.
- Meddeb, M., Hichem, K., Alimi, A. (2016). Automated extraction of features from Arabic emotional speech corpus. International Journal of Computer Information Systems and Industrial Management Applications, 8, 184-194.
- Messaoudi, A., Haddad, H., Hmida, M. B., Graiet, M. (2022, December). TuniSER: Toward a Tunisian Speech Emotion Recognition System. In Proceedings of the 5th International Conference on Natural Language and Speech Processing (ICNLSP 2022) (pp. 234-241).
- Harris, R. (2018). Continuity and change in the Tunisian Sahel. Routledge.
- Lajmi, D. (2009). Spécificités du dialecte Sfaxien [Specificities of the Sfax dialect]. Synergies Tunisie, 1, 135-142.
- Habash, N., Soudi, A., & Buckwalter, T. (2007). On arabic transliteration. Arabic computational morphology: Knowledge-based and empirical methods, 15-22.
- Dammak, A. M. (2016). Approche hybride pour la reconnaissance automatique de la parole en langue arabe [A hybrid approach for automatic speech recognition in Arabic] (Doctoral dissertation, Université du Maine).
- Jaitly, N., Hinton, G. E. (2013, June). Vocal tract length perturbation (VTLP) improves speech recognition. In Proc. ICML Workshop on Deep Learning for Audio, Speech and Language (Vol. 117, p. 21).
- Ko, T., Peddinti, V., Povey, D., & Khudanpur, S. (2015, September). Audio augmentation for speech recognition. In Interspeech (Vol. 2015, p. 3586).
- Picone, J. W. (1993). Signal modeling techniques in speech recognition. Proceedings of the IEEE, 81(9), 1215-1247.
- Ibrahim, Y. A., Odiketa, J. C., Ibiyemi, T. S. (2017). Preprocessing technique in automatic speech recognition for human computer interaction: an overview. Ann Comput Sci Ser, 15(1), 186-191.
- Dhouib, A., Othman, A., El Ghoul, O., Khribi, M. K., Al Sinani, A. (2022). Arabic Automatic Speech Recognition: A Systematic Literature Review. Applied Sciences, 12(17), 8898.
- Hamid, O. K. (2018). Frame blocking and windowing speech signal. Journal of Information, Communication, and Intelligence Systems (JICIS), 4(5), 87-94.
- Nasr, L. I., Masmoudi, A., & Belguith, L. H. (2023, June). Natural Tunisian Speech Preprocessing for Features Extraction. In 2023 IEEE/ACIS 23rd International Conference on Computer and Information Science (ICIS) (pp. 73-78). IEEE.
- Stolar, M. N., Lech, M., Stolar, S. J., Allen, N. B. (2018). Detection of adolescent depression from speech using optimised spectral roll-off parameters. Biomedical Journal, 2, 10.
- Chen, A. (2014). Automatic classification of electronic music and speech/music audio content (Doctoral dissertation, University of Illinois at Urbana-Champaign).
- Dahmani, H., Hussein, H., Meyer-Sickendiek, B., Jokisch, O. (2019). Natural Arabic language resources for emotion recognition in Algerian dialect. In Arabic Language Processing: From Theory to Practice: 7th International Conference, ICALP 2019, Nancy, France, October 16-17, 2019, Proceedings 7 (pp. 18-33). Springer International Publishing.
- Shahin, I., Alomari, O. A., Nassif, A. B., Afyouni, I., Hashem, I. A., & Elnagar, A. (2023). An efficient feature selection method for Arabic and English speech emotion recognition using Grey Wolf Optimizer. Applied Acoustics, 205, 109279.
- El Seknedy, M., & Fawzi, S. A. (2022). Emotion recognition system for Arabic speech: Case study Egyptian accent. In International conference on model and data engineering, 2022 (pp. 102-115). Springer.
- Bahou, Y., Masmoudi, A., & Belguith, L. H. (2010, July). Traitement des disfluences dans le cadre de la compréhension automatique de l'oral arabe spontané [Handling disfluencies in the automatic understanding of spontaneous spoken Arabic]. In Actes de la 17e conférence sur le Traitement Automatique des Langues Naturelles. Articles longs (pp. 201-210).