Evaluating Deep Learning Models for Music Emotion Recognition

International Journal of Engineering Applied Sciences and Technology

Abstract

Music listening serves people not only as entertainment but also as a way to reduce emotional stress in daily life. Nowadays people tend to use online music streaming services such as Spotify, Amazon Music, and Google Play Music rather than storing songs on their devices. The songs on these streaming services are categorized under emotional labels such as happy, sad, romantic, and devotional. In music streaming applications, songs are manually tagged with their emotional categories for music recommendation. Given the growth of music on social media platforms and the internet, the need for automatic tagging will only increase. The work presented here deals with training deep learning models for automatic emotional tagging. It covers the implementation of two deep learning architectures for classifying audio files using the Mel-spectrogram of the music audio. The first architecture proposed is a Convolutional Recurrent Neural Network (CRNN) an...
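The Mel-spectrogram input mentioned in the abstract can be illustrated with a minimal, numpy-only sketch: frame the waveform, take the magnitude STFT, project the power spectrum through a triangular mel filterbank, and apply log compression. In practice a library such as librosa (`librosa.feature.melspectrogram`) would be used; the function names and parameter defaults below (`n_fft=1024`, `hop=512`, `n_mels=64`) are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters with centers spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):           # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):          # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr=22050, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the mel filterbank and log-compress.
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T
    return np.log(mel + 1e-10)  # shape: (n_frames, n_mels)

# Example: one second of a 440 Hz sine tone.
sr = 22050
t = np.arange(sr) / sr
spec = log_mel_spectrogram(np.sin(2 * np.pi * 440.0 * t), sr=sr)
```

The resulting 2D array (time frames x mel bands) is the image-like representation that the convolutional layers of an architecture such as CRNN consume.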
