
Music Emotion Classification

72 papers
70 followers
About this topic
Music Emotion Classification is the interdisciplinary study of identifying and categorizing the emotional content of music using computational methods, psychological theories, and music theory. It involves analyzing audio features, lyrics, and contextual factors to determine the emotions conveyed by musical pieces, facilitating applications in areas such as music recommendation systems and affective computing.

Key research themes

1. How can dynamic, dimensionally-annotated datasets and benchmarks advance the evaluation and development of music emotion recognition systems?

This research area focuses on creating large, publicly available datasets that provide continuous, time-dependent emotional annotations (primarily in valence-arousal dimensions) for musical excerpts, enabling standardized benchmarking of music emotion recognition (MER) methods. Such datasets tackle the challenges of data scarcity, copyright restrictions, and inconsistent annotation schemes, offering a foundation for systematic comparison of feature sets and algorithms in MER. The theme is crucial for developing robust MER systems that capture temporal emotion variation in music and for fostering reproducibility and comparability across studies. A minimal evaluation sketch appears after the key findings below.

Key finding: Introduced the MediaEval Database for Emotional Analysis in Music (DEAM), the largest dataset with continuous valence and arousal annotations at 2 Hz resolution over 1,802 Creative Commons songs, supporting dynamic music...
Key finding: Created a sizable dataset of 903 audio clips labeled across five mood clusters aligned with MIREX standards, while using multiple audio feature extraction frameworks and support vector machine classifiers. Achieved an...
Key finding: Proposed a biologically inspired cochlear modeling approach combined with convolutional neural networks to extract features from cochleogram images, aligning with human auditory perception. Evaluated on a public 1000-song...
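To make the benchmarking workflow above concrete, here is a minimal sketch of scoring a dynamic MER model against DEAM-style continuous annotations. The 2 Hz frame rate comes from the dataset description; the column names ("valence_mean", "arousal_mean"), the file layout, and the synthetic stand-in predictions are illustrative assumptions, not the actual DEAM release format.

import numpy as np
import pandas as pd

def load_annotations(csv_path: str) -> np.ndarray:
    """Load per-frame valence/arousal means for one song (2 Hz frames).
    Column names are assumed; check the real DEAM files for their layout."""
    df = pd.read_csv(csv_path)
    return df[["valence_mean", "arousal_mean"]].to_numpy()

def rmse_per_dimension(pred: np.ndarray, truth: np.ndarray) -> np.ndarray:
    """Root-mean-square error for valence and arousal separately."""
    n = min(len(pred), len(truth))  # defensively align trace lengths
    return np.sqrt(np.mean((pred[:n] - truth[:n]) ** 2, axis=0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 45 s excerpt at 2 Hz = 90 frames; with real data, call
    # load_annotations("path/to/annotations.csv") instead.
    truth = np.tanh(rng.normal(size=(90, 2)).cumsum(axis=0) * 0.05)
    pred = np.clip(truth + rng.normal(0.0, 0.1, truth.shape), -1.0, 1.0)
    v_rmse, a_rmse = rmse_per_dimension(pred, truth)
    print(f"valence RMSE={v_rmse:.3f}  arousal RMSE={a_rmse:.3f}")

Per-dimension error over time-aligned frames is the natural way such benchmarks compare feature sets and algorithms, since a single clip-level accuracy number cannot capture temporal emotion variation.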

2. What are the effective machine learning approaches for multilabel and multimodal classification of emotions induced or perceived in music?

This theme investigates advanced machine learning methods capable of recognizing multiple simultaneous emotions in music, reflecting the complexity of human emotional responses. It encompasses multilabel classification paradigms, multimodal integration of audio with lyrics or video data, and the use of deep learning architectures such as CNNs, LSTMs, and transformer models (e.g., XLNet). Multilabel and multimodal approaches improve recognition accuracy and model emotional nuance, better reflecting real-world scenarios and enhancing applications such as music recommendation and emotion-based interaction. A toy multilabel sketch appears after the key findings below.

Key finding: Analyzed Geneva Emotional Music Scale 9 annotations in the Emotify dataset using several machine learning algorithms for multilabel and multiclass classification, emphasizing simultaneous emotions. Findings informed...
Key finding: Developed a multimodal MER system using mel spectrograms with CNN-LSTM for audio and XLNet transformers for lyrics, combining outputs via stacking ensemble and ANN meta-classifier. Achieved state-of-the-art 80.56% accuracy on...
Key finding: Presented a hybrid emotion classification framework combining audio and video features extracted from the SAVEE database, using SVM for classification. The hybrid approach significantly improved accuracy to 99.26% compared to...
Key finding: Reviewed and applied AI algorithms including SVM, RNN, and CNN on audio features such as pitch and Mel-frequency cepstral coefficients, illustrating deep learning's superiority in modeling sequential emotional patterns from...
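As a concrete illustration of the multilabel paradigm these findings share, the sketch below uses binary relevance (one classifier per emotion tag), so a clip can receive several tags at once. The feature matrix, tag count, and linear SVMs are synthetic stand-ins, not a reconstruction of any cited system.

import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import hamming_loss

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))                     # stand-in: 40 audio features per clip
Y = (rng.random(size=(500, 4)) < 0.3).astype(int)  # 4 emotion tags; several may be active at once

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)
clf = OneVsRestClassifier(LinearSVC())  # binary relevance: one SVM per emotion tag
clf.fit(X_tr, Y_tr)
print("Hamming loss:", hamming_loss(Y_te, clf.predict(X_te)))

The deep multimodal systems cited above replace such hand-crafted features with learned representations (spectrograms through CNN-LSTMs, lyrics through transformers), but the per-tag output structure is the same idea.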

3. How do music structural elements and compositional eras influence perceived musical emotions, and how can computational models incorporate these insights?

This theme explores how intrinsic musical features (e.g., tempo, mode, pitch patterns) and historical changes across musical eras affect emotional perception. It investigates score-based analyses combined with perceptual evaluations to reveal changing cue associations (e.g., between major/minor modes and emotional valence/arousal) from the Classical to the Romantic period. Integrating these musicological insights with computational models enhances the generation of emotionally expressive music and improves classification by accounting for the temporal and cultural factors shaping emotional meaning. A rule-of-thumb cue-mapping sketch appears after the key findings below.

Key finding: Combined score-based acoustic cue analyses with behavioral classification of Bach and Chopin excerpts, revealing that Romantic era compositions alter associations between musical mode and affective meanings compared to...
Key finding: Developed EmotionBox, a deep neural network system generating symbolic music guided by music elements tempo and mode derived from music psychology, mapped onto emotional valence-arousal dimensions without requiring labeled...
Key finding: Investigated various feature sets including audio signal processing, chord features, and EEG data to classify music emotion in valence-activation space. Found that combining music-inspired features, frequency modulation...
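The cue associations this theme describes (mode with valence, tempo with arousal) can be stated as a rule-of-thumb mapping. The sketch below does so with an illustrative tempo threshold and quadrant labels that are assumptions, not values from the cited studies, which moreover show these associations shifting between eras.

from dataclasses import dataclass

@dataclass
class Excerpt:
    tempo_bpm: float
    mode: str  # "major" or "minor"

def quadrant(e: Excerpt, tempo_split: float = 100.0) -> str:
    """Map (mode, tempo) onto a coarse valence-arousal quadrant.
    The 100 BPM split is an illustrative assumption."""
    valence = "positive" if e.mode == "major" else "negative"
    arousal = "high" if e.tempo_bpm >= tempo_split else "low"
    return {
        ("positive", "high"): "happy/excited",
        ("positive", "low"): "calm/content",
        ("negative", "high"): "angry/tense",
        ("negative", "low"): "sad/depressed",
    }[(valence, arousal)]

print(quadrant(Excerpt(tempo_bpm=140, mode="major")))  # happy/excited
print(quadrant(Excerpt(tempo_bpm=60, mode="minor")))   # sad/depressed

A finding like the Bach/Chopin comparison amounts to saying the valence rule here is era-dependent, which is why classification and generation models benefit from conditioning on period or style.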

All papers in Music Emotion Classification

In this thesis we focus on the automatic emotion classification of music samples. We extract a set of features from the music signal and examine their discriminatory capability using various classification techniques. Our goal is to...
This paper focuses on emotion recognition and understanding in Contemporary Western music. The study seeks to investigate the relationship between perceived emotion and musical features in the aforementioned musical genre. A set of 27...
Emotion judgments and five channels of physiological data were obtained from 60 participants listening to 60 music excerpts. Various machine learning (ML) methods were used to model the emotion judgments, including neural networks,...
Abstract: This project will investigate the link between physical expression and emotions in music. Since the emotive nature of music is already well known, it will focus on the following questions: 1. Is physical expression a natural...
In this paper, the automated detection of emotion in music is modeled as a multilabel classification task, where a piece of music may belong to more than one class. Four algorithms are evaluated and compared in this task. Furthermore, the...
Electroencephalography (EEG)-based emotion classification during music listening has recently gained increasing attention due to its promise for potential applications such as musical affective brain-computer interfaces (ABCI),...
Electroacoustic music with video is now frequently programmed in classical concert venues. There are many different ways in which video and sound can be organized together into an effective work of art. But after attending a number of...
The concept of music denotes a language, that language's alphabet, and that language's literature. Just as in literature in the primary sense of the word, a phenomenon in music can be expressed in different forms and through different literary constructions. Music's...
Recent studies have observed that corticolimbic theta rhythm in EEG recordings accompanies scenes perceived as fearful or threatening during neural processing of visual stimuli. In addition, neural oscillation patterns in the theta, alpha...
In this paper we propose a neural network model for human emotion and gesture classification. We demonstrate that the proposed architecture represents an effective tool for real-time processing of customers' behavior for distributed...
Speech processing is the study of speech signals and the methods used to process them. It is employed in applications such as speech coding, speech synthesis, speech recognition, and speaker recognition technology. In...
Spectral and excitation features, commonly used in automatic emotion classification systems, parameterise different aspects of the speech signal. This paper groups these features as speech production cues, broad spectral measures and...
Many current electroacoustic works are weakened by not having clear structure. Why is this happening, and why now? In the following article on New Music Box I consider a number of reasons why this may be occurring. I begin with...
In recent years, there have been many great successes in using deep architectures for unsupervised feature learning from data, especially for images and speech. In this paper, we introduce recent advanced deep learning models to classify two...
Continuous emotion prediction in the arousal-valence space is now being used in various modalities: music, facial expressions, gestures, text, etc. In order to be able to compare the work of different research groups effectively, we...
Enabling computer systems to recognize facial expressions and infer emotions from them in real time presents a challenging research topic. In this paper, we present a real time approach to emotion recognition through facial expression in...
The following paper presents parameterization of emotional speech using perceptual coefficients as well as a comparison of Mel Frequency Cepstral Coefficients (MFCC), Bark Frequency Cepstral Coefficients (BFCC), Perceptual Linear...
Abstract: Comparing the performance of different classification algorithms on a given problem is a very common practice in the literature. However, the results obtained from these studies...
Exploring how people represent natural categories is a key step toward developing a better understanding of how people learn, form memories, and make decisions. Much research on categorization has focused on artificial categories that are...
During everyday interaction people display various non-verbal signals that convey emotions. These signals are multi-modal, ranging across facial expressions, shifts in posture, head pose, and non-verbal speech. They are subtle, continuous...
Emotion recognition has been a research topic in the field of Human Computer Interaction (HCI) during recent years. Computers have become an inseparable part of human life. Users need human-like interaction to better communicate with...
Music has grown into an important part of people's daily lives. As we move further into the digital age, in which a large collection of music is created daily and is easily accessible, people spend more time on...