Sharifa Alghowinem

From Joyous to Clinically Depressed: Mood Detection Using Multimodal Analysis of a Person's Appearance and Speech

2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

Depression and other mood disorders are common and disabling disorders. We present work towards a... more Depression and other mood disorders are common and disabling disorders. We present work towards an objective diagnostic aid supporting clinicians using affective sensing technology with a focus on acoustic and statistical features from spontaneous speech. This work investigates differences in expressing positive and negative emotions in depressed and healthy control subjects as well as whether initial gender classification increases the recognition rate. To this end, spontaneous speech from interviews of 30 subjects of each depressed and controls was analysed, with a focus on questions eliciting positive and negative emotions. Using HMMs with GMMs for classification with 30-fold cross-validation, we found that MFCC, energy and intensity features gave highest recognition rates when female and male subjects were analysed together. When the dataset was first split by gender, root mean square energy and shimmer features, respectively, were found to give the highest recognition rates in females, while it was voice quality for males. Overall, correct recognition rates from acoustic features for depressed female subjects were higher than for male subjects. Using temporal features, we found that the response time and average syllable duration were longer in depressed subjects, while the interaction involvement and articulation rate wesre higher in control subjects.

Download

Head Pose and Movement Analysis as an Indicator of Depression

2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

Depression is a common and disabling mental health disorder, which impacts not only on the suffer... more Depression is a common and disabling mental health disorder, which impacts not only on the sufferer but also their families, friends and the economy overall. Our ultimate aim is to develop an automatic objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. Here, we analyse the performance of head pose and movement features extracted from face videos using a 3D face model projected on a 2D Active Appearance Model (AAM). In a binary classification task (depressed vs. non-depressed), we modelled low-level and statistical functional features for an SVM classifier using real-world clinically validated data. Although the head pose and movement would be used as a complementary cue in detecting depression in practice, their recognition rate was impressive on its own, giving 71.2% on average, which illustrates that head pose and movement hold effective cues in diagnosing depression. When expressing positive and negative emotions, recognising depression using positive emotions was more accurate than using negative emotions. We conclude that positive emotions are expressed less in depressed subjects at all times, and that negative emotions have less discriminatory power than positive emotions in detecting depression. Analysing the functional features statistically illustrates several behaviour patterns for depressed subjects: (1) slower head movements, (2) less change of head position, (3) longer duration of looking to the right, (4) longer duration of looking down, which may indicate fatigue and eye contact avoidance. We conclude that head movements are significantly different between depressed patients and healthy subjects, and could be used as a complementary cue.

Download

Multimodal assistive technologies for depression diagnosis and monitoring

by Sharifa Alghowinem and Roland Goecke

Journal on Multimodal User Interfaces, 2013

Depression is a severe mental health disorder with high societal costs. Current clinical practice... more Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies to support clinicians and sufferers in the diagnosis and monitoring of treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising of audio-video fusion for depression diagnosis. We exploit the proposition that the auditory and visual human communication complement each other, which is well-known in auditory-visual speech processing; we investigate this hypothesis for depression analysis. For the video data analysis, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at feature level, score level and decision level. Experiments are performed on an age and gender matched clinical dataset of 30 patients and 30 healthy controls. The results from the multimodal experiments show the proposed framework's effectiveness in depression analysis.

Download

Cross-cultural detection of depression from nonverbal behaviour

2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015

An exploratory study of detecting emotion states using eye-tracking technology

Studying eye movement has proven to be useful in the study of detecting and understanding human e... more Studying eye movement has proven to be useful in the study of detecting and understanding human emotional states. This paper aims to investigate eye movement features: pupil size, time of first fixation, first fixation duration, fixation duration and fixation count in clips emotional stimulation. Thirty seven subjects' pupil responses were measured while watching two pleasant and unpleasant emotional clips. The results showed that the fixation duration and fixation count significantly different between pleasant and unpleasant clip arousal. These results suggest that the measurement of eye fixation may be a potentially useful computer input for detecting positive and negative emotional state.

Download

Characterising depressed speech for classification

Depression is a serious psychiatric disorder that affects mood, thoughts, and the ability to func... more Depression is a serious psychiatric disorder that affects mood, thoughts, and the ability to function in everyday life. This paper investigates the characteristics of depressed speech for the purpose of automatic classification by analysing the effect of different speech features on the classification results. We analysed voiced, unvoiced and mixed speech in order to gain a better understanding of depressed speech and to bridge the gap between physiological and affective computing studies. This understanding may ultimately lead to an objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. The characteristics of depressed speech were statistically analysed using ANOVA and linked to their classification results using GMM and SVM. Features were extracted and classified over speech utterances of 30 clinically depressed patients against 30 controls (both gender-matched) in a speaker-independent manner. Most feature classification results were consistent with their statistical characteristics, providing a link between physiological and affective computing studies. The classification results from low-level features were slightly better than the statistical functional features, which indicates a loss of information in the latter. We found that both mixed and unvoiced speech were as useful in detecting depression as voiced speech, if not better.

Download

A comparative study of different classifiers for detecting depression from spontaneous speech

by Gordon Parker, Sharifa Alghowinem, and Roland Goecke

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Accurate detection of depression from spontaneous speech could lead to an objective diagnostic ai... more Accurate detection of depression from spontaneous speech could lead to an objective diagnostic aid to assist clinicians to better diagnose depression. Little thought has been given so far to which classifier performs best for this task. In this study, using a 60-subject real-world clinically validated dataset, we compare three popular classifiers from the affective computing literature -Gaussian Mixture Models (GMM), Support Vector Machines (SVM) and Multilayer Perceptron neural networks (MLP) -as well as the recently proposed Hierarchical Fuzzy Signature (HFS) classifier. Among these, a hybrid classifier using GMM models and SVM gave the best overall classification results. Comparing feature, score, and decision fusion, score fusion performed better for GMM, HFS and MLP, while decision fusion worked best for SVM (both for raw data and GMM models). Feature fusion performed worse than other fusion methods in this study. We found that loudness, root mean square, and intensity were the voice features that performed best to detect depression in this dataset.

Download

Detecting depression: A comparison between spontaneous and read speech

by Gordon Parker, Sharifa Alghowinem, and Roland Goecke

2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Major depressive disorders are mental disorders of high prevalence, leading to a high impact on i... more Major depressive disorders are mental disorders of high prevalence, leading to a high impact on individuals, their families, society and the economy. In order to assist clinicians to better diagnose depression, we investigate an objective diagnostic aid using affective sensing technology with a focus on acoustic features. In this paper, we hypothesise that (1) classifying the general characteristics of clinical depression using spontaneous speech will give better results than using read speech, (2) that there are some acoustic features that are robust and would give good classification results in both spontaneous and read, and (3) that a 'thin-slicing' approach using smaller parts of the speech data will perform similarly if not better than using the whole speech data. By examining and comparing recognition results for acoustic features on a real-world clinical dataset of 30 depressed and 30 control subjects using SVM for classification and a leave-one-out cross-validation scheme, we found that spontaneous speech has more variability, which increases the recognition rate of depression. We also found that jitter, shimmer, energy and loudness feature groups are robust in characterising both read and spontaneous depressive speech. Remarkably, thin-slicing the read speech, using either the beginning of each sentence or the first few sentences performs better than using all reading task data.

Download

AusTalk — The Australian speech database: Design framework, recording experience and localisation

2013 8th International Conference on Information Technology in Asia (CITA), 2013

Aiming to create a comprehensive Australian speech database, the "AusTalk" project was carefully ... more Aiming to create a comprehensive Australian speech database, the "AusTalk" project was carefully designed by 30 speech scientists contributing their disciplinary expertise. Standardised three one-hour audio-visual sessions for each of 1000 speakers around Australia were recorded having diverse components suitable for different research areas. The design of this database provides a good framework for any speech data corpus collection. In this paper, we present the AusTalk design and recording protocol, as well as problems faced and lessons learned. Localisation of this protocol and the potential customisation based on other countries' specifications are discussed. Collecting such speech databases including accent groups is encouraged to boost speech research in areas such as linguistics, speech and speaker recognition, forensic voice comparison, auditory-visual speech processing and many more.

Download

A Computationally Efficient Fuzzy Logic Parameterisation System for Computer Games

Lecture Notes in Computer Science, 2011

ABSTRACT Linguistic fuzzy expert systems provide useful tools for the implementation of Artificia... more ABSTRACT Linguistic fuzzy expert systems provide useful tools for the implementation of Artificial Intelligence (AI) systems for computer games. However, in games where a large number of fuzzy agents are needed, the computational needs of the fuzzy expert system inclines designers to abandon this promising technique in favour of non-fuzzy AI techniques with a lower computational overhead. In this paper we investigated a parameterisation of fuzzy sets with the goal of finding fuzzy systems that have lower computational needs but still have sufficient accuracy for use in the domain of computer games. We developed a system we call short-cut fuzzy logic that has low computational needs and seems to have adequate accuracy for the games domain.

From Joyous to Clinically Depressed: Mood Detection Using Multimodal Analysis of a Person's Appearance and Speech

by Gordon Parker, Sharifa Alghowinem, and Roland Goecke

2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

Depression and other mood disorders are common and disabling disorders. We present work towards a... more Depression and other mood disorders are common and disabling disorders. We present work towards an objective diagnostic aid supporting clinicians using affective sensing technology with a focus on acoustic and statistical features from spontaneous speech. This work investigates differences in expressing positive and negative emotions in depressed and healthy control subjects as well as whether initial gender classification increases the recognition rate. To this end, spontaneous speech from interviews of 30 subjects of each depressed and controls was analysed, with a focus on questions eliciting positive and negative emotions. Using HMMs with GMMs for classification with 30-fold cross-validation, we found that MFCC, energy and intensity features gave highest recognition rates when female and male subjects were analysed together. When the dataset was first split by gender, root mean square energy and shimmer features, respectively, were found to give the highest recognition rates in females, while it was voice quality for males. Overall, correct recognition rates from acoustic features for depressed female subjects were higher than for male subjects. Using temporal features, we found that the response time and average syllable duration were longer in depressed subjects, while the interaction involvement and articulation rate wesre higher in control subjects.

Download

Head Pose and Movement Analysis as an Indicator of Depression

by Gordon Parker, Sharifa Alghowinem, and Roland Goecke

2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

Depression is a common and disabling mental health disorder, which impacts not only on the suffer... more Depression is a common and disabling mental health disorder, which impacts not only on the sufferer but also their families, friends and the economy overall. Our ultimate aim is to develop an automatic objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. Here, we analyse the performance of head pose and movement features extracted from face videos using a 3D face model projected on a 2D Active Appearance Model (AAM). In a binary classification task (depressed vs. non-depressed), we modelled low-level and statistical functional features for an SVM classifier using real-world clinically validated data. Although the head pose and movement would be used as a complementary cue in detecting depression in practice, their recognition rate was impressive on its own, giving 71.2% on average, which illustrates that head pose and movement hold effective cues in diagnosing depression. When expressing positive and negative emotions, recognising depression using positive emotions was more accurate than using negative emotions. We conclude that positive emotions are expressed less in depressed subjects at all times, and that negative emotions have less discriminatory power than positive emotions in detecting depression. Analysing the functional features statistically illustrates several behaviour patterns for depressed subjects: (1) slower head movements, (2) less change of head position, (3) longer duration of looking to the right, (4) longer duration of looking down, which may indicate fatigue and eye contact avoidance. We conclude that head movements are significantly different between depressed patients and healthy subjects, and could be used as a complementary cue.

Download

Design of an Emotion Elicitation Framework for Arabic Speakers

Lecture Notes in Computer Science, 2014

The automatic detection of human affective states has been of great interest lately for its appli... more The automatic detection of human affective states has been of great interest lately for its applications not only in the field of Human-Computer Interaction, but also for its applications in physiological, neurobiological and sociological studies. Several standardized techniques to elicit emotions have been used, with emotion eliciting movie clips being the most popular. To date, there are only four studies that have been carried out to validate emotional movie clips using three different languages (English, French, Spanish) and cultures (French, Italian, British / American). The context of language and culture is an underexplored area in affective computing. Considering cultural and language differences between Western and Arab countries, it is possible that some of the validated clips, even when dubbed, will not achieve similar results. Given the unique and conservative cultures of the Arab countries, a standardized and validated framework for affect studies is needed in order to be comparable with current studies of different cultures and languages. In this paper, we describe a framework and its prerequisites for eliciting emotions that could be used for affect studies on an Arab population. We present some aspects of Arab culture values that might affect the selection and acceptance of emotion eliciting video clips. Methods for rating and validating Arab emotional clips are presented to derive at a list of clips that could be used in the proposed emotion elicitation framework. A pilot study was conducted to evaluate a basic version of our framework, which showed great potential to succeed in eliciting emotions.

Exploring Eye Activity as an Indication of Emotional States Using an Eye-Tracking Sensor

Studies in Computational Intelligence, 2014

The automatic detection of human emotional states has been of great interest lately for its appli... more The automatic detection of human emotional states has been of great interest lately for its applications not only in the Human-Computer Interaction field, but also for its applications in psychological studies. Using an emotion elicitation paradigm, we investigate whether eye activity holds discriminative power for detecting affective states. Our emotion elicitation paradigm includes induced emotions by watching emotional movie clips and spontaneous emotions elicited by interviewing participants about emotional events in their life. To reduce gender variability, the selected participants were 60 female native Arabic speakers (30 young adults, and 30 mature adults). In general, the automatic classification results using eye activity were reasonable, giving 66% correct recognition rate on average. Statistical measures show statistically significant differences in eye activity patterns between positive and negative emotions. We conclude that eye activity, including eye movement, pupil dilation and pupil invisibility could be used as a complementary cues for the automatic recognition of human emotional states.

Download

Multimodal assistive technologies for depression diagnosis and monitoring

by Gordon Parker, Sharifa Alghowinem, Roland Goecke, and Jyoti Joshi

Journal on Multimodal User Interfaces, 2013

Depression is a severe mental health disorder with high societal costs. Current clinical practice... more Depression is a severe mental health disorder with high societal costs. Current clinical practice depends almost exclusively on self-report and clinical opinion, risking a range of subjective biases. The long-term goal of our research is to develop assistive technologies to support clinicians and sufferers in the diagnosis and monitoring of treatment progress in a timely and easily accessible format. In the first phase, we aim to develop a diagnostic aid using affective sensing approaches. This paper describes the progress to date and proposes a novel multimodal framework comprising of audio-video fusion for depression diagnosis. We exploit the proposition that the auditory and visual human communication complement each other, which is well-known in auditory-visual speech processing; we investigate this hypothesis for depression analysis. For the video data analysis, intra-facial muscle movements and the movements of the head and shoulders are analysed by computing spatio-temporal interest points. In addition, various audio features (fundamental frequency f0, loudness, intensity and mel-frequency cepstral coefficients) are computed. Next, a bag of visual features and a bag of audio features are generated separately. In this study, we compare fusion methods at feature level, score level and decision level. Experiments are performed on an age and gender matched clinical dataset of 30 patients and 30 healthy controls. The results from the multimodal experiments show the proposed framework's effectiveness in depression analysis.

Download

Eye movement analysis for depression detection

by Gordon Parker, Sharifa Alghowinem, and Roland Goecke

2013 IEEE International Conference on Image Processing, 2013

Depression is a common and disabling mental health disorder, which impacts not only on the suffer... more Depression is a common and disabling mental health disorder, which impacts not only on the sufferer but also on their families, friends and the economy overall. Despite its high prevalence, current diagnosis relies almost exclusively on patient self-report and clinical opinion, leading to a number of subjective biases. Our aim is to develop an objective affective sensing system that supports clinicians in their diagnosis and monitoring of clinical depression. In this paper, we analyse the performance of eye movement features extracted from face videos using Active Appearance Models for a binary classification task (depressed vs. non-depressed). We find that eye movement low-level features gave 70% accuracy using a hybrid classifier of Gaussian Mixture Models and Support Vector Machines, and 75% accuracy when using statistical measures with SVM classifiers over the entire interview. We also investigate differences while expressing positive and negative emotions, as well as the classification performance in gender-dependent versus gender-independent modes. Interestingly, even though the blinking rate was not significantly different between depressed and healthy controls, we find that the average distance between the eyelids ('eye opening') was significantly smaller and the average duration of blinks significantly longer in depressed subjects, which might be an indication of fatigue or eye contact avoidance.

Download

Uploads

Papers by Sharifa Alghowinem

Log In