Papers by Tadas Baltrusaitis

2017 Seventh International Conference on Affective Computing and Intelligent Interaction (ACII), 2017
A person's face discloses important information about their affective state. Although there has been extensive research on recognition of facial expressions, the performance of existing approaches is challenged by facial occlusions. Facial occlusions are often treated as noise and discarded in recognition of affective states. However, hand-over-face occlusions can provide additional information for recognition of some affective states such as curiosity, frustration and boredom. One of the reasons that this problem has not gained attention is the lack of naturalistic occluded faces that contain hand-over-face occlusions as well as other types of occlusions. Traditional approaches for obtaining affective data are time-consuming and expensive, which limits researchers in affective computing to working on small datasets. This limitation affects the generalizability of models and prevents researchers from taking advantage of recent advances in deep learning that have shown great success in many fields but require large volumes of data. In this paper, we first introduce a novel framework for synthesizing naturalistic facial occlusions from an initial dataset of non-occluded faces and separate images of hands, reducing the costly process of data collection and annotation. We then propose a model for facial occlusion type recognition to differentiate between hand-over-face occlusions and other types of occlusions such as scarves, hair, glasses and objects. Finally, we present a model to localize hand-over-face occlusions and identify the occluded regions of the face.
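At the heart of the synthesis framework is compositing segmented hand images onto non-occluded faces. The following is a minimal sketch of that compositing step, not the paper's actual pipeline: the file names, the landmark-derived anchor point, and the simple alpha blending are illustrative assumptions.

# Minimal sketch: paste an RGBA hand cut-out onto a face image, using the
# hand's alpha channel as the blending mask. In a real pipeline the anchor
# point would come from detected facial landmarks (e.g. mouth or chin).
from PIL import Image

def composite_hand_on_face(face_path, hand_path, anchor_xy):
    face = Image.open(face_path).convert("RGBA")
    hand = Image.open(hand_path).convert("RGBA")  # alpha channel = hand mask
    # Centre the hand on the chosen facial region.
    x = int(anchor_xy[0] - hand.width / 2)
    y = int(anchor_xy[1] - hand.height / 2)
    face.paste(hand, (x, y), mask=hand)  # alpha-aware paste
    return face.convert("RGB")

# Example: occlude the lower region of a 256x256 face crop.
# occluded = composite_hand_on_face("face.png", "hand_rgba.png", (128, 190))
# occluded.save("face_occluded.png")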
OpenFace: An open source facial behavior analysis toolkit
2016 IEEE Winter Conference on Applications of Computer Vision (WACV), 2016
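OpenFace is an open-source toolkit for facial landmark detection, head pose estimation, eye gaze estimation and facial action unit recognition. A hedged sketch of driving its command-line FeatureExtraction tool from Python follows; the binary name and flags are taken from the project's documentation but vary between releases, so treat the exact invocation as an assumption to verify against your installed version.

import subprocess

# Run OpenFace's FeatureExtraction on a video; it writes per-frame CSVs
# (landmarks, head pose, gaze, action unit intensities) to the output dir.
subprocess.run(
    ["FeatureExtraction",       # OpenFace executable, assumed to be on PATH
     "-f", "input_video.mp4",   # input video to analyse
     "-out_dir", "processed"],  # directory for the CSV output
    check=True,
)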

The Future Belongs to the Curious: Towards Automatic Understanding and Recognition of Curiosity in Children
Workshop on Child Computer Interaction, 2016
Curiosity plays a crucial role in the learning and education of children. Given its complex nature, it is extremely challenging to automatically understand and recognize. In this paper, we discuss the contexts under which curiosity can be elicited and provide an associated taxonomy. We present an initial empirical study of curiosity that includes the analysis of co-occurring emotions and the valence associated with it, together with gender-specific analysis. We also discuss the visual, acoustic and verbal behavior indicators of curiosity. Our discussions and analysis uncover some of the underlying complexities of curiosity and its temporal evolution, which is a step towards its automatic understanding and recognition. Finally, considering the central role of curiosity in education, we present two education-centered application areas that could greatly benefit from its automatic recognition.

The Cambridge Face Tracker: Accurate, Low Cost Measurement of Head Posture Using Computer Vision and Face Recognition Software
Translational vision science & technology, 2016
We validate a video-based method of head posture measurement. The Cambridge Face Tracker uses neural networks (constrained local neural fields) to recognize facial features in video. The relative position of these facial features is used to calculate head posture. First, we assess the accuracy of this approach against videos in three research databases where each frame is tagged with a precisely measured head posture. Second, we compare our method to a commercially available mechanical device, the Cervical Range of Motion device: four subjects each adopted 43 distinct head postures that were measured using both methods. The Cambridge Face Tracker achieved confident facial recognition in 92% of the approximately 38,000 frames of video from the three databases. The respective mean error in absolute head posture was 3.34°, 3.86°, and 2.81°, with a median error of 1.97°, 2.16°, and 1.96°. The accuracy decreased with more extreme head posture. Comparing the Cambridge Face Tracker to the ...
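The measurement principle is generic enough to sketch: track 2D facial landmarks, then solve for the rigid head rotation that best projects a 3D face model onto them. Below is a minimal illustration of that second step using OpenCV's PnP solver; it is not the Cambridge Face Tracker's own implementation, and the 3D model coordinates and pinhole camera approximation are rough assumptions.

import cv2
import numpy as np

# Approximate 3D positions (millimetres) of six landmarks in a canonical
# head frame; ballpark values for illustration, not calibrated ones.
MODEL_POINTS = np.array([
    (0.0,    0.0,    0.0),    # nose tip
    (0.0,   -63.6,  -12.5),   # chin
    (-43.3,  32.7,  -26.0),   # left eye outer corner
    (43.3,   32.7,  -26.0),   # right eye outer corner
    (-28.9, -28.9,  -24.1),   # left mouth corner
    (28.9,  -28.9,  -24.1),   # right mouth corner
], dtype=np.float64)

def head_pose_degrees(image_points, frame_w, frame_h):
    """image_points: (6, 2) float array of tracked 2D landmark positions."""
    focal = frame_w  # crude pinhole approximation, no lens distortion
    camera = np.array([[focal, 0, frame_w / 2],
                       [0, focal, frame_h / 2],
                       [0, 0, 1]], dtype=np.float64)
    ok, rvec, _tvec = cv2.solvePnP(MODEL_POINTS, image_points, camera, None)
    rot, _ = cv2.Rodrigues(rvec)       # rotation vector -> 3x3 matrix
    angles, *_ = cv2.RQDecomp3x3(rot)  # Euler angles in degrees
    return angles                      # (pitch, yaw, roll)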

Images of the eye are key in several computer vision problems, such as shape registration and gaze estimation. Recent large-scale supervised methods for these problems require time-consuming data collection and manual annotation, which can be unreliable. We propose synthesizing perfectly labelled photo-realistic training data in a fraction of the time. We used computer graphics techniques to build a collection of dynamic eye-region models from head scan geometry. These were randomly posed to synthesize close-up eye images for a wide range of head poses, gaze directions, and illumination conditions. We used our model's controllability to verify the importance of realistic illumination and shape variations in eye-region training data. Finally, we demonstrate the benefits of our synthesized training data (SynthesEyes) by outperforming state-of-the-art methods for eye-shape registration as well as cross-dataset appearance-based gaze estimation in the wild.
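The generation loop is simple in spirit: sample head pose, gaze direction and illumination at random, render, and keep the sampled parameters as perfect labels. A sketch of that loop, where render_eye_region() stands in for the paper's actual graphics pipeline and the sampling ranges are illustrative assumptions:

import random

def sample_scene():
    # Random scene parameters per image (ranges are assumptions).
    return {
        "head_pitch":    random.uniform(-40, 40),   # degrees
        "head_yaw":      random.uniform(-60, 60),
        "gaze_pitch":    random.uniform(-25, 25),
        "gaze_yaw":      random.uniform(-35, 35),
        "light_azimuth": random.uniform(0, 360),    # illumination variation
    }

dataset = []
for i in range(10_000):
    params = sample_scene()
    # image = render_eye_region(**params)  # placeholder for a real renderer
    # The gaze label is exact "for free": we chose it before rendering.
    dataset.append((f"eye_{i:05d}.png",
                    params["gaze_pitch"], params["gaze_yaw"]))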
Decoupling facial expressions and head motions in complex emotions
2015 International Conference on Affective Computing and Intelligent Interaction (ACII), 2015
Empirical analysis of continuous affect
2015 International Conference on Affective Computing and Intelligent Interaction (ACII), 2015
Cross-dataset learning and person-specific normalisation for automatic Action Unit detection
2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015
A Facial Affect Mapping Engine (FAME)

An increasing number of computer vision and pattern recognition problems require structured regression techniques. Problems like human pose estimation, unsegmented action recognition, emotion prediction and facial landmark detection have temporal or spatial output dependencies that regular regression techniques do not capture. In this paper we present continuous conditional neural fields (CCNF), a novel structured regression model that can learn non-linear input-output dependencies, and model temporal and spatial output relationships of varying length sequences. We propose two instances of our CCNF framework: Chain-CCNF for time series modelling, and Grid-CCNF for spatial relationship modelling. We evaluate our model on five public datasets spanning three different regression problems: facial landmark detection in the wild, emotion prediction in music and facial action unit recognition. Our CCNF model demonstrates state-of-the-art performance on all of the datasets used.
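For readers who want the model's shape, here is a hedged reconstruction of the CCNF potential as described in the CCNF papers; the notation may differ slightly from the original:

P(\mathbf{y} \mid \mathbf{x}) = \frac{\exp(\Psi)}{\int \exp(\Psi)\, d\mathbf{y}}, \qquad
\Psi = \sum_{i}\sum_{k} \alpha_k\, f_k(y_i, \mathbf{x}, \theta_k) + \sum_{i,j}\sum_{k} \beta_k\, g_k(y_i, y_j)

f_k(y_i, \mathbf{x}, \theta_k) = -\bigl(y_i - h(\theta_k, \mathbf{x}_i)\bigr)^2, \qquad
g_k(y_i, y_j) = -\tfrac{1}{2}\, S^{(k)}_{i,j}\, (y_i - y_j)^2

The vertex features f_k score agreement between each output y_i and a one-layer neural predictor h; the edge features g_k encourage smoothness between outputs linked by the similarity structure S^(k) (temporal neighbours for Chain-CCNF, spatial ones for Grid-CCNF). With positive alpha_k and beta_k the exponent is a negative-definite quadratic in y, so the distribution is multivariate Gaussian and inference is exact.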
Automatic Detection of Naturalistic Hand-over-Face Gesture Descriptors
Synthesizing expressions using facial feature point tracking: How emotion is conveyed
Many approaches to the analysis and synthesis of facial expressions rely on automatically tracking landmark points on human faces. However, this approach is usually chosen because of ease of tracking rather than its ability to convey affect. We have conducted an experiment that evaluated the perceptual importance of 22 such automatically tracked feature points in a mental state recognition task.
That emotion in music can change over time is not in question. As interest in continuous emotion prediction grows, there is a greater need for tools suitable for dimensional emotion tracking. In this paper, we propose a novel Continuous Conditional Neural Fields model designed specifically for this problem. We compare our approach with a similar Continuous Conditional Random Fields model and with Support Vector Regression, showing a great improvement over the baseline. Our new model is especially well suited to hierarchical models such as model-level feature fusion, which we explore in this paper. We also investigate how well it performs with a relative feature representation in addition to the standard representation.
Automatic Facial Expression Analysis
Proceedings of the companion publication of the 19th international conference on Intelligent User Interfaces - IUI Companion '14, 2014
Figure 1: (a) The puppeteer controlling the animation, with the automatic facial landmark and head pose tracking shown. (b) The avatar image being animated. (c) The final result: the avatar overlaid on the puppeteer's image. Abstract: Facial expressions play a crucial role in human interaction.

During face-to-face communication, people continuously exchange para-linguistic information such as their emotional state through facial expressions, posture shifts, gaze patterns and prosody. These affective signals are subtle and complex. In this paper, we propose to explicitly model the interaction between the high-level perceptual features using Latent-Dynamic Conditional Random Fields. This approach has the advantage of explicitly learning the sub-structure of the affective signals as well as the extrinsic dynamic between emotional labels. We evaluate our approach on the Audio-Visual Emotion Challenge (AVEC 2011) dataset. By using visual features easily computable using off-the-shelf sensing software (vertical and horizontal eye gaze, head tilt and smile intensity), we show that our approach based on the LDCRF model outperforms previously published baselines for all four affective dimensions. By integrating audio features, our approach also outperforms the audio-visual baseline.
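Off-the-shelf LDCRF implementations are rare, so the sketch below is an illustrative stand-in only: the same idea of per-frame visual features feeding a sequence labeller, but using a plain linear-chain CRF via sklearn-crfsuite with binary per-dimension labels in the spirit of AVEC 2011. Feature names and data layout are assumptions.

# Stand-in sketch: a linear-chain CRF (not the paper's latent-dynamic CRF)
# over per-frame visual features, predicting a binary affect label per frame.
import sklearn_crfsuite

def frame_features(gaze_x, gaze_y, head_tilt, smile):
    # One feature dict per video frame; crfsuite accepts numeric values.
    return {"gaze_x": gaze_x, "gaze_y": gaze_y,
            "head_tilt": head_tilt, "smile": smile}

# X: list of sequences (one per video), each a list of per-frame feature dicts.
# y: matching label sequences, e.g. "high"/"low" arousal per frame.
X = [[frame_features(0.1, -0.2, 3.0, 0.7),
      frame_features(0.0, -0.1, 2.5, 0.9)]]
y = [["high", "high"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, y)
print(crf.predict(X))  # per-frame predictions for each sequence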
Crowdsourcing in emotion studies across time and culture
3D Corpus of Spontaneous Complex Mental States
Hand-over-face gestures, a subset of emotional body language, are overlooked by automatic affect inference systems. We propose the use of hand-over-face gestures as a novel affect cue for automatic inference of cognitive mental states. Moreover, affect recognition ...