Modelling Users’ Affect in Job Interviews: Technological Demo
2013, Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-642-38844-6_37…
3 pages
Abstract
This demo presents an approach to recognising and interpreting social cue-based interactions in computer-enhanced job interview simulations. We show which social cues and complex mental states of the user are relevant in this interaction context, how they can be interpreted using static Bayesian networks, and how they can be recognised automatically in real time using state-of-the-art sensor technology.
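To make the Bayesian interpretation step concrete, here is a minimal sketch rather than the authors' implementation: a naive-Bayes-style static network with one hidden mental state and a few observed social cues as child nodes. All variable names and probabilities are illustrative assumptions.

```python
# Hidden mental state with a prior; each observable social cue is a child node.
prior = {"stressed": 0.3, "relaxed": 0.7}

# P(cue = present | state) for each social cue (hypothetical values).
likelihood = {
    "gaze_aversion":  {"stressed": 0.7, "relaxed": 0.2},
    "closed_posture": {"stressed": 0.6, "relaxed": 0.3},
    "long_pauses":    {"stressed": 0.8, "relaxed": 0.4},
}

def posterior(observed_cues):
    """Posterior over the mental state given a dict cue -> bool (present/absent)."""
    scores = {}
    for state, p in prior.items():
        for cue, present in observed_cues.items():
            p_cue = likelihood[cue][state]
            p *= p_cue if present else (1.0 - p_cue)
        scores[state] = p
    total = sum(scores.values())
    return {state: score / total for state, score in scores.items()}

# Example: the sensors report gaze aversion and long pauses, but an open posture.
print(posterior({"gaze_aversion": True, "closed_posture": False, "long_pauses": True}))
```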
Related papers
2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), 2015
Ever wondered why you have been rejected from a job despite being a qualified candidate? What went wrong? In this paper, we provide a computational framework to quantify human behavior in the context of job interviews. We build a model by analyzing 138 recorded interview videos (total duration of 10.5 hours) of 69 internship-seeking students from the Massachusetts Institute of Technology (MIT) as they spoke with professional career counselors. Our automated analysis includes facial expressions (e.g., smiles, head gestures), language (e.g., word counts, topic modeling), and prosodic information (e.g., pitch, intonation, pauses) of the interviewees. We derive the ground truth labels by averaging over the ratings of 9 independent judges. Our framework automatically predicts the ratings for interview traits such as excitement, friendliness, and engagement with correlation coefficients of 0.73 or higher, and quantifies the relative importance of prosody, language, and facial expressions. According to our framework, it is recommended to speak more fluently, use fewer filler words, speak as "we" (vs. "I"), use more unique words, and smile more.
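As a rough illustration of the pipeline described in that paper (not the authors' code), the sketch below averages several judges' ratings into a ground-truth score, fits a linear model on multimodal features, and reports the correlation between predictions and held-out ratings; the data and feature dimensions are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 69 interviewees, 6 features standing in for prosody,
# language, and facial-expression measurements.
X = rng.normal(size=(69, 6))
true_w = np.array([0.8, -0.3, 0.5, 0.0, 0.4, -0.2])
judges = true_w @ X.T + rng.normal(scale=0.5, size=(9, 69))  # 9 noisy judges
y = judges.mean(axis=0)                                      # averaged ground truth

# Simple train/test split and least-squares fit with a bias term.
train, test = slice(0, 50), slice(50, 69)
Xb = np.hstack([X, np.ones((69, 1))])
w, *_ = np.linalg.lstsq(Xb[train], y[train], rcond=None)
pred = Xb[test] @ w

# Pearson correlation between predicted and averaged judge ratings.
r = np.corrcoef(pred, y[test])[0, 1]
print(f"correlation on held-out interviews: {r:.2f}")
```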
2006
Any interactive software program must interpret the users' actions and come up with an appropriate response that is intelligible and meaningful to the user. In most situations, the options of the user are determined by the software and hardware, and the actions that can be carried out are unambiguous. The machine knows what it should do when the user carries out an action. In most cases, the user knows what he has to do by relying on conventions which he may have learned by having had a look at the instruction manual, having seen them performed by somebody else, or by modifying a previously learned convention. Some, or most, of the time he just finds out by trial and error. In user-friendly interfaces, the user knows, without having to read extensive manuals, what is expected from him and how he can get the machine to do what he wants. An intelligent interface is so called because it does not assume the same kind of programming of the user by the machine; rather, the machine itself can figure out what the user wants and how he wants it, without the user having to take all the trouble of telling it to the machine in the way the machine dictates, but being able to do it in his own words, or perhaps by not using any words at all, as the machine is able to read off the intentions of the user by observing his actions and expressions. Ideally, the machine should be able to determine what the user wants, what he expects, what he hopes will happen, and how he feels.
The Human Media Interaction group is carrying out several studies that involve determining, in one way or another, what the mental state of a person is by analysing various aspects of his behaviour. In developing intelligent tutoring systems, for instance, we would like to know as much about the mental state of the students as is useful for responding appropriately to their actions. A tutoring system should create the right conditions for the students to learn something. This means adjusting the teaching strategy or the specific instructions and feedback to the student, establishing the right frame of mind for optimal learning. This involves determining not only whether the students are involved and attending, but also whether or not they are motivated and how they are responding to the mistakes they are making. Do students feel the right kind of frustration? Are they bored or challenged to do better? Do they understand the instructions? Are things too easy for them or too difficult, et cetera? These are some of the things that a skilled human instructor can deduce without effort from the students' behaviour. The question is: how can we build a machine with similar skills?
For other applications too, it is extremely valuable if one can find out more about the mental state of a person. Besides intelligent tutoring systems, we are also interested in automatically interpreting the behaviours of people engaged in human-to-human conversation. In the research carried out in the context of the AMI project (http://www.amiproject.org/), we are looking at people involved in meetings. Observing people and interpreting their actions to find out their intentions and motivations, what people feel and think, is needed to develop intelligently searchable multimedia recordings of the meetings.
Speech recognizers can help us to produce transcripts of the meeting automatically, and natural language processing, information extraction, and information retrieval techniques can help us to enrich the data with tags that can be used as metadata enabling semantic access. But what we need is not just access to what was said; we also have to find out how it was intended. Only to a limited extent do the actual words that were used express what was meant. We might also be interested not just in what the speaker had to say, but in what the listeners thought and felt about it. Was a proposal greeted with great enthusiasm or with a fair amount of scepticism? Who agreed immediately and who didn't?
For both the tutoring and the meeting case, we can deduce a lot about the intentions of students and participants in the meeting by analysing the actions they performed in the task they were given or the things they said in the meeting. However, much of the information about the mental state is not present in these actions or the words that are spoken, but rather in the way the actions are carried out and in the other behaviours that accompany them. In particular, paraverbal and nonverbal signals may give us an indication of extra dimensions of the mental state of the users of a system or the participants interacting in a meeting or other form of conversation. Recognizing and interpreting such cues is an important research area. In order to understand the way the cues work in the actual cases that we are considering, we need to look at naturalistic data to find out what behaviours are being displayed and how they are associated with diverse functions. In the paragraphs that follow we will discuss several aspects of this undertaking, focussing on the collection and analysis of data, both for the tutoring case ...
Lecture Notes in Computer Science, 2014
We define job interviews as a domain of interaction that can be modelled automatically in a serious game for job interview skills training. We present four types of studies: (1) field-based human-to-human job interviews, (2) field-based computer-mediated human-to-human interviews, (3) lab-based Wizard-of-Oz studies, and (4) field-based human-to-agent studies. Together, these highlight pertinent questions for the user modelling field as it expands its scope to applications for social inclusion. The results of the studies show that the interviewees suppress their emotional behaviours, and although our system automatically recognises a subset of those behaviours, the modelling of complex mental states in real-world contexts poses a challenge for state-of-the-art user modelling technologies. This calls for a re-examination of both the implementation of the models and their usage in the target contexts.
2011
Emotions play a significant role in many human mental activities, including decision-making, motivation, and cognition. Various intelligent and expert systems can be empowered with emotionally intelligent capabilities, especially systems that interact with humans and mimic human behaviour. However, most current methods in affect recognition studies use intrusive, lab-based, and expensive tools which are unsuitable for real-world situations. Inspired by studies on keystroke dynamics, this thesis investigates the effectiveness of diagnosing users' affect through their typing behaviour in an educational context. To collect users' typing patterns, a field study was conducted in which subjects used a dialogue-based tutoring system built by the researcher. Eighteen dialogue features associated with subjective and objective ratings for users' emotions were collected. Several classification techniques were assessed in diagnosing users' affect, including discriminant analysis, Bayesian ana...
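A hedged sketch of the keystroke-dynamics idea, under the assumption that per-session timing features (key hold times and inter-key intervals) feed a standard classifier; the feature names, labels, and data are illustrative, and the thesis itself evaluates a richer feature set and several classifiers.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def keystroke_features(key_down_times, key_up_times):
    """Per-session features: mean/std of hold times and of inter-key intervals."""
    hold = np.array(key_up_times) - np.array(key_down_times)
    gaps = np.diff(key_down_times)
    return [hold.mean(), hold.std(), gaps.mean(), gaps.std()]

# Synthetic training sessions labelled 'frustrated' / 'neutral' (placeholder data).
rng = np.random.default_rng(1)
X, y = [], []
for label, gap_mu in [("frustrated", 0.35), ("neutral", 0.20)]:
    for _ in range(50):
        downs = np.cumsum(rng.normal(gap_mu, 0.05, size=30))  # key-down timestamps
        ups = downs + rng.normal(0.09, 0.02, size=30)         # key-up timestamps
        X.append(keystroke_features(downs, ups))
        y.append(label)

clf = GaussianNB().fit(X, y)

# Classify one new typing session with slow, regular keystrokes.
new_downs = np.cumsum([0.3] * 30)
print(clf.predict([keystroke_features(new_downs, new_downs + 0.1)]))
```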
Job interviews come with a number of challenges, especially for young people who are out of employment, education, or training (NEETs). This paper presents an approach to a job training simulation environment that employs two virtual characters and social cue recognition techniques to create an immersive interactive job interview. The two virtual characters are created with different social behavior profiles, understanding and demanding, which consequently influences the level of difficulty of the simulation as well as the impact on the user. Finally, we present a user study which investigates the feasibility of the proposed approach by measuring the effect the different virtual characters have on the users.
Lecture Notes in Computer Science, 2013
The TARDIS project aims to build a scenario-based serious-game simulation platform for NEETs (a government acronym for young people not in employment, education or training) and job-inclusion associations that supports social training and coaching in the context of job interviews. This paper presents the general architecture of the TARDIS job interview simulator and the serious-game paradigm that we are developing.
Multimodal user interfaces, 2008
Affective and human-centered computing have attracted a lot of attention during the past years, mainly due to the abundance of environments and applications able to exploit and adapt to multimodal input from the users. The combination of facial expressions with prosody information allows us to capture the users' emotional state in an unintrusive manner, relying on the best performing modality in cases where one modality suffers from noise or bad sensing conditions. In this paper, we describe a multi-cue, dynamic approach to detect ...
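In the spirit of the multi-cue fusion described above, the following decision-level sketch (an assumption, not the paper's exact method) weights each modality's emotion distribution by a reliability score, so a noisy modality contributes less to the fused estimate.

```python
EMOTIONS = ["neutral", "happy", "stressed"]

def fuse(face, prosody, face_reliability, prosody_reliability):
    """Weighted sum of per-modality distributions, renormalised."""
    total = face_reliability + prosody_reliability
    wf, wp = face_reliability / total, prosody_reliability / total
    fused = [wf * f + wp * p for f, p in zip(face, prosody)]
    norm = sum(fused)
    return [v / norm for v in fused]

# Facial tracking is degraded (low reliability), prosody is clean.
face_probs    = [0.4, 0.3, 0.3]   # near-uniform: poor sensing conditions
prosody_probs = [0.1, 0.1, 0.8]   # confident that the user sounds stressed
fused = fuse(face_probs, prosody_probs, face_reliability=0.2, prosody_reliability=0.8)
print(dict(zip(EMOTIONS, [round(v, 2) for v in fused])))
```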
HCI in Business, Government, and Organizations: Information Systems, 2016
The importance of well-executed recruitment and retention of employees in organisations in a highly competitive global market has grown significantly in the last decade. There is also a growing need for managers to be emotionally intelligent in order to manage productively and deal with employees from generations Y and Z. In this paper we present a framework which embodies human-computer interaction techniques such as facial emotion recognition, speech recognition, and speech synthesis in a socially assistive robot with human-like communication modalities to capture, analyse, profile, and benchmark verbal and non-verbal data during a real-time job interview for hiring salespersons. This research fundamentally changes how employers can leverage the data analysis to seek the best job applicant and how they perceive the use of human-computer interaction (HCI) techniques and information technology in human resource management practice. Existing approaches for recruitment primarily rely on selection criteria and/or psychometric techniques followed by face-to-face interviews based on the subjective judgements of human beings. For example, the high turnover of salespersons in the industry has shown the limited success of these procedures. Additionally, existing approaches lack internal benchmarking analysis that compares the profiles of the employees who are the best cultural fit. Thus, this research incorporates behavioural psychology, data mining, image processing, and HCI modelling and techniques to provide a more holistic recruitment application using an emotionally aware social robot. The implications of this research apply not only to the hiring and benchmarking of employees, but also to collecting big data (verbal and non-verbal) for decision-making, personalised profiling, and training.
Transactions of the Japanese Society for Artificial Intelligence, 2018
To fully mimic the naturalness of human interaction in Human-Computer Interaction (HCI), emotion is an essential aspect that should not be overlooked. Emotion allows for rich and meaningful human interaction. In communicating, not only do we express our emotional state, but we are also affected by our conversational counterpart. However, existing works have largely focused only on occurrences of emotion through recognition and simulation. The relationship between an utterance of a speaker and the resulting emotional response that it triggers has not yet been closely examined. Observing and incorporating the underlying process that causes a change of emotion can provide useful information for dialogue systems in making more emotionally intelligent decisions, such as taking proper action with regard to the user's emotion and being aware of the emotional implications of their responses. To bridge this gap, in this paper we tackle three main tasks: 1) recognition of emotional states, 2) analysis of social-affective events in spontaneous conversational data, to capture the relationship between actions taken in discourse and the emotional response that follows, and 3) prediction of emotional triggers and responses in a conversational context. The proposed study differs from existing works in that it focuses on the change of emotion (emotional response) and its cause (emotional triggers) on top of the occurrence of emotion itself. The analysis and experimental results are reported in detail in this paper, showing promising initial results for future work and development.
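One simple way to operationalise emotional triggers and responses, shown purely as an illustrative sketch rather than that paper's model, is to count how often a speaker's dialogue act, given the listener's current emotion, is followed by each listener emotion, and to predict the most frequent response; the labels and counts below are hypothetical.

```python
from collections import Counter, defaultdict

# Annotated turns: (speaker_act, listener_emotion_before, listener_emotion_after)
annotated = [
    ("criticism", "neutral", "frustrated"),
    ("criticism", "neutral", "sad"),
    ("praise", "neutral", "happy"),
    ("praise", "frustrated", "neutral"),
    ("criticism", "happy", "neutral"),
    ("praise", "neutral", "happy"),
]

# Count transitions conditioned on (dialogue act, current listener emotion).
transitions = defaultdict(Counter)
for act, before, after in annotated:
    transitions[(act, before)][after] += 1

def predict_response(act, current_emotion):
    """Most likely emotional response, or None if the context was never observed."""
    counts = transitions.get((act, current_emotion))
    return counts.most_common(1)[0][0] if counts else None

print(predict_response("criticism", "neutral"))  # e.g. 'frustrated'
print(predict_response("praise", "neutral"))     # 'happy'
```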
HAL (Le Centre pour la Communication Scientifique Directe), 2015
This paper presents an architecture for an adaptive virtual recruiter in the context of job interview simulation. This architecture allows the virtual agent to adapt its behaviour according to social constructs (e.g. attitude, relationship) that are updated depending on the behaviour of its interlocutor. During the whole interaction, the system analyses the behaviour of the human participant, builds and updates the mental states of the virtual agent, and adapts its expression of social attitude. This adaptation mechanism can be applied to a wide spectrum of application domains in Digital Inclusion, where the user needs to train social skills with a virtual peer.
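The adaptation loop can be sketched as follows, with scalar social constructs nudged by recognised user behaviours and mapped onto a coarse attitude expression; the update rules, behaviour labels, and thresholds are illustrative assumptions, not the architecture's actual parameters.

```python
def clamp(x, lo=-1.0, hi=1.0):
    return max(lo, min(hi, x))

class VirtualRecruiter:
    def __init__(self):
        self.attitude = 0.0       # hostile (-1) .. friendly (+1)
        self.relationship = 0.0   # distant (-1) .. close (+1)

    def observe(self, behaviour):
        """Update social constructs from a recognised user behaviour label."""
        deltas = {
            "smile":        (+0.10, +0.05),
            "gaze_contact": (+0.05, +0.05),
            "long_silence": (-0.10, -0.05),
            "interruption": (-0.15, -0.10),
        }
        d_att, d_rel = deltas.get(behaviour, (0.0, 0.0))
        self.attitude = clamp(self.attitude + d_att)
        self.relationship = clamp(self.relationship + d_rel)

    def expressed_attitude(self):
        if self.attitude > 0.2:
            return "warm"
        if self.attitude < -0.2:
            return "demanding"
        return "neutral"

recruiter = VirtualRecruiter()
for behaviour in ["long_silence", "interruption", "long_silence"]:
    recruiter.observe(behaviour)
print(recruiter.expressed_attitude())  # 'demanding' after repeatedly negative cues
```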
