
Communicating with a virtual human or a skin-based robot head

2008, ACM SIGGRAPH Asia 2008 Courses (SIGGRAPH Asia '08)

https://doi.org/10.1145/1508044.1508099

Abstract

Multimodal dialogue management combines input and output from various interaction modalities and technologies. In this paper, we present research carried out in the framework of the European project INDIGO. In this project, we aim to communicate in a natural way with a virtual human and, possibly, with a very realistic skin-based robot head. We have defined memory models, emotion recognition, and dialogue interaction driven by the recognized emotions of the user. A dialogue can be held either with the virtual human or with the robot head, which can recognize the user, ask specific questions about his or her habits, understand the answers, and behave accordingly.

1. Introduction

Multimodal human-computer interaction involves several fields of research, such as graphics, computer vision, psychology, speech recognition, and natural language processing. The idea is that, by combining input and output from various interaction modalities and technologies, a human could interact with the computer in the same way she or he interacts with another human. Multimodality refers to the modes of communication defined by the human senses or by types of computer input [23]. The human senses can be categorized into sight, touch, hearing, smell, and taste. In multimodal human-computer interaction, however, the focus is usually placed on sight, hearing, and sometimes touch, because these are the most feasible to implement using computer vision, graphics, speech recognition, speech synthesis, and haptic devices. Through sight and hearing, humans interact using verbal communication (speech) and non-verbal communication (body language involving facial expressions, gaze, postures, and hand motion) to express emotion, mood, attitude, and attention. Over 90% of non-verbal communication occurs during speech [35] and [23]. Behavioural signals expressed by humans can serve as emblems, illustrators, regulators, affect displays, or adaptors in a conversation [20]. Multimodal human-computer interaction attempts to combine different speech and computer vision technologies in order to interpret the user's input and to generate expressive output. In the following section, these technologies and their roles in interaction are discussed.
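To make the abstract's idea of an emotion-aware dialogue backed by a user memory concrete, here is a minimal sketch in Python. It is only an illustration, not the INDIGO implementation: the class names (UserMemory, DialogueManager), the respond method, and the emotion labels are assumptions made for this example.

```python
# Illustrative sketch only; names and emotion labels are assumptions,
# not the INDIGO system described in the paper.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class UserMemory:
    """Minimal long-term memory: facts the agent has learned about the user."""
    name: str = "unknown"
    habits: Dict[str, str] = field(default_factory=dict)  # e.g. {"coffee": "every morning"}

    def remember(self, topic: str, value: str) -> None:
        self.habits[topic] = value


class DialogueManager:
    """Chooses a reply from the recognized emotion and what is already known."""

    def __init__(self, memory: UserMemory) -> None:
        self.memory = memory

    def respond(self, utterance: str, emotion: str) -> str:
        # Emotion-dependent framing: the reply style adapts to the
        # emotion recognized from the user (as the abstract describes).
        if emotion == "sad":
            prefix = "You sound a bit down."
        elif emotion == "happy":
            prefix = "Glad to hear you in a good mood!"
        else:
            prefix = "I see."

        # Memory-dependent content: ask about habits not yet known,
        # otherwise refer back to what was learned earlier.
        if "coffee" not in self.memory.habits:
            question = " Do you usually drink coffee in the morning?"
        else:
            question = f" Last time you said you drink coffee {self.memory.habits['coffee']}."
        return prefix + question


if __name__ == "__main__":
    memory = UserMemory(name="Alice")
    dm = DialogueManager(memory)
    print(dm.respond("Good morning", emotion="happy"))
    memory.remember("coffee", "every morning")
    print(dm.respond("Good morning", emotion="neutral"))
```

In a full system of the kind the paper describes, the emotion label would come from facial-expression or speech analysis and the reply would be rendered by the virtual human or robot head rather than printed.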

References (53)

  1. D. M. Romano, G. Sheppard, J. Hall, A. Miller, and Z. Ma. BASIC: a believable, adaptable socially intelligent character for social presence. In PRESENCE 2005, The 8th Annual International Workshop on Presence, 2005.
  2. A. Egges, T. Molet, and N. Magnenat-Thalmann. Personalised real-time idle motion synthesis. In Pacific Graphics 2004, Seoul, Korea, pages 121-130, 2004.
  3. A. Kleinsmith, T. Fushimi, and N. Bianchi-Berthouze. An incremental and interactive affective posture recognition system. In International Workshop on Adapting the Interaction Style to Affective Factors, 2005.
  4. A. Ortony, G. L. Clore, and A. Collins. The Cognitive Structure of Emotions. Cambridge University Press, 1988.
  5. G. Abrantes and F. Pereira. MPEG-4 facial animation technology: Survey, implementation, and results. IEEE Transactions on Circuits and Systems for Video Technology, 9(2):290-305, 1999.
  6. M. Argyle. Bodily communication. Methuen and Co Ltd, 1998.
  7. S. P. Lee, J. B. Badler, and N. I. Badler. Eyes alive. ACM Transactions on Graphics, 21(3):637-644, 2002.
  8. T. Bickmore and J. Cassell. Social dialogue with embodied conversational agents. In J. van Kuppevelt, L. Dybkjaer, and N. Bernsen (eds.), Natural, Intelligent and Effective Interaction with Multimodal Dialogue Systems. New York: Kluwer Academic, 2005.
  9. T. W. Bickmore and J. Cassell. Relational agents: A model and implementation of building user trust. In Proceedings of SIGCHI 2001, pages 396-403, 2001.
  10. J. Cassell, Y. Nakano, T. Bickmore, C. Sidner, and C. Rich. Non-verbal cues for discourse structure. Proceedings of the 39th Annual Meeting on Association for Computational Linguistics, pages 114-123, 2001.
  11. J. Cassell, O. Torres, and S. Prevost. Turn Taking vs. Discourse Structure: How Best to Model Multimodal Conversation. Machine Conversations, pages 143-154, 1999.
  12. J. Cassell and H. Vilhjálmsson. Fully Embodied Conversational Avatars: Making Communicative Behaviors Autonomous. Autonomous Agents and Multi-Agent Systems, 2(1):45-64, 1999.
  13. J. Cassell, H. Vilhjálmsson, and T. Bickmore. BEAT: the Behavior Expression Animation Toolkit. Proceedings of the 28th annual conference on Computer graphics and interactive techniques, pages 477-486, 2001.
  14. N. Chovil. Social determinants of facial displays. Journal of Nonverbal Behavior, 15(3):141-154, 1991.
  15. H. Clark and E. Schaefer. Contributing to discourse. Cognitive Science, 13(2):259-294, 1989.
  16. C. C. Chibelushi and F. Bourel. Facial expression recognition: A brief tutorial overview. School of Computing, Staffordshire University, 2002.
  17. S. Duncan Jr. On the Structure of Speaker-Auditor Interaction During Speaking Turns. Language in Society, 3(2):161-180, 1974.
  18. A. Egges, X. Zhang, S. Kshirsagar, and N. Magnenat-Thalmann. Emotional communication with virtual humans. The 9th International Conference on Multimedia Modelling, 454, 2003.
  19. P. Ekman. Emotion in the Human Face. Cambridge University Press, New York, 1982.
  20. P. Ekman and W. Friesen. The repertoire of nonverbal behavior. Semiotica, 1:49-98, 1969.
  21. P. Ekman and W. V. Friesen. Facial Action Coding System: A Technique for the Measurement of Facial Movement. Consulting Psychologists Press, 1978.
  22. P. Gebhard. ALMA - a layered model of affect. In Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multi Agent Systems, pages 29-36, 2005.
  23. A. Jaimes and N. Sebe. Multimodal human-computer interaction: A survey. Computer Vision and Image Understanding, 108(1-2):116-134, 2007.
  24. L. Joseph. A survey of hand posture and gesture recognition techniques and technology. Technical Report CS-99-11, Brown University, Department of Computer Science, 1999.
  25. S. Kshirsagar. Facial Communication. MIRALab, University of Geneva, 2003.
  26. S. Kopp, P. Tepper, and J. Cassell. Content in context: Generating language and iconic gesture without a gestuary. In Proceedings of the Workshop on Balanced Perception and Action in ECAs at AAMAS '04, 2004.
  27. S. Kopp and I. Wachsmuth. A knowledge-based approach for lifelike gesture animation. In W. Horn, editor, ECAI 2000, 2000.
  28. A. Egges, S. Kshirsagar, and N. Magnenat-Thalmann. Generic personality and emotion simulation for conversational agents. Computer Animation and Virtual Worlds, 15(1):1-13, 2004.
  29. R. Laban and F. C. Lawrence. Effort: Economy of Body Movement. Plays, Inc., Boston, 1974.
  30. M. Stone, D. DeCarlo, I. Oh, C. Rodriguez, A. Stere, A. Lees, and C. Bregler. Speaking with hands: Creating animated conversational characters from recordings of human performance. ACM Transactions on Graphics, 23(3):506-513, 2004.
  31. A. Fukayama, T. Ohno, N. Mukawa, M. Sawaki, and N. Hagita. Messages embedded in gaze of interface agents - impression management with agent's gaze. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Changing our world, changing ourselves, 2002.
  32. M. Pantic and L. J. M. Rothkrantz. Automatic analysis of facial expressions: the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(12):1424-1445, 2000.
  33. R. R. McCrae and O. P. John. An introduction to the five-factor model and its applications. Journal of Personality, 60:175-215, 1992.
  34. D. McNeill. Hand and mind: What gestures reveal about thought. University of Chicago Press, 1992.
  35. D. McNeill. Hand and Mind: What Gestures Reveal about Thought. University Of Chicago Press, 1996.
  36. A. Mehrabian. Basic dimensions for a general psychological theory. Cambridge: OGH Publishers, 1980.
  37. A. Mehrabian. Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament. Current Psychology, 14:261-292, 1996.
  38. S. Mitra and T. Acharya. Gesture recognition: A survey. In IEEE Transactions on System, Man and Cybernetics, Part C: Applications and Reviews, Vol. 37, No. 3, 2007.
  39. K. Nagao and A. Takeuchi. Speech dialogue with facial displays: multimodal human-computer conversation. Proceedings of the 32nd conference on Association for Computational Linguistics, pages 102-109, 1994.
  40. Y. Nakano, G. Reinstein, T. Stocky, and J. Cassell. Towards a model of face-to-face grounding. Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 553-561, 2003.
  41. P. Ravindra De Silva and N. Bianchi-Berthouze. Modeling human affective postures: an information theoretic characterization of posture features. Journal of Visualization and Computer Animation, 15(3-4):269-276, 2004.
  42. F. Parke. Computer generated animation of faces. In Proceedings of the ACM Annual Conference, 1972.
  43. R. W. Picard. Affective computing. MIT Press, 1997.
  44. R. Plutchik. A general psychoevolutionary theory of emotion. R. Plutchik and H. Kellerman (Eds.), Emotion: Theory, research, and experience, 1:2-33, 1980.
  45. C. Becker, S. Kopp, and I. Wachsmuth. Simulating the emotion dynamics of a multimodal conversational agent. In Proceedings of Affective Dialogue Systems: Tutorial and Research Workshop (ADS 2004), LNCS 3068, pages 154-165, 2004.
  46. M. Slater and A. Steed. A virtual presence counter. Presence, 9(5):413-434, 2000.
  47. W. Su, B. Pham, and A. Wardhani. High-level control posture of story characters based on personality and emotion. In Y. Pisan, editor, Proceedings of IE2005, The Second Australasian Interactive Entertainment Conference, pages 179-186, 2005.
  48. J. Veldhuis. Expressing personality through head nod and eye gaze. In 5th Twente Student Conference on IT, 2006.
  49. J. Cassell, H. Vilhjálmsson, and T. Bickmore. BEAT: the Behavior Expression Animation Toolkit. In Proceedings of SIGGRAPH, 2001.
  50. V. Vinayagamoorthy, M. Gillies, A. Steed, E. Tanguy, X. Pan, C. Loscos, and M. Slater. Building Expression into Virtual Characters. Eurographics Conference State of the Art Report, Vienna, 2006.
  51. C. M. Whissell. The dictionary of affect in language. In R. Plutchik and H. Kellerman (Eds.), Emotion: Theory, Research, and Experience, 4, The Measurement of Emotions, 1989.
  52. I. Wilson. The artificial emotion engine, driving emotional behaviour. Artificial Intelligence and Interactive Entertainment, 2000.
  53. D. Chi, M. Costa, L. Zhao, and N. Badler. The EMOTE model for effort and shape. In SIGGRAPH 2000, Computer Graphics Proceedings, 2000.