
Multimodal Human Computer Interaction: A Survey

2005, International Conference on Computer Vision

https://doi.org/10.1007/11573425_1

Abstract

In this paper we review the major approaches to multimodal human computer interaction from a computer vision perspective. In particular, we focus on body, gesture, gaze, and affective interaction (facial expression recognition and emotion in audio). We discuss user and task modeling, and multimodal fusion, highlighting challenges, open issues, and emerging applications for Multimodal Human Computer Interaction (MMHCI) research.
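
The abstract lists multimodal fusion among the topics covered. As a purely illustrative sketch, not taken from the paper, the snippet below shows one common decision-level (late) fusion strategy: weighting and renormalizing the class posteriors produced independently by two modality-specific classifiers, here a hypothetical facial-expression recognizer and a hypothetical audio emotion recognizer, and then picking the top label. The label set, modality weights, and function names are assumptions made for the example only.

    # Illustrative decision-level (late) fusion of two modalities.
    # NOT the method of the surveyed paper; labels and weights are hypothetical.
    import numpy as np

    EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set

    def late_fusion(face_probs, audio_probs, face_weight=0.6):
        """Weighted average of per-modality class posteriors, renormalized."""
        face_probs = np.asarray(face_probs, dtype=float)
        audio_probs = np.asarray(audio_probs, dtype=float)
        fused = face_weight * face_probs + (1.0 - face_weight) * audio_probs
        return fused / fused.sum()

    # Example: the face classifier leans "happy", the audio classifier is unsure.
    face = [0.10, 0.70, 0.10, 0.10]
    audio = [0.30, 0.35, 0.20, 0.15]
    fused = late_fusion(face, audio)
    print(EMOTIONS[int(np.argmax(fused))], fused)

Feature-level fusion, by contrast, would concatenate or jointly model the raw modality features before classification; the weighting shown here is only one simple way of combining already-classified streams.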
