Multimodal human emotion/expression recognition

L.S. Chen; T.S. Huang; T. Miyasato; R. Nakatsu

doi:10.1109/AFGR.1998.670976

Ryohei Nakatsu

1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition

https://doi.org/10.1109/AFGR.1998.670976

6 pages

Abstract

Recognizing human facial expression and emotion by computer is an interesting and challenging problem. Many researchers have investigated emotional content in speech alone, or recognition of human facial expressions solely from images. However, relatively little has been done to combine these two modalities for recognizing human emotions. De Silva et al. [4] studied human subjects' ability to recognize emotions from viewing video clips of facial expressions and listening to the corresponding emotional speech stimuli. They found that humans recognize some emotions better from audio information and others better from video. They also proposed an algorithm that integrates both kinds of input to mimic the human recognition process. While attempting to implement the algorithm, we encountered difficulties that led us to a different approach. We found these two modalities to be complementary. By using both, we show it is possible to achieve higher recognition rates than with either modality alone.

Figures (6)

gives some relative probability for each emotion category (A(Hap) to A(Fea), V(Hap) to V(Fea)). At the weighting matrix, these 12 numbers are combined to produce six final decision numbers, one for each emotion category. The emotion class that has the largest value is the recognized emotion.

Figure 1. Proposed bimodal emotion recognition system by De Silva et al.
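The combination step described for Figure 1 can be sketched as a 6-by-12 weighting matrix applied to the 12 per-modality scores, followed by picking the largest of the six results. This is only an illustrative sketch, not the authors' or De Silva et al.'s implementation: the category order, the abbreviations other than Hap and Fea, the function names `fuse` and `equal_weights`, and the equal-weight matrix are all assumptions, since the excerpt does not give the actual weights.

```python
# Category labels; only "Hap" and "Fea" appear in the source text, the other
# abbreviations and the ordering are assumptions made for this sketch.
EMOTIONS = ["Hap", "Sad", "Ang", "Dis", "Sur", "Fea"]


def equal_weights():
    """Hypothetical 6x12 weighting matrix: each decision number is the mean
    of the audio score and the video score for the same category."""
    rows = []
    for k in range(len(EMOTIONS)):
        row = [0.0] * (2 * len(EMOTIONS))
        row[k] = 0.5                      # weight on A(category k)
        row[len(EMOTIONS) + k] = 0.5      # weight on V(category k)
        rows.append(row)
    return rows


def fuse(audio_scores, video_scores, weights):
    """Combine 6 audio and 6 video scores into 6 decision numbers and
    return (label of the largest decision number, all decision numbers)."""
    n = len(EMOTIONS)
    if len(audio_scores) != n or len(video_scores) != n:
        raise ValueError("expected one score per emotion category")
    inputs = list(audio_scores) + list(video_scores)   # the 12 numbers
    decisions = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    best = max(range(n), key=decisions.__getitem__)    # index of largest value
    return EMOTIONS[best], decisions
```

With made-up scores such as `fuse([0.1, 0.5, 0.1, 0.1, 0.1, 0.1], [0.1, 0.2, 0.4, 0.1, 0.1, 0.1], equal_weights())`, the audio evidence for the second category outweighs the video evidence for the third, so the second category is returned. A non-uniform matrix would let one modality dominate for the emotions it recognizes better.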

Figure 2. Audio Processing: (a) Original speech utterance, (b) end-point detected utterance (note different time scale), (c) computed pitch contour.

In the Spanish maximum-pitch and average-pitch plots (Figures 3 and 4), samples from the same class have similar pitch. It is possible to separate Sadness and Dislike from the other four classes by simple thresholding. While this is still far from being able to distinguish all six classes, it suggests that a coarse-to-fine approach may be feasible, i.e., first separate the classes into groups according to this feature, then find other features to separate them further.
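The coarse stage of such a coarse-to-fine scheme could look like the sketch below, which assigns an utterance to one of two groups from a single pitch feature. It is a sketch under stated assumptions rather than the paper's method: the function name `coarse_group`, the threshold value, and the direction of the split (Sadness and Dislike on the low-pitch side) are assumptions, because the excerpt only states that the two groups are separable by thresholding.

```python
# Groups taken from the text: Sadness and Dislike versus the other four classes.
LOW_GROUP = ("Sadness", "Dislike")
HIGH_GROUP = ("Happiness", "Anger", "Surprise", "Fear")


def coarse_group(pitch_hz, threshold_hz, below=LOW_GROUP, above=HIGH_GROUP):
    """Coarse stage: return the candidate classes for one utterance.

    pitch_hz     -- a pitch feature of the utterance (e.g. average pitch)
    threshold_hz -- hypothetical threshold; it would be chosen from plots
                    such as Figures 3 and 4, and is not given in the source
    below/above  -- which group lies on each side of the threshold
                    (the default direction is an assumption)
    """
    return below if pitch_hz < threshold_hz else above
```

For example, `coarse_group(150.0, 200.0)` returns the Sadness/Dislike group under these assumed values. A fine stage would then use other features to pick a single class within the returned group.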

in the maximum pitch plot (Figure 3). The subjects also confused Anger and Surprise, and these two classes also overlap in the Spanish average pitch plot. For Sinhala audio, the subjects confused Surprise and Anger, and Figure 5 shows a large overlap between these two classes.

Figure 5. Average Pitch for Sinhala.

Figure 6. Spanish speaker: (a) Sadness, (b) Dislike, (c) Anger, (d) Surprise.

References (17)

[1] C.C. Chiu, Y.L. Chang, and Y.J. Lai, "The Analysis and Recognition of Human Vocal Emotions," in Proc. International Computer Symposium, NCTU, Hsinchu, Taiwan, R.O.C., December 12-15, 1994.
[2] R. Cowie and E. Douglas-Cowie, "Automatic Statistical Analysis Of The Signal And Prosodic Signs Of Emotion In Speech," in Proc. International Conf. on Spoken Language Processing, Philadelphia, PA, USA, pp. 1989-1992, October 3-6, 1996.
[3] F. Dellaert, T. Polzin, and A. Waibel, "Recognizing Emotion in Speech," in Proc. International Conf. on Spoken Language Processing, Philadelphia, PA, USA, pp. 1970-1973, October 3-6, 1996.
[4] L. C. De Silva, T. Miyasato, and R. Nakatsu, "Facial Emotion Recognition Using Multimodal Information," in Proc. IEEE Int. Conf. on Information, Communications and Signal Processing ICICS'97, Singapore, pp. 397-401, Sept. 1997.
[5] P. Ekman, ed., Emotion In the Human Face, Cambridge: Cambridge University Press, 1982.
[6] P. Ekman, "Strong Evidence for Universals in Facial Expressions: A Reply to Russell's Mistaken Critique," Psychological Bulletin, vol. 115, no. 2, pp. 268-287, 1994.
[7] I. A. Essa and A. P. Pentland, "Coding, Analysis, Interpretation, and Recognition of Facial Expressions," IEEE Trans. PAMI, vol. 19, no. 7, pp. 757-763, July 1997.
[8] H. Fujisaki, "Prosody, Models, and Spontaneous Speech," in Computing Prosody, Y. Sagisaka, N. Campbell, and N. Higuchi, Eds., Springer-Verlag, New York, 1997.
[9] T. Johnstone, "Emotional Speech Elicited Using Computer Games," in Proc. International Conf. on Spoken Language Processing, Philadelphia, PA, USA, pp. 1985-1988, October 3-6, 1996.
[10] K. Mase, "Recognition of Facial Expression from Optical Flow," IEICE Trans., vol. E74, no. 10, pp. 3474-3483, October 1991.
[11] I. R. Murray and J. L. Arnott, "Toward the simulation of Emotion in Synthetic Speech: A Review of The Literature of Human Vocal Emotion," Journal of the Acoustical Society of America, vol. 93, no. 2, pp. 1097-1108, February 1993.
[12] T. Otsuka and J. Ohya, "Recognizing Multiple Persons' Facial Expressions Using HMM Based on Automatic Extraction of Significant Frames from Image Sequences," in Proc. Int. Conf. on Image Processing ICIP-97, Santa Barbara, CA, USA, pp. 546-549, Oct 26-29, 1997.
[13] M. Rosenblum, Y. Yacoob, and L.S. Davis, "Human Expression Recognition from Motion Using a Radial Basis Function Network Architecture," IEEE Trans. Neural Networks, vol. 7, no. 5, pp. 1121-1138, September 1996.
[14] J. Sato and S. Morishima, "Emotion Modeling in Speech Production Using Emotion Space," in Proc. IEEE Int. Workshop on Robot and Human Communication, Tsukuba, Japan, pp. 472-477, Nov 1996.
[15] K. R. Scherer, "Adding The Affective Dimension: A New Look In Speech Analysis And Synthesis," in Proc. International Conf. on Spoken Language Processing, Philadelphia, PA, USA, page no. not available, October 3-6, 1996.
[16] T. Sakaguchi, "Facial Feature Extraction Based on the Wavelet Transform for Dynamic Expression Recognition," submitted to IEEE Trans. PAMI.
[17] N. Ueki, S. Morishima, H. Yamada, and H. Harashima, "Expression Analysis Synthesis System Based on Emotion Space Constructed by Multilayered Neural Network," Systems and Computers in Japan, vol. 25, no. 13, pp. 95-103, Nov. 1994.
