FILTWAM and Voice Emotion Recognition
2014, Lecture Notes in Computer Science
https://doi.org/10.1007/978-3-319-12157-4_10
Abstract
This paper introduces the voice emotion recognition part of our framework for improving learning through webcams and microphones (FILTWAM). This framework enables multimodal emotion recognition of learners during game-based learning. The main goal of this study is to validate the use of microphone data for a real-time and adequate interpretation of vocal expressions into emotional states, where the software is calibrated with end users. FILTWAM already incorporates a valid face emotion recognition module and is extended with a voice emotion recognition module. This extension aims to provide relevant and timely feedback based upon learners' vocal intonations. The feedback is expected to enhance learners' awareness of their own behavior. Six test persons received the same computer-based tasks, in which they were requested to mimic specific vocal expressions. Each test person mimicked 82 emotions, which led to a dataset of 492 emotions. All sessions were recorded on video. The overall accuracy of our software, based on a comparison of the requested emotions with the recognized emotions, is 74.6% for the emotions happy and neutral; the lower accuracy values obtained for an extended set of emotions are to be improved. In contrast with existing software, our solution allows continuous and unobtrusive monitoring of learners' intonations and converts these intonations into emotional states. This paves the way for enhancing the quality and efficacy of game-based learning by including the learner's emotional states and linking these to pedagogical scaffolding.
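The abstract reports accuracy as agreement between requested and recognized emotions. The following is a minimal sketch of how such a figure could be computed, not the authors' evaluation code. The function names and the six toy labels are hypothetical and invented for illustration only; the one value taken from the abstract is the dataset size (6 test persons, 82 mimicked emotions each).

```python
from collections import Counter


def overall_accuracy(requested, recognized):
    """Fraction of samples whose recognized label equals the requested label."""
    if len(requested) != len(recognized):
        raise ValueError("label lists must have the same length")
    if not requested:
        raise ValueError("at least one sample is required")
    hits = sum(1 for req, rec in zip(requested, recognized) if req == rec)
    return hits / len(requested)


def per_emotion_accuracy(requested, recognized):
    """Accuracy computed separately for each requested emotion."""
    totals = Counter(requested)
    hits = Counter(req for req, rec in zip(requested, recognized) if req == rec)
    return {emotion: hits[emotion] / totals[emotion] for emotion in totals}


# Dataset size as stated in the abstract: 6 test persons x 82 emotions each.
dataset_size = 6 * 82

# Toy labels, hypothetical and NOT taken from the study's data.
requested = ["happy", "happy", "neutral", "neutral", "sad", "sad"]
recognized = ["happy", "neutral", "neutral", "neutral", "sad", "happy"]

print(dataset_size)
print(overall_accuracy(requested, recognized))
print(per_emotion_accuracy(requested, recognized))
```

Reporting the per-emotion breakdown alongside the overall figure shows which emotions pull the aggregate down, which matches the abstract's distinction between the happy and neutral emotions and the extended set.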