AUDIO-DRIVEN HUMAN BODY MOTION ANALYSIS AND SYNTHESIS
Abstract
This paper presents a framework for audio-driven human body motion analysis and synthesis. We address the problem in the context of a dance performance, where the gestures and movements of the dancer are mainly driven by a musical piece and characterized by the repetition of a set of dance figures. The system is trained in a supervised manner using multiview video recordings of the dancer. The human body posture is extracted from the multiview video without any human intervention, using a novel marker-based algorithm built on annealing particle filtering. The audio is analyzed to extract beat and tempo information. Joint analysis of the audio and motion features yields a correlation model, which is then used to animate a dancing avatar driven by any musical piece of the same genre. Experimental results demonstrate the effectiveness of the proposed algorithm.
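The abstract mentions that beat and tempo information is extracted from the audio. As an illustration of the general idea (not the authors' actual estimator), the sketch below picks a tempo by autocorrelating an onset-strength envelope: a strongly periodic envelope produces an autocorrelation peak at the lag corresponding to the beat period. The function name, parameters, and the synthetic envelope are all hypothetical.

```python
# Hypothetical sketch: tempo estimation from an onset-strength envelope
# via autocorrelation. This is NOT the algorithm used in the paper; it
# only illustrates how a dominant beat period can be read off a signal.
import numpy as np

def estimate_tempo(onset_env, frame_rate, bpm_range=(60.0, 180.0)):
    """Return the tempo (in BPM) whose beat period best matches the
    periodicity of the onset-strength envelope."""
    env = onset_env - onset_env.mean()           # remove DC bias
    ac = np.correlate(env, env, mode="full")     # full autocorrelation
    ac = ac[ac.size // 2:]                       # keep non-negative lags
    # Convert the candidate BPM range into lag indices (frames per beat).
    min_lag = int(frame_rate * 60.0 / bpm_range[1])
    max_lag = int(frame_rate * 60.0 / bpm_range[0])
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag

# Synthetic envelope sampled at 100 frames/s with an onset impulse
# every 0.5 s, i.e. a 120 BPM pulse.
fps = 100
env = np.zeros(1000)
env[::50] = 1.0
print(round(estimate_tempo(env, fps)))  # -> 120
```

In practice an onset envelope would be computed from spectral flux or band-wise energy differences of the music signal, and the beat phase would be estimated in a second step; the autocorrelation step above only recovers the beat period.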