
AUDIO-DRIVEN HUMAN BODY MOTION ANALYSIS AND SYNTHESIS

Abstract

This paper presents a framework for audio-driven human body motion analysis and synthesis. We address the problem in the context of a dance performance, where the gestures and movements of the dancer are mainly driven by a musical piece and characterized by the repetition of a set of dance figures. The system is trained in a supervised manner using multiview video recordings of the dancer. The human body posture is extracted from the multiview video information without any human intervention using a novel marker-based algorithm based on annealing particle filtering. Audio is analyzed to extract beat and tempo information. The joint analysis of audio and motion features provides a correlation model that is then used to animate a dancing avatar when driven with any musical piece of the same genre. Results demonstrating the effectiveness of the proposed algorithm are provided.
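The abstract mentions that audio is analyzed to extract beat and tempo information. A common way to estimate tempo is to autocorrelate an onset-strength envelope and pick the lag with the strongest self-similarity within a plausible beat-period range. The sketch below illustrates this idea only; the function name, the BPM range, and the synthetic envelope are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def estimate_tempo(onset_envelope, frame_rate, bpm_range=(60, 180)):
    """Estimate tempo (BPM) from an onset-strength envelope via
    autocorrelation: the lag with the strongest self-similarity
    inside the plausible beat-period range gives the beat period.
    (Illustrative sketch, not the paper's algorithm.)"""
    env = onset_envelope - onset_envelope.mean()
    # Full autocorrelation; keep non-negative lags only.
    ac = np.correlate(env, env, mode="full")[len(env) - 1:]
    # Convert the BPM search range to a lag range (in frames).
    min_lag = int(frame_rate * 60.0 / bpm_range[1])
    max_lag = int(frame_rate * 60.0 / bpm_range[0])
    lag = min_lag + int(np.argmax(ac[min_lag:max_lag + 1]))
    return 60.0 * frame_rate / lag

# Synthetic envelope: an impulse every 0.5 s at 100 frames/s -> 120 BPM.
frame_rate = 100
env = np.zeros(1000)
env[::50] = 1.0
print(estimate_tempo(env, frame_rate))  # 120.0
```

In practice the onset envelope would come from a spectral-flux or energy-difference analysis of the music signal rather than a synthetic impulse train.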
