Realistic speech animation based on observed 3D face dynamics

2005

Abstract

We propose an efficient system for realistic speech animation. The system supports all steps of the animation pipeline, from the capture or design of 3D head models to the synthesis and editing of the performance. The pipeline is fully 3D, which yields high flexibility in the use of the animated character. Real, detailed 3D face dynamics, observed at video frame rate for thousands of points on the faces of speaking actors, underpin the realism of the facial deformations. These dynamics are given a compact and intuitive representation via Independent Component Analysis (ICA). Performances amount to trajectories through this 'Viseme Space'. When asked to animate a face, the system replicates the 'Visemes' that it has learned and adds the necessary coarticulation effects. Realism has been improved through comparisons with motion-captured ground truth. Faces for which no 3D dynamics could be observed can nonetheless be animated: their visemes are adapted automatically to their physiognomy by localising the face in a 'Face Space'.
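To make the idea of a compact ICA representation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say which ICA algorithm or data layout is used, so this assumes a symmetric FastICA with a tanh nonlinearity, synthetic data in place of captured faces, and hypothetical names (`fit_ica`, `basis`, `traj`). Each frame is a flattened vector of 3D point displacements, and the per-frame ICA coefficients form the trajectory through the low-dimensional space.

```python
import numpy as np

def fit_ica(X, n_components, n_iter=200, tol=1e-8, seed=0):
    """Symmetric FastICA (tanh nonlinearity); rows of X are frames.

    Returns (mean, basis, traj) with X ~= mean + traj @ basis.T.
    Illustrative sketch only, not the method used in the paper.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    mean = X.mean(axis=0)
    Xc = X - mean

    # PCA whitening: project onto the leading directions, unit variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    K = Vt[:n_components] / s[:n_components, None] * np.sqrt(n)
    Z = Xc @ K.T                      # (frames, k), covariance = identity

    def decorrelate(W):
        # Nearest orthogonal matrix to W (symmetric decorrelation).
        U, _, Vh = np.linalg.svd(W)
        return U @ Vh

    W = decorrelate(rng.standard_normal((n_components, n_components)))
    for _ in range(n_iter):
        Y = Z @ W.T
        G = np.tanh(Y)
        # Fixed-point update: E[z g(w.z)] - E[g'(w.z)] w, for every row w.
        W_new = (G.T @ Z) / n - (1.0 - G**2).mean(axis=0)[:, None] * W
        W_new = decorrelate(W_new)
        done = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol
        W = W_new
        if done:
            break

    traj = Z @ W.T                    # per-frame coordinates (the trajectory)
    basis = np.linalg.pinv(W @ K)     # (features, k): one deformation per axis
    return mean, basis, traj

# Synthetic stand-in for captured data (assumption): 500 frames,
# 200 tracked points, driven by 3 independent non-Gaussian sources.
rng = np.random.default_rng(1)
n_frames, n_points, k = 500, 200, 3
sources = rng.laplace(size=(n_frames, k))
mixing = rng.standard_normal((k, 3 * n_points))
rest_shape = rng.standard_normal(3 * n_points)
X = rest_shape + sources @ mixing

mean, basis, traj = fit_ica(X, n_components=k)
X_rec = mean + traj @ basis.T         # rebuild every frame from k numbers
print(traj.shape, basis.shape, np.abs(X - X_rec).max())
```

In this sketch a whole performance reduces to the `traj` array, so editing or synthesising an animation amounts to manipulating a curve with `k` coordinates per frame rather than hundreds of point positions.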
