Recent developments in automated lip-reading

https://doi.org/10.1117/12.2029464

Abstract

Human lip-readers are increasingly being presented as useful in the gathering of forensic evidence but, like all humans, they are unreliable. Here we report the results of a long-term study in automatic lip-reading with the objective of converting video to text (V2T). The V2T problem is surprising: some aspects that look tricky, such as real-time tracking of the lips on poor-quality interlaced video from hand-held cameras, prove to be relatively tractable, whereas speaker-independent lip-reading is very demanding because of unpredictable variations between people. Here we review the problem of automatic lip-reading for crime fighting and identify its critical parts.

Figure 1. Left: relative angle of each camera to the speaker's face. From left to right are cropped frames recorded by the cameras placed at 0°, 30°, and 45° relative to the speaker's face. On the right are zooms of the mouth region showing motion blur and interlace "zippering".

* The sentence grammar is also illustrated in Figure 5.