Recent developments in automated lip-reading
https://doi.org/10.1117/12.2029464
Abstract
Human lip-readers are increasingly being presented as useful in the gathering of forensic evidence but, like all humans, suffer from unreliability. Here we report the results of a long-term study in automatic lip-reading with the objective of converting video to text (V2T). The V2T problem is surprising in that some aspects that look tricky, such as real-time tracking of the lips on poor-quality interlaced video from hand-held cameras, prove to be relatively tractable, whereas the problem of speaker-independent lip-reading is very demanding due to unpredictable variations between people. Here we review the problem of automatic lip-reading for crime fighting and identify the critical parts of the problem.
Figure 1. Left: relative angle of each camera to the speaker's face. From left to right are cropped-out frames recorded by cameras placed at each of the 0°, 30°, and 45° angles relative to the speaker's face. On the right are zooms of the mouth region showing motion blur and interlace "zippering".
* The sentence grammar is also illustrated in Figure 5.
Richard Bowden
Richard Harvey