
Gender classification via lips: Static and dynamic features

2013, IET Biometrics

https://doi.org/10.1049/IET-BMT.2012.0021

Abstract

Automatic gender classification has many security and commercial applications. Various modalities have been investigated for gender classification, with face-based classification being the most popular. In some real-world scenarios, however, the face may be partially occluded, and classification must then be based on individual parts of the face, known as local features. We investigate gender classification using lip movements. We show for the first time that important gender-specific information can be obtained from the way a person moves their lips during speech. Furthermore, our study indicates that lip dynamics during speech provide greater gender-discriminative information than lip appearance alone. We also show that lip dynamics and appearance contain complementary gender information, such that a model capturing both traits gives the highest overall classification result. We use Discrete Cosine Transform (DCT)-based features and Gaussian Mixture Models (GMMs) to model lip appearance and dynamics, and employ the XM2VTS database for our experiments. Our experiments show that a model capturing lip dynamics along with appearance improves gender classification rates by between 16% and 21% compared with models of lip appearance alone.
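As a rough illustration of the pipeline the abstract outlines, the sketch below trains one Gaussian mixture model per gender on DCT-based lip features. It is a minimal approximation, not the authors' implementation: it assumes pre-cropped 32x32 grayscale lip-region frames, uses first-order deltas of the static DCT coefficients as the dynamic features, and all names and parameter choices (6x6 DCT block, 30 coefficients, 8 mixture components, synthetic stand-in data) are illustrative.

```python
import numpy as np
from scipy.fftpack import dct
from sklearn.mixture import GaussianMixture

def static_features(frame, n_coeffs=30):
    # 2-D DCT of a grayscale lip-region frame; keep the low-frequency
    # top-left block as a compact appearance descriptor.
    coeffs = dct(dct(frame, axis=0, norm='ortho'), axis=1, norm='ortho')
    return coeffs[:6, :6].flatten()[:n_coeffs]

def sequence_features(frames):
    # Per-frame static (appearance) features plus first-order deltas
    # (a simple stand-in for lip dynamics); the first delta row is zero.
    static = np.array([static_features(f) for f in frames])
    deltas = np.diff(static, axis=0, prepend=static[:1])
    return np.hstack([static, deltas])

# One GMM per gender, trained on pooled per-frame feature vectors.
# Random frames stand in for real lip-region sequences here.
rng = np.random.default_rng(0)
male_seqs = [rng.random((50, 32, 32)) for _ in range(5)]
female_seqs = [rng.random((50, 32, 32)) for _ in range(5)]

male_gmm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0)
female_gmm = GaussianMixture(n_components=8, covariance_type='diag', random_state=0)
male_gmm.fit(np.vstack([sequence_features(s) for s in male_seqs]))
female_gmm.fit(np.vstack([sequence_features(s) for s in female_seqs]))

def classify(frames):
    # Label an utterance by the higher average per-frame log-likelihood.
    x = sequence_features(frames)
    return 'male' if male_gmm.score(x) > female_gmm.score(x) else 'female'

print(classify(rng.random((50, 32, 32))))
```

Dropping the delta columns from `sequence_features` gives the appearance-only baseline, which is the comparison behind the 16-21% improvement reported in the abstract.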

References (28)

  1. E. Mäkinen and R. Raisamo, "Evaluation of gender classification methods with automatically detected and aligned faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, pp. 541-547, March 2008.
  2. D. Stewart, H. Wang, J. Shen, and P. Miller, "Investigations into the robustness of audio-visual gender classification to background noise and illumination effects," in Proceedings of the 2009 Digital Image Computing: Techniques and Applications, DICTA '09, (Washington, DC, USA), pp. 168-174, IEEE Computer Society, 2009.
  3. B. Moghaddam and M.-H. Yang, "Learning gender with support faces," IEEE Trans. Pattern Anal. Mach. Intell., vol. 24, no. 5, pp. 707-711, 2002.
  4. S. Buchala, N. Davey, T. M. Gale, and R. J. Frank, "Principal component analysis of gender, ethnicity, age, and identity of face images," in Proc. IEEE ICMI, 2005.
  5. C. Shan, S. Gong, and P. W. McOwan, "Learning gender from human gaits and faces," in AVSS '07: Proceedings of the 2007 IEEE Conference on Advanced Video and Signal Based Surveillance, (Washington, DC, USA), pp. 505-510, IEEE Computer Society, 2007.
  6. M. Collins, J. G. Zhang, P. Miller, and H. B. Wang, "Full body image feature representations for gender profiling," in VS09, pp. 1235-1242, 2009.
  7. G. Amayeh, G. Bebis, and M. Nicolescu, "Gender classification from hand shape," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW '08), p. 1, June 23-28, 2008.
  8. J. Ma, W. Liu, and P. Miller, "An evidential improvement for gender profiling," in Belief Functions (T. Denoeux and M.-H. Masson, eds.), vol. 164 of Advances in Soft Computing, pp. 29-36, Springer, 2012.
  9. Y. Andreu and R. A. Mollineda, "The role of face parts in gender recognition," in ICIAR '08: Proceedings of the 5th international conference on Image Analysis and Recognition, (Berlin, Heidelberg), pp. 945-954, Springer-Verlag, 2008.
  10. P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 1090-1104, 2000.
  11. F. Matta, U. Saeed, C. Mallauran, and J.-L. Dugelay, "Facial gender recognition using multiple sources of visual information," in International Workshop on Multimedia Signal Processing (MMSP), pp. 785-790, 2008.
  12. A. Hadid and M. Pietikäinen, "Combining motion and appearance for gender classification from video sequences," in 19th International Conference on Pattern Recognition (ICPR 2008), p. 1, December 8-11, 2008.
  13. K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: The extended M2VTS database," in Proc. Second International Conference on Audio and Video-based Biometric Person Authentication, pp. 72-77, 1999.
  14. E. D. Petajan, Automatic lipreading to enhance speech recognition (speech reading). PhD thesis, University of Illinois at Urbana-Champaign, Champaign, IL, USA, 1984.
  15. T. Wark, S. Sridharan, and V. Chandran, "The use of speech and lip modalities for robust speaker verification under adverse conditions," in ICMCS '99: Proceedings of the IEEE International Conference on Multimedia Computing and Systems, (Washington, DC, USA), pp. 812-816, IEEE Computer Society, 1999.
  16. A. G. de la Cuesta, J. Zhang, and P. Miller, "Biometric identification using motion history images of a speaker's lip movements," in IMVIP '08: Proceedings of the 2008 International Machine Vision and Image Processing Conference, (Washington, DC, USA), pp. 83-88, IEEE Computer Society, 2008.
  17. R. Kaucic, B. Dalton, and A. Blake, "Real-time lip tracking for audio-visual speech recognition applications," in Proc. European Conf. on Computer Vision, (Cambridge, UK), pp. 376-387, 1996.
  18. M. Gordan, C. Kotropoulos, and I. Pitas, "Pseudoautomatic lip contour detection based on edge direction patterns," in Proc. of 2nd IEEE R8-EURASIP Symposium on Image and Signal Processing and Analysis, (Pula, Croatia), pp. 138-143, June 2001.
  19. R. Goecke, J. B. Millar, A. Zelinsky, and J. Robert-Ribes, "A detailed description of the AVOZES data corpus," in Proc. of the IEEE Int'l Conf. on Acoustics, Speech, and Signal Processing, (Salt Lake City, USA), pp. 486-491, May 2001.
  20. G. Potamianos, H. P. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. of Int'l Conf. on Image Processing, vol. 3, (Chicago), pp. 173-177, 1998.
  21. P. Císař, M. Železný, J. Zelinka, and J. Trojanová, "Development and testing of new combined visual speech parameterization," in Proc. of the Int'l Conf. on Auditory-Visual Speech Processing (AVSP 2007), (Hilvarenbeek, The Netherlands), 2007.
  22. M. Heckmann, K. Kroschel, C. Savariaux, and F. Berthommier, "DCT-based video features for audio-visual speech recognition," in Proc. of International Conference on Spoken Language Processing, (Denver, Colorado), pp. 1925-1928, September 2002.
  23. I. Matthews, G. Potamianos, C. Neti, and J. Luettin, "A comparison of model and transform-based visual features for audio-visual LVCSR," in International Conference on Multimedia and Expo, (Tokyo, Japan), p. 210, 2001.
  24. G. Potamianos and H. Graf, "Linear discriminant analysis for speechreading," in Proc. Workshop on Multimedia Signal Processing, (Los Angeles), pp. 221-226, 1998.
  25. R. Seymour, D. Stewart, and J. Ming, "Comparison of image transform based features for visual speech recognition in clean and corrupted videos," EURASIP Journal on Image and Video Processing, vol. 2008, pp. 1-9, 2008.
  26. D. Dean, S. Sridharan, and T. Wark, "Audio-visual speaker verification using continuous fused HMMs," in Proceedings of the HCSNet Workshop on Use of Vision in Human-Computer Interaction - Volume 56, VisHCI '06, (Darlinghurst, Australia), pp. 87-92, Australian Computer Society, Inc., 2006.
  27. R. Leonard, "A database for speaker-independent digit recognition," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '84), vol. 9, pp. 328-331, 1984.
  28. J. Odell, D. Ollason, P. Woodland, S. Young, and J. Jansen, The HTK Book for HTK V2.0. Cambridge University Press, Cambridge, UK, 1995.