Head Detection with Depth Images in the Wild
2018, Proceedings of the 13th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
https://doi.org/10.5220/0006541000560063
Abstract
Head detection and localization is a demanding task and a key element of many computer vision applications, such as video surveillance, Human Computer Interaction and face analysis. The large body of work on detecting faces in RGB images, together with the availability of huge face datasets, has made it possible to set up very effective systems in that domain. However, due to illumination issues, infrared or depth cameras may be required in real applications. In this paper, we introduce a novel method for head detection on depth images that exploits the classification ability of deep learning approaches. In addition to reducing the dependency on external illumination, depth images implicitly embed useful information for dealing with the scale of the target objects. Two public datasets have been exploited: the first one, called Pandora, is used to train a deep binary classifier with face and non-face images. The second one, collected by Cornell University, is used to perform a cross-dataset test during daily activities in unconstrained environments. Experimental results show that the proposed method outperforms state-of-the-art methods working on depth images.
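The remark that depth images carry scale information can be illustrated with the standard pinhole camera relation: an object of physical size W at distance D projects to roughly f * W / D pixels, where f is the focal length in pixels. The sketch below is only an illustration of that relation, not the method described in the paper; the function name, the default head size of 0.3 m and the focal length of 365 pixels are assumptions chosen for the example.

```python
def head_window_size_px(depth_m, focal_px, head_size_m=0.3):
    """Expected side length, in pixels, of a square window covering a head.

    Uses the pinhole projection: size_px = focal_px * head_size_m / depth_m.
    head_size_m is an assumed average head extent, not a value from the paper.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * head_size_m / depth_m


if __name__ == "__main__":
    # Hypothetical focal length (in pixels) for a consumer depth sensor.
    focal_px = 365.0
    for depth_m in (1.0, 2.0, 4.0):
        size = head_window_size_px(depth_m, focal_px)
        print(f"depth {depth_m:.1f} m -> window of about {size:.1f} px")
```

Under these assumptions the window size halves each time the distance doubles, so a detector that reads the depth value at a candidate pixel can pick a single window size instead of scanning an image pyramid.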