Skin feature point tracking using deep feature encodings
2021, arXiv (Cornell University)
https://doi.org/10.48550/ARXIV.2112.14159
Abstract
Facial feature tracking is a key component of imaging ballistocardiography (BCG), where accurate quantification of the displacement of facial keypoints is needed for good heart rate estimation. Skin feature tracking also enables video-based quantification of motor degradation in Parkinson's disease. Traditional computer vision algorithms include the Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and the Lucas-Kanade method (LK). These have long represented the state of the art in efficiency and accuracy but fail when common deformations, such as local affine transformations or illumination changes, are present. Over the past five years, deep convolutional neural networks have outperformed traditional methods for most computer vision tasks. We propose a pipeline for feature tracking that applies a convolutional stacked autoencoder to identify the crop in an image that is most similar to a reference crop containing the feature of interest. The autoencoder learns to represent image crops as deep feature encodings specific to the object category it is trained on. We train the autoencoder on facial images and validate its ability to track skin features in general using manually labeled face and hand videos. The tracking errors of distinctive skin features (moles) are so small that, based on a χ²-test, we cannot exclude that they stem from the manual labeling. With a mean error of 0.6-4.2 pixels, our method outperformed the other methods in all but one scenario. More importantly, our method was the only one that did not diverge. We conclude that our method creates better feature descriptors for feature tracking, feature matching, and image registration than the traditional algorithms.
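The matching step described in the abstract (finding the crop whose encoding is most similar to that of a reference crop) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `encode` and `track`, the 16-pixel crop size, the 8-pixel search radius, and the Euclidean distance are all assumptions made for illustration, and `encode` is a simple average-pooling placeholder standing in for the encoder half of a trained convolutional autoencoder.

```python
import numpy as np


def encode(crop):
    """Placeholder encoder (assumption): 4x4 average pooling, flattened.

    In the pipeline described in the abstract, this role would be played by
    the encoder of a trained convolutional stacked autoencoder.
    """
    h, w = crop.shape
    trimmed = crop[:h - h % 4, :w - w % 4]
    pooled = trimmed.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))
    return pooled.ravel()


def track(image, ref_code, prev_xy, crop_size=16, radius=8, encoder=encode):
    """Search a window around prev_xy for the crop closest to ref_code.

    Returns the top-left (x, y) of the best crop and its encoding distance.
    The exhaustive window search and Euclidean metric are illustrative choices.
    """
    x0, y0 = prev_xy
    h, w = image.shape
    best_xy, best_dist = prev_xy, np.inf
    for y in range(max(0, y0 - radius), min(h - crop_size, y0 + radius) + 1):
        for x in range(max(0, x0 - radius), min(w - crop_size, x0 + radius) + 1):
            code = encoder(image[y:y + crop_size, x:x + crop_size])
            dist = np.linalg.norm(code - ref_code)
            if dist < best_dist:
                best_xy, best_dist = (x, y), dist
    return best_xy, best_dist


# Synthetic check: a random texture shifted down by 3 rows and left by 2 columns.
rng = np.random.default_rng(0)
frame0 = rng.random((64, 64))
ref_xy = (20, 24)  # top-left (x, y) of the reference crop in frame0
ref_code = encode(frame0[24:40, 20:36])
frame1 = np.roll(frame0, shift=(3, -2), axis=(0, 1))
found_xy, found_dist = track(frame1, ref_code, ref_xy)
print(found_xy, found_dist)
```

In this synthetic case the reference crop reappears unchanged at (18, 27), so the search recovers that position with zero encoding distance. With real video the encodings would differ between frames, and the quality of the match would depend on how well the learned encoder tolerates deformation and illumination changes.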