Snap and Find: Deep Discrete Cross-domain Garment Image Retrieval
2019, ArXiv
Abstract
With the growing number of online stores, there is a pressing need for intelligent search systems that can understand item photos snapped by customers and search large-scale product databases to find the desired items. However, it is challenging for conventional retrieval systems to match item photos captured by customers with those officially released by stores, especially for garment images. To bridge customer-provided and store-provided garment photos, existing studies have widely exploited clothing attributes (e.g., black) and landmarks (e.g., collar) to learn a common embedding space for garment representations. Unfortunately, they ignore the sequential correlation among attributes and require a large amount of human labor to label the landmarks. In this paper, we propose a deep multi-task cross-domain hashing method, termed DMCH, in which cross-domain embedding and sequential attribute learning are modeled simultaneously. Sequential att...
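To make the retrieval setting concrete, the following is a minimal sketch of how hashing-based search generally works once customer and shop photos share one embedding space: each embedding is turned into a binary code, and shop items are ranked by Hamming distance to the query code. It is not the DMCH model itself. The sign-threshold `binarize` function stands in for a learned hash layer, and the item names and 8-dimensional embeddings are made-up values used only for illustration.

```python
from typing import List, Sequence, Tuple


def binarize(embedding: Sequence[float]) -> int:
    """Pack the sign pattern of a real-valued embedding into an integer code.

    Assumption: a simple sign threshold replaces the learned hash function.
    """
    code = 0
    for value in embedding:
        code = (code << 1) | (1 if value >= 0 else 0)
    return code


def hamming(a: int, b: int) -> int:
    """Count the bits that differ between two packed binary codes."""
    return bin(a ^ b).count("1")


def search(query: Sequence[float],
           database: List[Tuple[str, int]],
           k: int = 2) -> List[Tuple[str, int]]:
    """Return the k database items closest to the query in Hamming distance."""
    q = binarize(query)
    ranked = sorted(
        ((name, hamming(q, code)) for name, code in database),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


# Hypothetical shop-side embeddings (illustrative values, not real data).
shop_items = {
    "black_dress": [0.9, -0.2, 0.4, -0.7, 0.1, 0.8, -0.3, -0.5],
    "red_skirt": [-0.6, 0.3, -0.1, 0.5, -0.9, 0.2, 0.7, -0.4],
    "black_coat": [0.8, -0.1, 0.5, -0.6, -0.2, 0.9, -0.4, 0.3],
}
database = [(name, binarize(vec)) for name, vec in shop_items.items()]

# Hypothetical embedding of a customer's snapped photo.
customer_photo = [0.7, -0.3, 0.2, -0.8, 0.3, 0.6, -0.1, -0.6]
results = search(customer_photo, database)
print(results)
```

Because the codes are compact integers and the distance is a bitwise XOR followed by a bit count, this kind of lookup stays cheap even for large product databases, which is the usual motivation for hashing over real-valued nearest-neighbor search.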
Yadan Luo received the B.S. degree in computer science from the University of Electronic Engineering and Technology of China in 2017, and is currently working toward the Ph.D. degree at the University of Queensland. Her research interests include multimedia retrieval, machine learning and computer vision.

Ziwei Wang received his BSc degree from Beijing University of Civil Engineering and Architecture in 2014 and his Master's degree in Computer Science from The University of Queensland, Australia, in 2016. He is currently a PhD candidate at The University of Queensland. His research interests include image captioning and machine learning.

Zi Huang is an ARC Future Fellow in the School of ITEE, The University of Queensland. She received her BSc degree from the Department of Computer Science, Tsinghua University, China, and her PhD in Computer Science from the School of ITEE, The University of Queensland. Dr. Huang's research interests mainly include multimedia indexing and search, social data analysis and knowledge discovery.

Yang Yang received the bachelor's degree from Jilin University in 2006, the master's degree from Peking University in 2009, and the Ph.D. degree from The University of Queensland, Australia, in 2012, under the supervision of Prof. H. T. Shen and Prof. X. Zhou. He was a Research Fellow under the supervision of Prof. T.-S. Chua with the National University of Singapore from 2012 to 2014. He is currently with the University of Electronic Science and Technology of China.

Huimin Lu received double M.S. degrees in Electrical Engineering from Kyushu Institute of Technology and Yangzhou University in 2011. He received a Ph.D. degree in Electrical Engineering from Kyushu Institute of Technology in 2014. From 2013 to 2016, he was a JSPS research fellow (DC2, PD, and FPD) at Kyushu Institute of Technology. Currently, he is an assistant professor at Kyushu Institute of Technology and an Excellent Young Researcher of the Ministry of Education, Culture, Sports, Science and Technology, Japan. His research interests include artificial intelligence, computer vision, computational imaging, deep-sea observing, internet of things and robotics.