UnseenNet: Fast Training Detector for Any Unseen Concept
2022, ArXiv
https://doi.org/10.48550/ARXIV.2203.08759
Abstract
Training object detection models with less data is the current focus of N-shot learning methods in computer vision. Such methods use object-level labels and take hours to train on unseen classes. In many cases a large number of image-level labels is available for training, but few-shot object detection models cannot use them. There is a need for a machine learning framework that can be trained on any unseen class and is useful in real-time situations. In this paper, we propose an “Unseen Class Detector” that can be trained within a very short time for any possible unseen class, without bounding boxes, and with competitive accuracy. We build our approach on “Strong” and “Weak” baseline detectors, which we trained on existing object detection and image classification datasets, respectively. Unseen concepts are fine-tuned on the strong baseline detector using only image-level labels and further adapted by transferring the classifi...