A Cross-media Model for Automatic Image Annotation

https://doi.org/10.1145/2578726.2578728

Abstract

Automatic image annotation is still an important open problem in multimedia and computer vision. The success of media sharing websites has led to the availability of large collections of images tagged with human-provided labels. Many approaches previously proposed in the literature do not accurately capture the intricate dependencies between image content and annotations. We propose a learning procedure based on Kernel Canonical Correlation Analysis which finds a mapping between visual and textual words by projecting them into a latent meaning space. The learned mapping is then used to annotate new images using advanced nearest-neighbor voting methods. We evaluate our approach on three popular datasets, and show clear improvements over several approaches relying on more standard representations.
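The pipeline summarized in the abstract (learn a shared latent space with Kernel Canonical Correlation Analysis, then annotate by nearest-neighbor voting in that space) can be illustrated with a small sketch. This is not the authors' implementation. The Gaussian visual kernel, the linear kernel on binary tag vectors, the regularizer `kappa`, the SVD-based solver, and the plain cosine-weighted vote are all assumptions chosen for brevity; the paper itself refers to more advanced voting schemes. All function names here are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def center_train(K):
    """Center a training kernel matrix in feature space (H K H)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def center_new(K_new, K_train):
    """Center a (new x train) kernel block using training statistics.
    K_train must be the uncentered training kernel."""
    return (K_new - K_new.mean(axis=1, keepdims=True)
            - K_train.mean(axis=0, keepdims=True) + K_train.mean())

def fit_kcca(Kx, Ky, kappa, dim):
    """Regularized KCCA on centered kernels Kx (visual) and Ky (textual).

    Solves Kx Ky b = rho (Kx + kappa I)^2 a and the symmetric equation for b
    through an SVD of (Kx + kappa I)^-1 Kx Ky (Ky + kappa I)^-1.
    Returns dual weights alpha, beta and the top `dim` correlations.
    """
    n = Kx.shape[0]
    Rx = Kx + kappa * np.eye(n)
    Ry = Ky + kappa * np.eye(n)
    left = np.linalg.solve(Rx, Kx @ Ky)      # Rx^-1 Kx Ky
    M = np.linalg.solve(Ry, left.T).T        # right-multiply by Ry^-1 (Ry is symmetric)
    U, s, Vt = np.linalg.svd(M)
    alpha = np.linalg.solve(Rx, U[:, :dim])
    beta = np.linalg.solve(Ry, Vt[:dim].T)
    return alpha, beta, s[:dim]

def vote_tags(z_query, Z_train, T_train, k):
    """Score tags by cosine-weighted votes of the k nearest training images."""
    unit = lambda V: V / (np.linalg.norm(V, axis=-1, keepdims=True) + 1e-12)
    sims = unit(Z_train) @ unit(z_query)
    nearest = np.argsort(-sims)[:k]
    weights = np.maximum(sims[nearest], 0.0)  # ignore dissimilar neighbors
    return weights @ T_train[nearest]

# Toy data (assumed): three visual clusters, each carrying one tag.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
labels = np.repeat(np.arange(3), 10)
X = centers[labels] + 0.3 * rng.standard_normal((30, 2))
T = np.eye(3)[labels]                         # binary image-by-tag matrix

Kx_raw = rbf_kernel(X, X, gamma=0.5)
Kx = center_train(Kx_raw)
Ky = center_train(T @ T.T)                    # linear kernel on tag vectors
alpha, beta, corr = fit_kcca(Kx, Ky, kappa=0.1, dim=2)

Z_train = Kx @ alpha                          # training images in the latent space
query = np.array([[0.1, 3.9]])                # unseen image near the third cluster
z = (center_new(rbf_kernel(query, X, 0.5), Kx_raw) @ alpha)[0]
scores = vote_tags(z, Z_train, T, k=5)
print("tag scores:", scores, "top tag index:", scores.argmax())
```

Only the visual projection `alpha` is needed at annotation time, because a new image arrives without tags; `beta` is returned to show that the textual view is mapped into the same space. On real data the kernels, `kappa`, the latent dimensionality, and `k` would all need to be tuned, and the simple vote would likely be replaced by a stronger nearest-neighbor model.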
