Large-scale image annotation using visual synset

2011, 2011 International Conference on Computer Vision

https://doi.org/10.1109/ICCV.2011.6126295

Abstract

We address the problem of large-scale annotation of web images. Our approach is based on the concept of a visual synset, an organization of images that are visually similar and semantically related. Each visual synset represents a single prototypical visual concept and has an associated set of weighted annotations. Linear SVMs predict the visual synset membership of unseen images, and a weighted voting rule constructs a ranked list of predicted annotations from a set of visual synsets. We demonstrate that visual synsets lead to better performance than standard methods on a new annotation database containing more than 200 million images and 300 thousand annotations, which is the largest ever reported.
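The prediction step described in the abstract can be illustrated with a minimal sketch. It assumes one linear SVM (a weight row in `W` plus a bias in `b`) per visual synset, and it assumes that each vote equals the SVM score multiplied by the annotation weight, with only the positively scoring top-`top_k` synsets voting. The abstract does not specify the voting rule in this detail, so the function name `rank_annotations`, the `top_k` parameter, and the scoring formula are all hypothetical.

```python
import numpy as np


def rank_annotations(x, W, b, synset_annotations, top_k=2):
    """Rank annotations for feature vector x by weighted voting over synsets.

    W: (num_synsets, dim) array, one linear SVM weight vector per synset.
    b: (num_synsets,) array of biases.
    synset_annotations: list of dicts mapping annotation -> weight.
    Assumption: vote = SVM score * annotation weight (not from the paper).
    """
    scores = W @ x + b                      # one linear score per visual synset
    top = np.argsort(scores)[::-1][:top_k]  # indices of the highest-scoring synsets
    votes = {}
    for s in top:
        if scores[s] <= 0:                  # assumed: non-positive synsets do not vote
            continue
        for label, weight in synset_annotations[s].items():
            votes[label] = votes.get(label, 0.0) + float(scores[s]) * weight
    # Highest accumulated vote first
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: three synsets over a 2-D feature space (made-up numbers).
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)
synset_annotations = [
    {"cat": 0.75, "pet": 0.25},
    {"dog": 0.5, "pet": 0.5},
    {"car": 1.0},
]
ranked = rank_annotations(np.array([2.0, 1.0]), W, b, synset_annotations)
print(ranked)
```

In this toy run the third synset scores negatively and is ignored, while "pet" accumulates votes from both of the remaining synsets. At the scale the abstract reports, the dense matrix product would presumably be replaced by a more efficient synset lookup; the sketch only shows the voting logic.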
