A Text-Based Approach to the ImageCLEF 2010 Photo Annotation Task
Abstract
The challenges of searching the increasingly large collections of digital images appearing in many places mean that automated annotation of images is becoming an important task. We describe our participation in the ImageCLEF 2010 Visual Concept Detection and Annotation Task. Our system used only textual features (Flickr user tags and EXIF information) to perform the automatic annotation, and we explored several techniques to improve the results of this textual annotation. We identify the drawbacks of our approach and how these might be addressed and optimized in further work.
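As a rough illustration of how a text-only annotator of this kind might work, the sketch below scores each concept by string similarity between the concept name and an image's Flickr tags. The `concept_vocab` list, the `difflib`-based matcher, and the 0.8 threshold are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical sketch of tag-based concept annotation; the matcher and
# threshold are assumptions, not the paper's actual method.
from difflib import SequenceMatcher

def tag_similarity(tag: str, concept: str) -> float:
    """Normalized string similarity between a user tag and a concept name."""
    return SequenceMatcher(None, tag.lower(), concept.lower()).ratio()

def annotate(tags: list[str], concept_vocab: list[str], threshold: float = 0.8) -> dict[str, float]:
    """Assign each concept the best similarity score over the image's tags."""
    scores = {}
    for concept in concept_vocab:
        scores[concept] = max((tag_similarity(t, concept) for t in tags), default=0.0)
    # A concept is annotated if some tag matches it closely enough.
    return {c: s for c, s in scores.items() if s >= threshold}

# Example with Flickr-style tags for a holiday photo.
print(annotate(["beach", "sunnyday", "family"], ["Beach", "Sunset", "Family"]))
```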
Related papers
2012
The ImageCLEF 2012 Scalable Image Annotation Using General Web Data Task proposed a challenge in which, instead of relying only on a set of manually annotated images as training data, the objective was to make use of automatically gathered Web data, with the aim of developing more scalable image annotation systems. To this end, the participants were provided with a new dataset, composed of 250,000 images for training, which included various visual feature types and textual features obtained from the websites on which the images appeared. Two subtasks were defined. The first subtask employed the same test set as the ImageCLEF 2012 Flickr Photo Annotation subtask, with the particularity that both the Flickr and Web training sets had to be used. The idea was to determine whether the Web data could help to enhance annotation performance in comparison to using only manually annotated data. The second subtask consisted in using only automatically gathered Web data to develop an image annotation system.
2010
Abstract. In this paper we present three methods for image auto-annotation used by the Wroclaw University of Technology group in the ImageCLEF 2010 Photo Annotation track. All of our experiments focus on the robustness of global color and texture image features in combination with different similarity measures. To annotate an image we use two versions of the PATSI algorithm, which searches the training set for the most similar images and transfers annotations from them to the target image by applying a transfer function.
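A minimal sketch of this annotation-transfer idea, assuming images are represented by global color/texture feature vectors; the Euclidean distance, similarity-weighted voting, k, and threshold are illustrative stand-ins for the PATSI variants the paper actually evaluates.

```python
# Sketch of PATSI-style annotation transfer; the transfer function here
# (similarity-weighted voting with a threshold) is an assumption.
import numpy as np

def patsi_annotate(query_feat, train_feats, train_annotations, k=5, threshold=0.3):
    """Transfer annotations from the k most similar training images."""
    # Euclidean distance as the similarity measure (the paper compares several).
    dists = np.linalg.norm(train_feats - query_feat, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (1.0 + dists[nearest])          # closer images vote more
    scores = {}
    for w, idx in zip(weights, nearest):
        for label in train_annotations[idx]:
            scores[label] = scores.get(label, 0.0) + w
    total = weights.sum()
    # Transfer a label when its normalized vote mass exceeds the threshold.
    return {lab for lab, s in scores.items() if s / total >= threshold}
```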
This paper presents our submitted experiments in the Concept Annotation and Concept Retrieval tasks using Flickr photos at ImageCLEF 2012. In this edition we applied new strategies in both the textual and the visual subsystems of our multimodal retrieval system. The visual subsystem focuses on extending the low-level feature vector with concept features, which are calculated by means of a logistic regression model. The textual subsystem focuses on expanding the query information using external resources. Our best concept retrieval run, a multimodal one, is at the ninth position with a MnAP of 0.0295, making us the second-best group in the contest for the multimodal modality. This is also our best run in the global ordered list (where eleven textual runs also rank above it). We adapted our multimodal retrieval process to the annotation task, obtaining modest results for this first participation, with a MiAP of 0.1020.
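The concept features mentioned above could, for instance, be produced as per-concept probabilities from independently trained logistic regression models; the sketch below assumes scikit-learn and placeholder training matrices, and is not the group's actual implementation.

```python
# Sketch: extend low-level visual features with per-concept probabilities
# from logistic regression models; variable names are placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_features(X_train, Y_train, X_test):
    """X_train: (n, d) low-level features; Y_train: (n, c) binary concept labels.

    Returns X_test with (m, c) concept probability features appended.
    """
    probs = []
    for c in range(Y_train.shape[1]):
        clf = LogisticRegression(max_iter=1000).fit(X_train, Y_train[:, c])
        probs.append(clf.predict_proba(X_test)[:, 1])  # P(concept present)
    return np.hstack([X_test, np.column_stack(probs)])
```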
International Journal of Engineering Sciences & Research Technology, 2014
Given the vast resource of pictures available on the web, and the fact that many of them naturally co-occur with topically related documents and are captioned, we focus on the task of automatically generating captions for images. The model learns to create captions from a database of news articles, the pictures embedded in them, and their captions, and consists of two stages. Content selection identifies what the image and accompanying article are about, whereas surface realization determines how to verbalize the chosen content. We approximate content selection with a probabilistic image annotation model that suggests keywords for an image. In the proposed system, extensive features are extracted from the database images and stored in a feature library. The extensive feature set comprises shape features along with color, texture, and contourlet features. When a query image is given, its features are extracted in the same fashion. Subsequently, a GA-based similarity measure is computed between the query image features and the database image features.
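The GA-based similarity step could, under one reading, mean a feature-weighted distance whose weights are evolved so that known-relevant database images score highest; the toy GA below (truncation selection, Gaussian mutation) is purely illustrative, not the published method.

```python
# Sketch of a GA-tuned similarity measure; all parameters are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def weighted_similarity(q, x, w):
    """Similarity as inverse weighted L1 distance over feature components."""
    return 1.0 / (1.0 + np.sum(w * np.abs(q - x)))

def evolve_weights(q, db, relevant, pop=20, gens=50):
    """Evolve feature weights so relevant database images score highest."""
    population = rng.random((pop, len(q)))

    def fitness(w):
        sims = np.array([weighted_similarity(q, x, w) for x in db])
        return sims[relevant].mean() - sims.mean()   # separate relevant items

    for _ in range(gens):
        scores = np.array([fitness(w) for w in population])
        parents = population[np.argsort(scores)[-pop // 2:]]    # truncation selection
        children = parents + rng.normal(0, 0.1, parents.shape)  # Gaussian mutation
        population = np.vstack([parents, np.clip(children, 0, None)])
    return population[np.argmax([fitness(w) for w in population])]
```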
HAL (Le Centre pour la Communication Scientifique Directe), 2012
In this paper, we present the methods we proposed and evaluated in the ImageCLEF 2012 Photo Annotation task. More precisely, we proposed the Histogram of Textual Concepts (HTC) textual feature to capture the relatedness of semantic concepts. In contrast to the term frequency-based text representations mostly used for visual concept detection and annotation, HTC relies on the semantic similarity between the user tags and a concept dictionary. Moreover, a Selective Weighted Late Fusion (SWLF) is introduced to combine multiple sources of information by iteratively selecting and weighting the best features for each concept to be classified. The results show that combining our HTC feature with visual features through SWLF can improve performance significantly. Our best model, a late fusion of textual and visual features, achieved a MiAP (Mean interpolated Average Precision) of 43.67% and ranked first out of the 80 submitted runs.
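A minimal sketch of the HTC idea: each histogram bin accumulates the semantic similarity between an image's tags and one dictionary concept. The WordNet path similarity used here is an assumption; the paper's actual relatedness measure may differ.

```python
# Sketch of a Histogram of Textual Concepts; requires nltk.download("wordnet").
from nltk.corpus import wordnet as wn

def similarity(word_a: str, word_b: str) -> float:
    """Best WordNet path similarity over the two words' synsets (0 if none)."""
    best = 0.0
    for sa in wn.synsets(word_a):
        for sb in wn.synsets(word_b):
            sim = sa.path_similarity(sb)
            if sim is not None and sim > best:
                best = sim
    return best

def htc(tags: list[str], dictionary: list[str]) -> list[float]:
    """One histogram bin per dictionary concept, accumulated over all tags."""
    return [sum(similarity(tag, concept) for tag in tags) for concept in dictionary]
```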
Lecture Notes in Computer Science, 2007
This paper describes the general photographic retrieval and object annotation tasks of the ImageCLEF 2006 evaluation campaign. These tasks provide both the resources and the framework necessary to perform comparative laboratory-style evaluation of visual information systems for image retrieval and automatic image annotation. Both tasks offered something new for 2006 and attracted a large number of submissions: 12 groups participated in ImageCLEFphoto and 3 in the automatic annotation task. This paper summarises the components used in the benchmark, including the collections, the search and annotation tasks, the submissions from participating groups, and the results. The general photographic retrieval task, ImageCLEFphoto, used a new collection, the IAPR TC-12 Benchmark, of 20,000 colour photographs with semi-structured captions in English and German. This new collection replaces the St Andrews collection of historic photographs used for the previous three years. For ImageCLEFphoto, groups submitted mainly text-only runs; however, 31% of runs involved some kind of visual retrieval technique, typically combined with text through the merging of image and text retrieval results. Bilingual text retrieval was performed using two target languages, English and German, with 59% of runs bilingual. The highest bilingual performance was 74% of monolingual English retrieval for Portuguese-English, and 39% of monolingual German for English-German. Combined text and image retrieval approaches were seen to give, on average, higher retrieval results (+54%) than using text (or image) retrieval alone. Similar to previous years, the use of relevance feedback (most commonly in the form of pseudo relevance feedback) to enable query expansion was seen to improve the text-based submissions by an average of 39%. Topics have been categorised and analysed with respect to various attributes, including an estimation of their "visualness" and linguistic complexity.
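For concreteness, pseudo relevance feedback of the kind reported above can be sketched as expanding a query with frequent terms from the top-ranked documents; the parameters and term-selection heuristic below are illustrative, not those of any participating group.

```python
# Sketch of pseudo relevance feedback (PRF) query expansion; the top-k/top-n
# parameters and TF-based term selection are assumptions.
from collections import Counter

def prf_expand(query_terms, ranked_docs, k_docs=5, n_terms=5, stopwords=frozenset()):
    """Expand the query with frequent terms from the top-ranked documents.

    ranked_docs: documents as token lists, best-ranked first.
    """
    counts = Counter()
    for doc in ranked_docs[:k_docs]:
        counts.update(t for t in doc if t not in stopwords and t not in query_terms)
    expansion = [t for t, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion
```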
The ImageCLEF 2013 Scalable Concept Image Annotation Subtask was the second edition of a challenge aimed at developing more scalable image annotation systems. Unlike traditional image annotation challenges, which rely on a set of manually annotated images as training data for each concept, the participants were only allowed to use automatically gathered web data. The challenge focused not only on the image annotation algorithms developed by the participants (given an input image and a set of concepts, decide which of them are present in the image and which are not), but also on the scalability of their systems, since the concepts to detect were not exactly the same between the development and test sets. The participants were provided with web data consisting of 250,000 images, which included textual features obtained from the web pages on which the images appeared, as well as various visual features extracted from the images themselves. To evaluate the performance of the submitted systems, a development set containing 1,000 images manually annotated for 95 concepts and a test set containing 2,000 images annotated for 116 concepts were provided. In total, 13 teams participated, submitting 58 runs, most of which significantly outperformed the baseline system on both the development and test sets, including on the test concepts not present in the development set, clearly demonstrating potential for scalability.
The UNED-UV group participated in the Scalable Concept Image Annotation subtask of the ImageCLEF 2013 campaign. We present a multimedia IR-based system for the annotation task. In this collection the images do not have any associated textual description, so we downloaded and preprocessed the web pages that contain the images. Regarding the concepts, we expanded their textual descriptions with additional information from external resources such as Wikipedia or WordNet, and we generated a KLD concept model using the recovered textual information. The multimedia IR-based system uses a logistic relevance algorithm to train a model for each of the concepts using visual image features. Finally, the fusion subsystem merges the textual and visual scores for a given image belonging to a concept and decides on the presence of the concept in the image.
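The final fusion step could be as simple as a convex combination of the textual and visual scores followed by a decision threshold; the weighting below is an assumption for illustration, not the group's tuned fusion rule.

```python
# Sketch of late fusion of textual and visual concept scores; alpha and the
# threshold are assumed free parameters.
def fuse(text_score: float, visual_score: float,
         alpha: float = 0.5, threshold: float = 0.5) -> bool:
    """Decide concept presence from a convex combination of the two scores."""
    combined = alpha * text_score + (1.0 - alpha) * visual_score
    return combined >= threshold

# Example: a strong textual match can outweigh a weak visual score.
print(fuse(text_score=0.9, visual_score=0.2, alpha=0.7))  # True
```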
2011
In this paper we present details of the joint submission of TU Berlin and Fraunhofer FIRST to the ImageCLEF 2011 Photo Annotation Task. We experimented with extensions of Bag-of-Words (BoW) models at several levels and applied several kernel-based learning methods recently developed in our group. For classifier training we used non-sparse multiple kernel learning (MKL) and an efficient multi-task learning (MTL) heuristic based on MKL over kernels from classifier outputs. For multi-modal fusion we used a smoothing method on tag-based features inspired by Bag-of-Words soft mappings and Markov random walks. We submitted one multi-modal run extended by the user tags and four purely visual runs based on Bag-of-Words models. Our best visual result, which used the MTL method, was ranked first according to mean average precision (MAP) among the purely visual submissions. Our multi-modal submission achieved the first rank by MAP among the multi-modal submissions and the best MAP among all submissions. Submissions by other groups such as BPACAD, CAEN, UvA-ISIS, and LIRIS were ranked closely.
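The tag-feature smoothing mentioned above can be sketched as one or more steps of a random walk on a tag-similarity graph, spreading mass from observed tags to related tags; the similarity matrix S and step count below are assumed inputs, not the submission's actual construction.

```python
# Sketch of random-walk smoothing of tag-occurrence features.
import numpy as np

def smooth_tags(x, S, steps=1):
    """x: binary tag-occurrence vector (numpy array);
    S: nonnegative tag-similarity matrix, one row/column per tag."""
    P = S / S.sum(axis=1, keepdims=True)   # row-normalize into transition probabilities
    smoothed = x.astype(float)
    for _ in range(steps):
        smoothed = smoothed @ P            # one random-walk step
    return smoothed
```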
Working Notes of CLEF, 2011
In this paper, we focus on the ImageCLEF task in which the LIRIS-Imagine research group participated: visual concept detection and annotation. For this task, we first propose two kinds of textual features to extract semantic meaning from the text associated with images: one is based on a semantic distance matrix between the text and a semantic dictionary, and the other carries valence and arousal meanings by making use of the Affective Norms for English Words (ANEW) dataset. Meanwhile, we investigate the efficiency of different visual features, including color, texture, shape, and high-level features, and we test four fusion methods (min, max, mean, and score) to combine the various features and improve performance. The results show that the combination of our textual features and visual features can improve performance significantly.
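The four fusion rules named above admit a compact sketch; the weight vector in the "score" rule is treated here as a free parameter, which is an assumption.

```python
# Sketch of the min/max/mean/score fusion rules for per-feature classifier
# scores; the weighted "score" rule's weights are an assumed parameter.
import numpy as np

def fuse_scores(scores, rule="mean", weights=None):
    """Combine per-feature classifier scores for one image/concept pair."""
    s = np.asarray(scores, dtype=float)
    if rule == "min":
        return float(s.min())
    if rule == "max":
        return float(s.max())
    if rule == "mean":
        return float(s.mean())
    if rule == "score":                     # weighted combination
        w = np.asarray(weights) if weights is not None else np.ones_like(s)
        return float(np.dot(w, s) / w.sum())
    raise ValueError(f"unknown rule: {rule}")
```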
