We describe in this paper the different approaches tested for the Photo Annotation task at CLEF 2011. We experimented with state-of-the-art techniques, proposing late fusions of several classifiers trained on several features extracted from the images. The classifiers are SVMs and the late fusion is a simple addition of the classification probabilities coming from the SVMs. The results obtained place our runs in the middle of the pack, with our best visual-based MAP at 0.337. We also integrated Flickr human annotations, leading to a large increase of the MAP, with a value of 0.377.
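The late fusion the abstract describes (a simple addition of per-classifier probabilities) can be sketched as follows. This is a minimal illustration, not the authors' code; the function name and toy scores are invented for the example.

```python
def late_fusion(prob_lists):
    """Sum the positive-class probabilities from each classifier,
    sample by sample; a higher fused score means more likely positive."""
    n = len(prob_lists[0])
    return [sum(probs[i] for probs in prob_lists) for i in range(n)]

# Toy example: three SVMs scoring four images for one concept.
p1 = [0.9, 0.2, 0.6, 0.1]
p2 = [0.8, 0.3, 0.5, 0.2]
p3 = [0.7, 0.1, 0.9, 0.3]
fused = late_fusion([p1, p2, p3])
# Rank images by fused score, best first.
ranking = sorted(range(len(fused)), key=lambda i: -fused[i])
```

Summing probabilities is equivalent (up to a constant factor) to averaging them, which is why this simple rule is a robust late-fusion baseline when the classifiers are roughly calibrated.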
Video retrieval can be done by ranking the samples according to the probability scores produced by classifiers. It is often possible to improve the retrieval performance by re-ranking the samples. In this paper, we propose such a method and combine it with active learning for video indexing. Experimental results show that the proposed re-ranking method improved the system performance by about 16-22% on average on the TRECVID 2010 semantic indexing task. Furthermore, it significantly improved the performance of the active-learning-based video indexing system: taking the Area Under Curve (AUC) as the evaluation metric for active learning, our re-ranking method improved the performance by about 20% on average on the TRECVID 2007 collection.
LIG participated in the semantic indexing main task. LIG also participated in the organization of this task. This paper describes these participations, which are quite similar to our previous year's participations (within the Quaero consortium).
Incremental Multiple Classifier Active Learning for Concept Indexing in Images and Videos
Lecture Notes in Computer Science, 2011
Fig. 3 (B. Safadi, Y. Tong, and G. Quénot, p. 248): MAP results on the TRECVID 2008 test collection evaluated on the four descriptors; each plot shows the results using the Single-learner (red), the Multi-learner (green) and the Incremental (blue) approaches.
We proposed a re-ranking method for improving the performance of semantic video indexing and retrieval. Experimental results show that the proposed re-ranking method is effective, improving system performance by about 16-22% on average on the TRECVID 2010 semantic indexing task.
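The abstract does not detail the re-ranking rule itself, so the sketch below shows one generic score-based re-ranking scheme: each sample's classifier score is mixed with the mean score of its (precomputed) visually similar neighbours, and the list is re-sorted. The mixing weight `alpha` and the neighbour lists are assumptions for illustration, not the authors' exact method.

```python
def rerank(scores, neighbours, alpha=0.7):
    """Re-score each sample by mixing its own classifier score with the
    mean score of its nearest neighbours, then re-sort.

    scores: initial probability scores, one per sample.
    neighbours: neighbours[i] = indices of samples similar to sample i.
    Returns (order, new_scores) with order listing indices best-first.
    """
    new_scores = []
    for i, s in enumerate(scores):
        nb = neighbours[i]
        nb_mean = sum(scores[j] for j in nb) / len(nb) if nb else s
        new_scores.append(alpha * s + (1 - alpha) * nb_mean)
    order = sorted(range(len(scores)), key=lambda i: -new_scores[i])
    return order, new_scores

# Toy run: four shots, each with one similar neighbour.
order, new_scores = rerank([0.9, 0.2, 0.8, 0.1],
                           neighbours=[[2], [3], [0], [1]])
```

The intuition is that a true positive tends to sit near other high-scoring samples, so smoothing scores over neighbourhoods pushes isolated false positives down the list.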
In this paper, we propose and evaluate a method for optimizing descriptors used for content-based multimedia indexing and retrieval. A large variety of descriptors are commonly used for this purpose. However, the most efficient ones often have characteristics preventing them from being easily used in large-scale systems. They may have very high dimensionality (up to tens of thousands of dimensions) and/or be suited to a distance that is costly to compute (e.g. χ²). The proposed method combines a PCA-based dimensionality reduction with pre- and post-PCA non-linear transformations. The resulting transformation is globally optimized. The produced descriptors have a much lower dimensionality while performing at least as well, and often significantly better, with the Euclidean distance than the original high-dimensionality descriptors with their optimal distance. The method has been validated and evaluated for a variety of descriptors using TRECVid 2010 semantic indexing task data. It has then been applied at large scale for the TRECVid 2012 semantic indexing task on tens of descriptors of various types and with initial dimensionalities from 15 up to 32,768. The same transformation can also be used for multimedia retrieval in the context of query by example and/or relevance feedback.
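A common instantiation of this pipeline is a power-law normalization before PCA (which makes χ²-suited histogram descriptors behave better under the Euclidean distance) followed by L2 normalization after it. The sketch below shows that instantiation only; the paper's transform is globally optimized, which this simple fixed-parameter version does not attempt. The function name, the exponent `alpha` and the demo data are illustrative.

```python
import numpy as np

def optimize_descriptors(X, target_dim, alpha=0.5):
    """Power-law normalization -> PCA reduction -> L2 normalization.

    X: (n_samples, dim) descriptor matrix.
    Returns a (n_samples, target_dim) matrix of unit-norm descriptors.
    """
    # Pre-PCA non-linear transform (element-wise power law).
    Xp = np.sign(X) * np.abs(X) ** alpha
    # PCA via eigen-decomposition of the covariance matrix.
    mean = Xp.mean(axis=0)
    Xc = Xp - mean
    cov = Xc.T @ Xc / max(len(Xc) - 1, 1)
    vals, vecs = np.linalg.eigh(cov)          # ascending eigenvalues
    components = vecs[:, ::-1][:, :target_dim]  # keep top eigenvectors
    Z = Xc @ components
    # Post-PCA normalization to unit Euclidean norm.
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.where(norms == 0, 1.0, norms)

# Demo: reduce 32-dimensional descriptors to 4 dimensions.
rng = np.random.default_rng(0)
X = rng.random((10, 32))
Z = optimize_descriptors(X, target_dim=4)
```

After this transform, similarity search can use a plain Euclidean (or dot-product) index, which is far cheaper at scale than computing χ² distances on the original high-dimensional vectors.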
We propose and evaluate in this paper a combination of Active Learning and Multiple Classifiers approaches for corpus annotation and concept indexing on highly imbalanced datasets. Experiments were conducted using TRECVID 2008 data and protocol with four different types of video shot descriptors, with two types of classifiers (Logistic Regression and Support Vector Machine with RBF kernel) and with two different active learning strategies (relevance and uncertainty sampling). Results show that the Multiple Classifiers approach significantly increases the effectiveness of the Active Learning. On the considered dataset, the best performance is achieved when 15 to 30% of the corpus is annotated for individual descriptors and when 10 to 15% of the corpus is annotated for their fusion.
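The two active learning strategies the abstract names can be sketched as simple selection rules over the current classifier's probabilities: uncertainty sampling picks the samples closest to the decision boundary, relevance sampling picks the highest-scoring ones (useful on highly imbalanced data, where positives are rare). The function names and toy scores are illustrative.

```python
def uncertainty_sampling(probs, labelled, batch_size):
    """Select the unlabelled samples whose predicted probability is
    closest to 0.5, i.e. the ones the classifier is least sure about."""
    candidates = [i for i in range(len(probs)) if i not in labelled]
    candidates.sort(key=lambda i: abs(probs[i] - 0.5))
    return candidates[:batch_size]

def relevance_sampling(probs, labelled, batch_size):
    """Select the unlabelled samples the classifier scores highest,
    maximising the chance of finding rare positives to annotate."""
    candidates = [i for i in range(len(probs)) if i not in labelled]
    candidates.sort(key=lambda i: -probs[i])
    return candidates[:batch_size]

# Toy round: five shots, shot 4 already labelled.
probs = [0.9, 0.48, 0.1, 0.55, 0.99]
uncertain = uncertainty_sampling(probs, labelled={4}, batch_size=2)
relevant = relevance_sampling(probs, labelled={4}, batch_size=2)
```

In a full active learning loop, the selected batch is sent to annotators, the classifier is retrained on the enlarged labelled set, and the selection repeats until the annotation budget is spent.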
The Quaero group is a consortium of French organizations working on Multimedia Indexing and Retrieval. UJF-LIG and KIT participated in the semantic indexing task and UJF-LIG participated in the organization of this task. This paper describes these participations. For the semantic indexing task, a classical approach based on feature extraction, classification and hierarchical late fusion was used. Four runs were submitted, corresponding to the use or not of genetic algorithm-based fusion and of two distinct fusion optimization methods. Both led to a small performance improvement and our best run has an infAP of 0.0485 (33/101).
LIG at TRECVID 2009: Hierarchical Fusion for High Level Feature Extraction
We investigated in this work a hierarchical fusion strategy for fusing the outputs of hundreds of descriptor × classifier combinations. Over one hundred descriptors gathered in the context of the IRIM consortium were used for HLF detection with up to four different classifiers. The produced classification scores are then fused in order to produce a unique classification score for each video shot and HLF. In order to cope with the redundancy of the information obtained from similar descriptors and from different classifiers using them, we propose a hierarchical fusion approach so that 1) each different source type gets an appropriate global weight, and 2) all the descriptor × classifier combinations from a similar source type are first combined in an optimal way before being merged at the next level. The best LIG run has a Mean Inferred Average Precision of 0.1276, which is significantly above the TRECVID 2009 HLF detection task median performance. We found that fusing the classification scores from different classifier types improves the performance and that, even with quite low individual performance, audio descriptors can help.
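The two-level scheme described above can be sketched as: average the scores within each source-type group first, then combine the group scores with per-source global weights. The within-group mean and the example weights are simplifying assumptions; the paper combines each level "in an optimal way" rather than with a fixed rule.

```python
def hierarchical_fusion(groups, weights):
    """Two-level late fusion.

    groups: dict mapping source type -> list of score lists, one list
            per descriptor x classifier combination of that source.
    weights: dict mapping source type -> global weight.
    Returns one fused score per video shot.
    """
    n = len(next(iter(groups.values()))[0])
    fused = [0.0] * n
    for src, score_lists in groups.items():
        for i in range(n):
            # Level 1: combine combinations of the same source type.
            group_score = sum(s[i] for s in score_lists) / len(score_lists)
            # Level 2: weighted merge across source types.
            fused[i] += weights[src] * group_score
    return fused

# Toy example: two visual runs and one audio run scoring two shots.
fused = hierarchical_fusion(
    groups={"visual": [[0.8, 0.2], [0.6, 0.4]], "audio": [[0.5, 0.9]]},
    weights={"visual": 0.7, "audio": 0.3})
```

Grouping first keeps a source type with many redundant runs (here, visual) from drowning out a source with few runs (audio), which is exactly the redundancy problem the hierarchy is meant to address.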
Proceedings of International Conference on Multimedia Retrieval - ICMR '14, 2014
Currently, popular search engines retrieve documents on the basis of text information. However, integrating visual information with text-based search for video and image retrieval is still a hot research topic. In this paper, we propose and evaluate a video search framework that uses visual information to enrich classic text-based search for video retrieval. The framework extends conventional text-based search by fusing together text and visual scores, obtained respectively from video subtitles (or automatic speech recognition) and visual concept detectors. We attempt to overcome the so-called semantic gap problem by automatically mapping query text to semantic concepts. With the proposed framework, we show experimentally, on a set of real-world scenarios, that visual cues can effectively contribute to improving the quality of video retrieval. Experimental results show that mapping text-based queries to visual concepts improves the performance of the search system; moreover, when the relevant visual concepts for a query are appropriately selected, a very significant improvement of the system's performance is achieved.
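The score fusion at the heart of this framework can be sketched as a linear combination of the text-retrieval score and the visual-concept-detector score per shot, after normalizing each to a common range. The min-max normalization and the weight `beta` are assumptions for illustration; the paper's exact fusion rule is not specified in this abstract.

```python
def fuse_text_visual(text_scores, visual_scores, beta=0.5):
    """Linearly combine min-max-normalized text and visual scores.

    text_scores: per-shot scores from subtitle/ASR text retrieval.
    visual_scores: per-shot scores from the selected concept detectors.
    beta: weight on the text side (1 - beta goes to the visual side).
    """
    def minmax(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    t, v = minmax(text_scores), minmax(visual_scores)
    return [beta * a + (1 - beta) * b for a, b in zip(t, v)]

# Toy run: three shots scored by text retrieval and by a detector
# for the concept mapped from the query.
fused = fuse_text_visual([2.0, 0.5, 1.0], [0.1, 0.9, 0.5])
```

Normalizing before fusing matters because text-retrieval scores (e.g. BM25-like values) and detector probabilities live on very different scales; without it, one modality silently dominates.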
Papers by Bahjat Safadi