IRIM at TRECVID 2011: Semantic Indexing and Instance Search
2011
Abstract
The IRIM group is a consortium of French teams working on Multimedia Indexing and Retrieval. This paper describes its participation in the TRECVID 2011 semantic indexing and instance search tasks. For the semantic indexing task, our approach uses a six-stage processing pipeline that computes, for each video shot, a score for the likelihood that the shot contains a target concept. These scores are then used to produce a ranked list of the images or shots most likely to contain the target concept. The pipeline is composed of the following steps: descriptor extraction, descriptor optimization, classification, fusion of descriptor variants, higher-level fusion, and re-ranking. We evaluated a number of different descriptors and tried different fusion strategies. The best IRIM run has a Mean Inferred Average Precision of 0.1387, which ranked us 5th out of 19 participants. For the instance search task, we used both object-based and frame-based queries. We formulated each query in the standard way, as a comparison of visual signatures: either the signature of the object against those of parts of database (DB) frames, or the signature of the query frame against those of DB frames. To produce the visual signatures we used two approaches: the first is the baseline Bag-Of-Visual-Words (BOVW) model based on the SURF interest point descriptor; the second is a Bag-Of-Regions (BOR) model that extends the traditional BOVW vocabulary from keypoint-based descriptors to region-based descriptors.
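To make the frame-based query concrete, the following is a minimal sketch of a generic BOVW signature comparison, not the IRIM implementation. It assumes a visual vocabulary has already been built and that local descriptors have already been extracted. The function names, the 2-D toy descriptors (real SURF descriptors have 64 dimensions), and the choice of histogram intersection as the similarity measure are all illustrative assumptions; the abstract does not specify which comparison measure was used.

```python
from collections import Counter

def nearest_word(descriptor, vocabulary):
    """Return the index of the visual word closest to the descriptor
    (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda i: sum((d - w) ** 2 for d, w in zip(descriptor, vocabulary[i])),
    )

def bovw_signature(descriptors, vocabulary):
    """Build an L1-normalized histogram of visual-word occurrences."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[i] / total for i in range(len(vocabulary))]

def histogram_intersection(sig_a, sig_b):
    """Similarity between two signatures (assumed measure, not from the paper)."""
    return sum(min(a, b) for a, b in zip(sig_a, sig_b))

def rank_frames(query_descriptors, db_frames, vocabulary):
    """Rank DB frames by decreasing similarity to the query signature."""
    query_sig = bovw_signature(query_descriptors, vocabulary)
    scored = [
        (frame_id, histogram_intersection(query_sig, bovw_signature(descs, vocabulary)))
        for frame_id, descs in db_frames.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical toy data: a 3-word vocabulary and 2-D descriptors.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
query_descriptors = [(0.1, 0.0), (0.9, 1.1)]
db_frames = {
    "frame_a": [(0.0, 0.2), (1.0, 0.9), (1.1, 1.0)],
    "frame_b": [(5.0, 5.0), (4.8, 5.1)],
}

ranking = rank_frames(query_descriptors, db_frames, vocabulary)
for frame_id, score in ranking:
    print(frame_id, round(score, 3))
```

In this toy setup the query shares visual words with frame_a only, so frame_a is ranked first. An object-based query would follow the same pattern, with the query signature built from the descriptors inside the object region and compared against signatures of parts of each DB frame.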