Efficient Visual Content Retrieval and Mining in Videos

https://doi.org/10.1007/978-3-540-30542-2_58

Abstract

We describe an image representation for objects and scenes consisting of a configuration of viewpoint covariant regions and their descriptors. This representation enables recognition to proceed successfully despite changes in scale, viewpoint, illumination and partial occlusion. Vector quantization of these descriptors then enables efficient matching on the scale of an entire feature film. We show two applications. The first is efficient object retrieval, where the technology of text retrieval, such as inverted file systems, can be employed at run time to return all shots containing the object in a manner, and with a speed, similar to a Google search for text. The object is specified by a user outlining it in an image, and the object is then delineated in the retrieved shots. The second application is data mining. We obtain the principal objects, characters and scenes in a video by measuring the reoccurrence of these spatial configurations of viewpoint covariant regions. The applications are illustrated on two full-length feature films.
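The retrieval idea in the abstract (quantize region descriptors into discrete "visual words", then look shots up through an inverted file) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the two-dimensional toy descriptors, the hand-picked vocabulary, and the plain vote-count ranking are hypothetical and stand in for whatever descriptors, clustering, and ranking the authors actually use.

```python
from collections import defaultdict


def nearest_word(descriptor, vocabulary):
    """Vector quantization: index of the closest vocabulary centre."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda i: sq_dist(descriptor, vocabulary[i]))


def build_inverted_index(shots, vocabulary):
    """Map each visual word to the set of shot ids in which it occurs."""
    index = defaultdict(set)
    for shot_id, descriptors in shots.items():
        for d in descriptors:
            index[nearest_word(d, vocabulary)].add(shot_id)
    return index


def query(region_descriptors, index, vocabulary):
    """Rank shots by the number of distinct query words they contain."""
    votes = defaultdict(int)
    query_words = {nearest_word(d, vocabulary) for d in region_descriptors}
    for word in query_words:
        for shot_id in index.get(word, ()):
            votes[shot_id] += 1
    # Highest vote count first; ties broken by shot id for determinism.
    return sorted(votes, key=lambda s: (-votes[s], s))


# Toy data (hypothetical): three cluster centres and three shots.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
shots = {
    "shot_a": [(0.1, 0.0), (0.9, 1.1)],
    "shot_b": [(5.2, 4.9)],
    "shot_c": [(1.0, 0.9), (4.8, 5.1)],
}
index = build_inverted_index(shots, vocabulary)

# Descriptors from a user-outlined query region.
ranked = query([(0.0, 0.1), (1.1, 1.0)], index, vocabulary)
print(ranked)
```

Because the index is keyed by visual word, a query touches only the shots that share at least one word with the outlined region, which is what makes lookup fast compared with matching against every frame.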
