MPEG-7 audio-visual indexing test-bed for video retrieval

2003

https://doi.org/10.1117/12.524495

Abstract

This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents under the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content analysis such as face recognition, motion activity, speech recognition, and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. Each description follows a temporal decomposition into visual segments (shots), key frames, and audio/speech sub-segments. The visible outcome will be a web site, accessible to members at the Canadian National Film Board (NFB) Cineroute site, that allows video retrieval through a proprietary XQuery-based search engine. For example, an end user will be able to request the movie shots in the database that were produced in a specific year, that contain the face of a specific actor who speaks a specific word, and in which there is no motion activity. Video streaming is performed over the high-bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
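The example query in the abstract can be illustrated with a minimal Python sketch. It is not the MADIS XQuery engine and does not use the MPEG-7 schema: the `Shot` record, its field names, the `find_shots` helper, the motion scale (0 taken to mean no motion), and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Shot:
    """One visual segment (shot) of a film; every field name is hypothetical."""
    film: str
    production_year: int
    start_s: float                                    # shot start, in seconds
    end_s: float                                      # shot end, in seconds
    faces: set = field(default_factory=set)           # actors recognised in the shot
    spoken_words: set = field(default_factory=set)    # lower-case words from the speech track
    motion_activity: int = 0                          # assumed scale: 0 means no motion


def find_shots(shots, year, actor, word, max_motion=0):
    """Return the shots that satisfy all four criteria of the example query."""
    return [
        s for s in shots
        if s.production_year == year
        and actor in s.faces
        and word.lower() in s.spoken_words
        and s.motion_activity <= max_motion
    ]


# Invented sample data, for illustration only.
ARCHIVE = [
    Shot("Film A", 1965, 0.0, 12.5, {"Actor X"}, {"river", "north"}, 0),
    Shot("Film A", 1965, 12.5, 30.0, {"Actor X"}, {"river"}, 3),
    Shot("Film B", 1972, 0.0, 8.0, {"Actor Y"}, {"river"}, 0),
]

matches = find_shots(ARCHIVE, year=1965, actor="Actor X", word="river")
```

Only the first shot of "Film A" satisfies every criterion here: the second shows motion, and the third has a different year and actor. In the system described above, the same conjunction of conditions would presumably be expressed as an XQuery over the MPEG-7/XML descriptions rather than over in-memory records.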
