Abstract
This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents under the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content analysis such as face recognition, motion activity, speech recognition and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description is structured as a temporal decomposition into visual segments (shots), key frames and audio/speech sub-segments. The visible outcome will be a web site that allows video retrieval through a proprietary XQuery-based search engine and that is accessible to members of the Canadian National Film Board (NFB) Cineroute site. For example, an end user will be able to request the movie shots in the database that were produced in a specific year, that contain the face of a specific actor who speaks a specific word, and in which there is no motion activity. Video streaming is performed over the high-bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
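The example query in the abstract can be illustrated with a minimal sketch. It is not the MADIS search engine, which is a proprietary XQuery-based system; it is a plain Python filter over a simplified, hypothetical XML layout. The element and attribute names (`Archive`, `Film`, `Shot`, `Face`, `Speech`, `motion`), the sample data, and the function `find_shots` are all assumptions made for illustration and do not follow the actual MPEG-7 schema. "No motion activity" is assumed here to mean the lowest value on a 1 to 5 intensity scale.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified shot descriptions (NOT the real MPEG-7 schema).
# "motion" is an assumed 1-5 activity intensity, 1 being the lowest.
SAMPLE = """
<Archive>
  <Film title="Film A" year="1965">
    <Shot id="s1" motion="1">
      <Face actor="Actor X"/>
      <Speech speaker="Actor X">bonjour le monde</Speech>
    </Shot>
    <Shot id="s2" motion="4">
      <Face actor="Actor X"/>
      <Speech speaker="Actor X">bonjour</Speech>
    </Shot>
  </Film>
  <Film title="Film B" year="1970">
    <Shot id="s3" motion="1">
      <Face actor="Actor X"/>
      <Speech speaker="Actor X">bonjour</Speech>
    </Shot>
  </Film>
</Archive>
"""


def find_shots(root, year, actor, word, max_motion=1):
    """Return ids of shots from `year` in which `actor` is visible,
    speaks `word`, and motion intensity does not exceed `max_motion`."""
    hits = []
    for film in root.findall("Film"):
        # Criterion 1: production year of the film.
        if film.get("year") != str(year):
            continue
        for shot in film.findall("Shot"):
            # Criterion 2: little or no motion activity in the shot.
            if int(shot.get("motion")) > max_motion:
                continue
            # Criterion 3: the actor's face was recognized in the shot.
            has_face = any(
                f.get("actor") == actor for f in shot.findall("Face")
            )
            # Criterion 4: the same actor speaks the requested word.
            says_word = any(
                s.get("speaker") == actor
                and word.lower() in (s.text or "").lower().split()
                for s in shot.findall("Speech")
            )
            if has_face and says_word:
                hits.append(shot.get("id"))
    return hits


root = ET.fromstring(SAMPLE)
matches = find_shots(root, 1965, "Actor X", "bonjour")
print(matches)
```

In this sketch each criterion from the abstract (year, face, spoken word, motion activity) becomes one independent test on a shot, which is the kind of conjunction that a query over the temporal decomposition would express.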
Claude Chapdelaine