Movie summarization based on audiovisual saliency detection
2008, Proceedings - International Conference on Image Processing, ICIP
https://doi.org/10.1109/ICIP.2008.4712308
Abstract
Based on perceptual and computational attention modeling studies, we formulate measures of saliency for an audiovisual stream. Audio saliency is captured by signal modulations and related multifrequency band features, extracted through nonlinear operators and energy tracking. Visual saliency is measured by means of a spatiotemporal attention model driven by various feature cues (intensity, color, motion). The audio and visual saliency curves are integrated into a single attention curve, in which events may be enhanced, suppressed, or may vanish altogether. The presence of salient events is signified on this audiovisual curve by geometrical features such as local extrema, sharp transition points and level sets. An audiovisual saliency-based movie summarization algorithm is proposed and evaluated. The algorithm is shown to perform very well in terms of summary informativeness and enjoyability for movie clips of various genres.
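As a rough illustration of the fusion and event-detection steps described in the abstract, the sketch below combines two per-frame saliency curves and reads salient events off the result. It is not the authors' implementation: the min-max normalization, the linear weighting (`w_audio`), the quantile-style threshold (`keep_ratio`), and all function names are assumptions made for this example.

```python
def normalize(curve):
    """Min-max scale a curve to [0, 1]; a flat curve maps to all zeros."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0] * len(curve)
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(audio, visual, w_audio=0.5):
    """Assumed fusion rule: weighted linear sum of the normalized curves."""
    a, v = normalize(audio), normalize(visual)
    return [w_audio * x + (1.0 - w_audio) * y for x, y in zip(a, v)]


def local_maxima(curve):
    """Indices of interior local maxima (candidate salient events)."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] >= curve[i + 1]
    ]


def select_frames(curve, keep_ratio=0.3):
    """Level-set style selection: keep frames at or above a threshold chosen
    so that roughly keep_ratio of the frames survive (ties may add more)."""
    k = max(1, round(keep_ratio * len(curve)))
    threshold = sorted(curve, reverse=True)[k - 1]
    return [i for i, v in enumerate(curve) if v >= threshold]


# Toy per-frame saliency values (made up for illustration).
audio = [0.0, 2.0, 4.0, 2.0, 0.0]
visual = [1.0, 1.0, 3.0, 1.0, 1.0]

fused = fuse(audio, visual)          # [0.0, 0.25, 1.0, 0.25, 0.0]
peaks = local_maxima(fused)          # [2]
summary = select_frames(fused, 0.6)  # [1, 2, 3]
```

In this toy input both modalities peak at frame 2, so fusion enhances that event, while the constant visual background contributes nothing after normalization. A summary would then be assembled from the selected frame indices; how neighbouring frames are grouped into coherent segments is left out of this sketch.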