
An audio-visual saliency model for movie summarization

2007, 2007 IEEE 9th International Workshop on Multimedia Signal Processing, MMSP 2007 - Proceedings

https://doi.org/10.1109/MMSP.2007.4412882

Abstract

A saliency-based method for generating video summaries is presented, which exploits coupled audiovisual information from both media streams. Efficient and advanced speech and image processing algorithms are used to detect key frames that are acoustically and visually salient. Experiments on a movie database show promising results.
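The abstract does not specify how the two streams are combined, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes that a per-frame audio saliency curve and a per-frame visual saliency curve are already available, fuses them with a weighted linear sum (an assumed scheme), and selects key frames at the strongest local maxima of the fused curve. The function names `normalize`, `fuse`, and `key_frames`, the weight `w_audio`, and the sample values are all hypothetical.

```python
def normalize(curve):
    """Rescale a per-frame saliency curve to the range [0, 1]."""
    lo, hi = min(curve), max(curve)
    if hi == lo:
        return [0.0 for _ in curve]
    return [(v - lo) / (hi - lo) for v in curve]


def fuse(audio, visual, w_audio=0.5):
    """Weighted linear fusion of two saliency curves (assumed scheme)."""
    if len(audio) != len(visual):
        raise ValueError("curves must have one value per frame")
    a, v = normalize(audio), normalize(visual)
    return [w_audio * x + (1.0 - w_audio) * y for x, y in zip(a, v)]


def key_frames(saliency, count):
    """Return indices of the `count` strongest local maxima, in temporal order."""
    last = len(saliency) - 1
    peaks = [
        i for i in range(len(saliency))
        if (i == 0 or saliency[i] > saliency[i - 1])
        and (i == last or saliency[i] >= saliency[i + 1])
    ]
    strongest = sorted(peaks, key=lambda i: saliency[i], reverse=True)[:count]
    return sorted(strongest)


# Made-up saliency values for six frames, for illustration only.
audio = [0.1, 0.9, 0.2, 0.1, 0.3, 0.2]
visual = [0.0, 0.8, 0.1, 0.2, 1.0, 0.1]
fused = fuse(audio, visual)
print(key_frames(fused, 2))  # [1, 4]
```

In this toy input, frame 1 is salient in both streams and frame 4 is salient mainly in the visual stream, so both appear as peaks of the fused curve. Raising `w_audio` would favour acoustically salient frames instead.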
