Video Segmentation via Temporal Pattern Classification

2007, IEEE Transactions on Multimedia

https://doi.org/10.1109/TMM.2006.888015

Abstract

We present a general approach to temporal media segmentation using supervised classification. Given standard low-level features representing each time sample, we build intermediate features via pairwise similarity. The intermediate features comprehensively characterize local temporal structure, and are input to an efficient supervised classifier to identify shot boundaries. We integrate discriminative feature selection based on mutual information to enhance performance and reduce processing requirements. Experimental results using large-scale test sets provided by the TRECVID evaluations for abrupt and gradual shot boundary detection are presented, demonstrating excellent performance.
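The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Euclidean distance, the window half-width `lag`, the brute-force k-nearest-neighbor classifier, the synthetic features, and every function name below are assumptions chosen for illustration, and the mutual-information feature selection step is omitted.

```python
import numpy as np

def make_sequence(shot_lengths, dim=8, noise=0.01, seed=0):
    """Synthetic stand-in for per-frame low-level features (hypothetical data)."""
    rng = np.random.default_rng(seed)
    # Each shot k gets a constant feature level 5*k plus small noise.
    shots = [5.0 * k + noise * rng.standard_normal((n, dim))
             for k, n in enumerate(shot_lengths)]
    cuts = np.cumsum(shot_lengths)[:-1]  # first frame index of each new shot
    return np.vstack(shots), cuts

def pairwise_distance(features):
    """Euclidean distance between every pair of frames, shape (T, T)."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def intermediate_features(features, lag=3):
    """For each candidate frame n, collect the pairwise distances among
    frames n-lag .. n+lag-1 (upper triangle of the local window)."""
    dist = pairwise_distance(features)
    rows, cols = np.triu_indices(2 * lag, k=1)
    candidates = np.arange(lag, len(features) - lag + 1)
    vectors = np.array([dist[n - lag:n + lag, n - lag:n + lag][rows, cols]
                        for n in candidates])
    return candidates, vectors

def knn_predict(train_x, train_y, test_x, k=3):
    """Brute-force k-nearest-neighbor majority vote on binary labels."""
    d = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (train_y[nearest].mean(axis=1) > 0.5).astype(int)

def detect_cuts(train_lengths, test_lengths, lag=3, k=3):
    """Train on one labeled synthetic sequence, predict cuts in another."""
    train_feats, train_cuts = make_sequence(train_lengths, seed=0)
    test_feats, _ = make_sequence(test_lengths, seed=1)
    train_idx, train_x = intermediate_features(train_feats, lag)
    test_idx, test_x = intermediate_features(test_feats, lag)
    train_y = np.isin(train_idx, train_cuts).astype(int)  # 1 = boundary
    pred = knn_predict(train_x, train_y, test_x, k)
    return test_idx[pred == 1].tolist()
```

On this toy data, `detect_cuts([10, 10, 10, 10], [12, 9, 15])` returns `[12, 21]`, the first frames of the second and third test shots. Real video would need gradual-transition labels and far more varied training data than this sketch assumes.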

Matthew Cooper received the B.S., M.S., and D.Sc. degrees in electrical engineering from Washington University in St. Louis in 1993, 1994, and 1999, respectively. He joined FX Palo Alto Laboratory in 2000, where he is presently a senior research scientist. His research interests are in multimedia analysis and retrieval, statistical inference, information theory, and computer vision. He is a member of the IEEE.

Ting Liu received the B.E. degree in computer science from Tsinghua University in 2001 and the Ph.D. degree in computer science from Carnegie Mellon University in 2006. She is presently a software engineer at Google, Inc. in Mountain View, CA.

Eleanor Rieffel is a Senior Research Scientist at FX Palo Alto Laboratory, where she has been working since 1996. A mathematician by training, she has performed research in fields as diverse as geometric group theory, hypertext, bioinformatics, video analysis, evolutionary computation, modular robot control, and quantum computation.