A Survey on the Automatic Indexing of Video Data
1999, Journal of Visual Communication and Image Representation
Abstract
Today a considerable amount of video data in multimedia databases requires sophisticated indices for its effective use. Manual indexing is the most effective way to build them, but it is also the slowest and the most expensive, so automated methods have to be developed. This paper surveys several approaches and algorithms that have recently been proposed to automatically structure audio–visual data, both for annotation and for access.
ROBERTO BRUNELLI was born in Trento, Italy in 1961. He received his degree (summa cum laude) in physics from the University of Trento in 1986. He joined ITC-irst in 1987, where he works in the Computer Vision Group of the Interactive Sensory Systems Division. In the past he was involved in research on computer vision tools, analysis of aerial images, development of algorithms working on compressed descriptions of binary images, optimization, neural networks, and face analysis. His current major involvement is in OPAL (ESPRIT Project 25389), on-line programmes, digital archives, and distributed editorial collaboration. His current professional interests include optimization, robust statistics, object recognition, and multimedia databases. Personal interests include comics, inline-skating, science-fiction, motorbiking, and photography.
ORNELLA MICH was born in Tesero, Trento, Italy in 1959. She received her degree in electronic engineering from the University of Padova, Italy in 1987. She then worked on radar electronics for jet fighters. After teaching electrotechnics in a high school, she joined IRST in 1989, collaborating on VLSI research. Since 1990 she has been working in the field of computer vision in the Interactive Sensory Systems Division.
CARLA MARIA MODENA was born in Rovereto, Trento, Italy in 1961. She received her degree in mathematics from the University of Trento in 1985. In 1986 she worked at the Institut fuer Volkswirtschaftslehre, Universitaet Regensburg, Germany. She joined ITC-IRST in 1987, where she works in the Interactive Sensory Systems Division. Her current interests in computer vision include text localization and recognition and document archive indexing.