A Survey on Visual Content-Based Video Indexing and Retrieval

Stephen Maybank

doi:10.1109/TSMCC.2011.2109710

2011, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 23 pages

Abstract

Video indexing and retrieval have a wide spectrum of promising applications, motivating the interest of researchers worldwide. This paper offers a tutorial and an overview of the landscape of general strategies in visual content-based video indexing and retrieval, focusing on methods for video structure analysis, including shot boundary detection, key frame extraction, and scene segmentation; extraction of features, including static key frame features, object features, and motion features; video data mining; video annotation; video retrieval, including query interfaces, similarity measures, and relevance feedback; and video browsing. Finally, we analyze future research directions.

FAQs

What recent methodologies improve shot boundary detection accuracy?

The survey reports that adaptive threshold-based algorithms, which account for local variations in video content, detect shot boundaries more accurately than methods that apply a single global threshold.
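
The difference between a global and an adaptive threshold can be sketched as follows. This is a minimal illustration, not the algorithm of any paper cited here. It assumes each frame has already been reduced to a normalized color histogram and that the dissimilarity between consecutive frames is the L1 distance between their histograms. The function names, the window size `window`, the multiplier `k`, and the floor `min_std` are hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def frame_differences(histograms):
    """L1 distance between each pair of consecutive normalized histograms.

    Entry i is the dissimilarity between frame i and frame i + 1.
    """
    return [
        sum(abs(a - b) for a, b in zip(h1, h2))
        for h1, h2 in zip(histograms, histograms[1:])
    ]


def detect_cuts_global(diffs, threshold):
    """Flag every difference above one fixed, video-wide threshold."""
    return [i for i, d in enumerate(diffs) if d > threshold]


def detect_cuts_adaptive(diffs, window=3, k=3.0, min_std=1e-3):
    """Flag diffs[i] when it is an outlier relative to its neighbours.

    The threshold at position i is mean + k * std of the other
    differences within `window` positions on either side, so it rises
    in high-motion segments and falls in static ones.
    """
    cuts = []
    n = len(diffs)
    for i, d in enumerate(diffs):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [diffs[j] for j in range(lo, hi) if j != i]
        if not neighbours:
            continue
        # Floor the spread so that near-constant neighbourhoods do not
        # turn tiny fluctuations into cuts.
        sigma = max(pstdev(neighbours), min_std)
        if d > mean(neighbours) + k * sigma:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    # Hypothetical differences: a static shot with a cut at index 4,
    # then a high-motion shot with a cut at index 13.
    diffs = [0.02, 0.03, 0.02, 0.03, 0.30, 0.02, 0.03, 0.02, 0.03,
             0.45, 0.50, 0.48, 0.46, 1.20, 0.49, 0.47, 0.50]
    print(detect_cuts_adaptive(diffs))      # [4, 13]
    print(detect_cuts_global(diffs, 0.25))  # [4, 9, 10, 11, 12, 13, 14, 15, 16]
    print(detect_cuts_global(diffs, 0.60))  # [13]
```

With these made-up numbers, no single global threshold isolates both cuts. A threshold low enough to catch the cut at index 4 also flags every frame of the high-motion shot, and a threshold high enough to ignore that motion misses index 4. The local mean and standard deviation let the adaptive rule separate both cases.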

How do recent video classification approaches use semantic concepts?

According to the survey, video classification increasingly combines machine learning with hierarchical semantic concepts. One example is the support cluster machines of Yuan et al., which are designed to learn concepts from large-scale imbalanced data sets.

What are common limitations in current key frame extraction techniques?

Many key frame extraction algorithms produce redundant key frames and fail to represent dynamic scene content effectively, which limits their applicability to complex videos.
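
The redundancy problem can be illustrated with two hypothetical greedy selectors. Both assume that frames are represented by normalized histograms compared with the L1 distance, and neither is an algorithm taken from the survey. `select_sequential` compares each frame only with the most recent key frame. As a result, a shot that cuts away and then returns to earlier content (A, B, A) yields a duplicate key frame. `select_non_redundant` compares each frame against every key frame chosen so far. The function names and the `threshold` value are illustrative assumptions.

```python
def l1_distance(h1, h2):
    """L1 distance between two histograms of equal length."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def select_sequential(histograms, threshold=0.5):
    """Add a key frame whenever a frame differs from the last key frame."""
    if not histograms:
        return []
    keys = [0]
    for i in range(1, len(histograms)):
        if l1_distance(histograms[i], histograms[keys[-1]]) > threshold:
            keys.append(i)
    return keys


def select_non_redundant(histograms, threshold=0.5):
    """Add a key frame only if it differs from every key frame so far."""
    if not histograms:
        return []
    keys = [0]
    for i in range(1, len(histograms)):
        if all(l1_distance(histograms[i], histograms[k]) > threshold
               for k in keys):
            keys.append(i)
    return keys


if __name__ == "__main__":
    a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    frames = [a, a, b, b, a, a, c]         # content returns to "a" at frame 4
    print(select_sequential(frames))       # [0, 2, 4, 6]
    print(select_non_redundant(frames))    # [0, 2, 6]
```

The second rule suppresses the repeated appearance at the cost of one comparison per existing key frame. It still says nothing about motion within a segment, which is the other limitation noted above.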

How does the integration of audio content enhance video summarization?

Dynamic video skimming methods that include audio content preserve the temporal evolution of the video, which makes the summary more engaging and informative. Recent news summarization frameworks highlight this advantage.

What areas in video retrieval need further research according to this survey?

Key areas identified for future research include motion feature analysis, hierarchical video indexing, and integrating multimodal human-computer interaction to enhance retrieval accuracy and user experience.

References (279)

P. Turaga, A. Veeraraghavan, and R. Chellappa, "From videos to verbs: Mining videos for activities using a cascade of dynamical systems," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2007, pp. 1-8.
R. Hamid, S. Maddi, A. Bobick, and M. Essa, "Structure from statistics: Unsupervised activity analysis using suffix trees," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2007, pp. 1-8.
G. Lavee, E. Rivlin, and M. Rudzsky, "Understanding video events: A survey of methods for automatic interpretation of semantic occurrences in video," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 39, no. 5, pp. 489-504, Sep. 2009.
J. Tang, X. S. Hua, M. Wang, Z. Gu, G. J. Qi, and X. Wu, "Correlative linear neighborhood propagation for video annotation," IEEE Trans. Syst., Man, Cybern., B, Cybern., vol. 39, no. 2, pp. 409-416, Apr. 2009.
X. Chen, C. Zhang, S. C. Chen, and S. Rubin, "A human-centered multiple instance learning framework for semantic video retrieval," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 39, no. 2, pp. 228-233, Mar. 2009.
Y. Song, X.-S. Hua, L. Dai, and M. Wang, "Semi-automatic video annotation based on active learning with multiple complementary predictors," in Proc. ACM Int. Workshop Multimedia Inf. Retrieval, Singapore, 2005, pp. 97-104.
Y. Song, X.-S. Hua, G.-J. Qi, L.-R. Dai, M. Wang, and H.-J. Zhang, "Efficient semantic annotation method for indexing large personal video database," in Proc. ACM Int. Workshop Multimedia Inf. Retrieval, Santa Barbara, CA, 2006, pp. 289-296.
S. L. Feng, R. Manmatha, and V. Lavrenko, "Multiple Bernoulli relevance models for image and video annotation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun./Jul. 2004, vol. 2, pp. 1002-1009.
Y. Song, G.-J. Qi, X.-S. Hua, L.-R. Dai, and R.-H. Wang, "Video annotation by active learning and semi-supervised ensembling," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2006, pp. 933-936.
C. H. Yeo, Y. W. Zhu, Q. B. Sun, and S. F. Chang, "A framework for sub-window shot detection," in Proc. Int. Multimedia Modelling Conf., Jan. 2005, pp. 84-91.
G. Camara-Chavez, F. Precioso, M. Cord, S. Phillip-Foliguet, and A. de A. Araujo, "Shot boundary detection by a hierarchical supervised approach," in Proc. Int. Conf. Syst., Signals Image Process., Jun. 2007, pp. 197-200.
H. Lu, Y.-P. Tan, X. Xue, and L. Wu, "Shot boundary detection using unsupervised clustering and hypothesis testing," in Proc. Int. Conf. Commun. Circuits Syst., Jun. 2004, vol. 2, pp. 932-936.
R. Yan, M.-Y. Chen, and A. G. Hauptmann, "Mining relationship between video concepts using probabilistic graphical model," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2006, pp. 301-304.
R. Cutler and L. S. Davis, "Robust real-time periodic motion detection, analysis, and applications," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 781-796, Aug. 2000.
K. W. Sze, K. M. Lam, and G. P. Qiu, "A new key frame representation for video segment retrieval," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 9, pp. 1148-1155, Sep. 2005.
J. Yuan, H. Wang, L. Xiao, W. Zheng, J. Li, F. Lin, and B. Zhang, "A formal study of shot boundary detection," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 2, pp. 168-186, Feb. 2007.
R. Visser, N. Sebe, and E. M. Bakker, "Object recognition for video retrieval," in Proc. Int. Conf. Image Video Retrieval, London, U.K., Jul. 2002, pp. 262-270.
J. Sivic, M. Everingham, and A. Zisserman, "Person spotting: Video shot retrieval for face sets," in Proc. Int. Conf. Image Video Retrieval, Jul. 2005, pp. 226-236.
D.-D. Le, S. Satoh, and M. E. Houle, "Face retrieval in broadcasting news video by fusing temporal and intensity information," in Proc. Int. Conf. Image Video Retrieval, (Lect. Notes Comput. Sci.), 4071, Jul. 2006, pp. 391-400.
H. P. Li and D. Doermann, "Video indexing and retrieval based on recognized text," in Proc. IEEE Workshop Multimedia Signal Process., Dec. 2002, pp. 245-248.
K. Matsumoto, M. Naito, K. Hoashi, and F. Sugaya, "SVM-based shot boundary detection with a novel feature," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2006, pp. 1837-1840.
M. S. Dao, F. G. B. DeNatale, and A. Massa, "Video retrieval using video object-trajectory and edge potential function," in Proc. Int. Symp. Intell. Multimedia, Video Speech Process., Oct. 2004, pp. 454-457.
Y. Yuan, "Research on video classification and retrieval," Ph.D. dissertation, School Electron. Inf. Eng., Xi'an Jiaotong Univ., Xi'an, China, pp. 5-27, 2003.
C. Yajima, Y. Nakanishi, and K. Tanaka, "Querying video data by spatio-temporal relationships of moving object traces," in Proc. Int. Federation Inform. Process. TC2/WG2.6 Working Conf. Visual Database Syst., Brisbane, Australia, May 2002, pp. 357-371.
Y. K. Jung, K. W. Lee, and Y. S. Ho, "Content-based event retrieval using semantic scene interpretation for automated traffic surveillance," IEEE Trans. Intell. Transp. Syst., vol. 2, no. 3, pp. 151-163, Sep. 2001.
C.-W. Su, H.-Y. M. Liao, H.-R. Tyan, C.-W. Lin, D.-Y. Chen, and K.-C. Fan, "Motion flow-based video retrieval," IEEE Trans. Multimedia, vol. 9, no. 6, pp. 1193-1201, Oct. 2007.
J.-W. Hsieh, S.-L. Yu, and Y.-S. Chen, "Motion-based video retrieval by trajectory matching," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 3, pp. 396-409, Mar. 2006.
T. Quack, V. Ferrari, and L. V. Gool, "Video mining with frequent item set configurations," in Proc. Int. Conf. Image Video Retrieval, 2006, pp. 360-369.
A. Hanjalic, R. Lienhart, W.-Y. Ma, and J. R. Smith, "The holy grail of multimedia information retrieval: So close or yet so far away?" Proc. IEEE, vol. 96, no. 4, pp. 541-547, Apr. 2008.
M. Worring, C. Snoek, O. de Rooij, G. P. Nguyen, and A. Smeulders, "The mediamill semantic video search engine," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2007, vol. 4, pp. IV.1213-IV.1216.
M. G. Christel and R. M. Conescu, "Mining novice user activity with TRECVID interactive retrieval tasks," in Proc. Int. Conf. Image Video Retrieval, Tempe, AZ, Jul. 2006, pp. 21-30.
J. Fan, H. Luo, Y. Gao, and R. Jain, "Incorporating concept ontology to boost hierarchical classifier training for automatic multi-level annotation," IEEE Trans. Multimedia, vol. 9, no. 5, pp. 939-957, Aug. 2007.
R. Yan, A. G. Hauptmann, and R. Jin, "Negative pseudo-relevance feedback in content-based video retrieval," in Proc. ACM Int. Conf. Multimedia, Berkeley, CA, Nov. 2003, pp. 343-346.
P. Browne and A. F. Smeaton, "Video retrieval using dialogue, keyframe similarity and video objects," in Proc. IEEE Int. Conf. Image Process., Sep. 2005, vol. 3, pp. 1208-1211.
R. Lienhart, "A system for effortless content annotation to unfold the semantics in videos," in Proc. IEEE Workshop Content-Based Access Image Video Libraries, Jun. 2000, pp. 45-49.
W. M. Hu, D. Xie, Z. Y. Fu, W. R. Zeng, and S. Maybank, "Semantic-based surveillance video retrieval," IEEE Trans. Image Process., vol. 16, no. 4, pp. 1168-1181, Apr. 2007.
W. N. Lie and W. C. Hsiao, "Content-based video retrieval based on object motion trajectory," in Proc. IEEE Workshop Multimedia Signal Process., Dec. 2002, pp. 237-240.
C. Snoek, M. Worring, and A. Smeulders, "Early versus late fusion in semantic video analysis," in Proc. ACM Int. Conf. Multimedia, Singapore, 2005, pp. 399-402.
B. T. Truong and S. Venkatesh, "Video abstraction: A systematic review and classification," ACM Trans. Multimedia Comput., Commun. Appl., vol. 3, no. 1, art. 3, pp. 1-37, Feb. 2007.
S. Bruyne, D. Deursen, J. Cock, W. Neve, P. Lambert, and R. Walle, "A compressed-domain approach for shot boundary detection on H.264/AVC bit streams," J. Signal Process.: Image Commun., vol. 23, no. 7, pp. 473-489, 2008.
C.-W. Ngo, "A robust dissolve detector by support vector machine," in Proc. ACM Int. Conf. Multimedia, 2003, pp. 283-286.
V. Sudha, B. Shalabh, S. V. Basavaraja, and V. Sridhar, "SPSA-based feature relevance estimation for video retrieval," in Proc. IEEE Workshop Multimedia Signal Process., Cairns, Qld., Oct. 2008, pp. 598-603.
T. Joachims, "Optimizing search engines using clickthrough data," in Proc. ACM Conf. Knowl. Discovery Data Mining, Edmonton, AB, 2002, pp. 133-142.
M. S. Lew, N. Sebe, C. Djeraba, and R. Jain, "Content-based multimedia information retrieval: State of the art and challenges," ACM Trans. Multimedia Comput., Commun. Appl., vol. 2, no. 1, pp. 1-19, Feb. 2006.
T. Mei, X. S. Hua, H. Q. Zhou, and S. P. Li, "Modeling and mining of users' capture intention for home videos," IEEE Trans. Multimedia, vol. 9, no. 1, pp. 66-76, Jan. 2007.
Y.-F. Ma, X.-S. Hua, L. Lu, and H.-J. Zhang, "A generic framework of user attention model and its application in video summarization," IEEE Trans. Multimedia, vol. 7, no. 5, pp. 907-919, Oct. 2005.
Choi, K.-C. Ko, Y.-M. Cheon, G-Y. Kim, H-Il, S.-Y. Shin, and Y.-W. Rhee, "Video shot boundary detection algorithm," Comput. Vis., Graph. Image Process., (Lect. Notes Comput. Sci.), 4338, pp. 388-396, 2006.
C.-Y. Chen, J.-C. Wang, and J.-F. Wang, "Efficient news video querying and browsing based on distributed news video servers," IEEE Trans. Multimedia, vol. 8, no. 2, pp. 257-269, Apr. 2006.
L. L. Thi, A. Boucher, and M. Thonnat, "An interface for image retrieval and its extension to video retrieval," in Proc. Nat. Symp. Res., Develop. Appl. Inform. Commun. Technol., May 2006, pp. 278-285.
P. Muneesawang and L. Guan, "Automatic relevance feedback for video retrieval," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2003, vol. 3, pp. III.1-III.4.
L.-H. Chen, K.-H. Chin, and H.-Y. Liao, "An integrated approach to video retrieval," in Proc. ACM Conf. Australasian Database, vol. 75, Gold Coast, Australia, Dec. 2007, pp. 49-55.
P. Browne and A. F. Smeaton, "Video information retrieval using objects and ostensive relevance feedback," in Proc. ACM Symp. Appl. Comput., Nicosia, Cyprus, Mar. 2004, pp. 1084-1090.
H. Ghosh, P. Poornachander, A. Mallik, and S. Chaudhury, "Learning ontology for personalized video retrieval," in Proc. ACM Workshop Multimedia Inform. Retrieval, Augsburg, Germany, Sep. 2007, pp. 39-46.
S. Sav, H. Lee, A. F. Smeaton, N. O'Connor, and N. Murphy, "Using video objects and relevance feedback in video retrieval," in Proc. SPIE Internet Multimedia Manag. Syst. VI, Boston, MA, Oct. 2005, vol. 6015, pp. 1-12.
S. Aksoy and O. Cavus, "A relevance feedback technique for multimodal retrieval of news videos," in Proc. Int. Conf. Comput. Tool, Nov. 2005, vol. 1, pp. 139-142.
S. Sav, H. Lee, N. O'Connor, and A. F. Smeaton, "Interactive object-based retrieval using relevance feedback," in Proc. Adv. Concepts Intell. Vis. Syst., (Lect. Notes Comput. Sci.), 3708, Oct. 2005, pp. 260-267.
U. Damnjanovic, E. Izquierdo, and M. Grzegorzek, "Shot boundary detection using spectral clustering," in Proc. Eur. Signal Process. Conf., Poznan, Poland, Sep. 2007, pp. 1779-1783.
X. Ling, L. Chao, H. Li, and X. Zhang, "A general method for shot boundary detection," in Proc. Int. Conf. Multimedia Ubiquitous Eng., 2008, pp. 394-397.
Y. Wu, Y. T. Zhuang, and Y. H. Pan, "Content-based video similarity model," in Proc. ACM Int. Conf. Multimedia, 2000, pp. 465-467.
H. Koumaras, G. Gardikis, G. Xilouris, E. Pallis, and A. Kourtis, "Shot boundary detection without threshold parameters," J. Electron. Imag., vol. 15, no. 2, pp. 020503-1-020503-3, May 2006.
Z.-C. Zhao, X. Zeng, T. Liu, and A.-N. Cai, "BUPT at TRECVID 2007: Shot boundary detection," in Proc. TREC Video Retrieval Eval., 2007. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv7.papers/bupt.pdf
F. Hopfgartner, J. Urban, R. Villa, and J. Jose, "Simulated testing of an adaptive multimedia information retrieval system," in Proc. Int. Workshop Content-Based Multimedia Indexing, Bordeaux, France, Jun. 2007, pp. 328-335.
A. Herout, V. Beran, M. Hradis, I. Potucek, P. Zemcík, and P. Chmelar, "TRECVID 2007 by the Brno Group," in Proc. TREC Video Retrieval Eval., 2007. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv7.papers/brno.pdf
C. Lo and S.-J. Wang, "Video segmentation using a histogram-based fuzzy C-means clustering algorithm," in Proc. IEEE Int. Fuzzy Syst. Conf., Dec. 2001, pp. 920-923.
A. Hanjalic, "Shot-boundary detection: Unraveled and resolved?," IEEE Trans. Circuits Syst. Video Technol., vol. 12, no. 2, pp. 90-105, Feb. 2002.
A. F. Smeaton, "Techniques used and open challenges to the analysis, indexing and retrieval of digital video," Inform. Syst., vol. 32, no. 4, pp. 545-559, 2007.
Y. Y. Chung, W. K. J. Chin, X. Chen, D. Y. Shi, E. Choi, and F. Chen, "Content-based video retrieval system using wavelet transform," World Sci. Eng. Acad. Soc. Trans. Circuits Syst., vol. 6, no. 2, pp. 259-265, 2007.
L. Bai, S.-Y. Lao, H.-T. Liu, and J. Bu, "Video shot boundary detection using petri-net," in Proc. Int. Conf. Mach. Learning Cybern., 2008, pp. 3047-3051.
M. R. Naphade and J. R. Smith, "On the detection of semantic concepts at TRECVID," in Proc. ACM Int. Conf. Multimedia, New York, 2004, pp. 660-667.
M. Christel, C. Huang, N. Moraveji, and N. Papernick, "Exploiting multiple modalities for interactive video retrieval," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Montreal, QC, Canada, 2004, vol. 3, pp. 1032-1035.
L. Hollink, M. Worring, and A. T. Schreiber, "Building a visual ontology for video retrieval," in Proc. ACM Int. Conf. Multimedia, Singapore, 2005, pp. 479-482.
C. Liu, H. Liu, S. Jiang, Q. Huang, Y. Zheng, and W. Zhang, "JDL at TRECVID 2006 shot boundary detection," in Proc. TREC Video Retrieval Eval. Workshop, 2006. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/cas_jdl.pdf
X. Shen, M. Boutell, J. Luo, and C. Brown, "Multi-label machine learning and its application to semantic scene classification," in Proc. Int. Symp. Electron. Imag., Jan. 2004, pp. 188-199.
R. Yan and M. Naphade, "Co-training non-robust classifiers for video semantic concept detection," in Proc. IEEE Int. Conf. Image Process., Singapore, 2005, vol. 1, pp. 1205-1208.
A. Hanjalic and L.-Q. Xu, "Affective video content representation and modeling," IEEE Trans. Multimedia, vol. 7, no. 1, pp. 143-154, Feb. 2005.
M. Naphade, J. R. Smith, J. Tesic, S.-F. Chang, W. Hsu, L. Kennedy, A. Hauptmann, and J. Curtis, "Large-scale concept ontology for multimedia," IEEE Multimedia, vol. 13, no. 3, pp. 86-91, Jul./Sep. 2006.
X. Wu, P. C. Yuan, C. Liu, and J. Huang, "Shot boundary detection: An information saliency approach," in Proc. Congr. Image Signal Process., 2008, vol. 2, pp. 808-812.
R. V. Babu and K. R. Ramakrishnan, "Compressed domain video retrieval using object and global motion descriptors," Multimedia Tools Appl., vol. 32, no. 1, pp. 93-113, 2007.
J. Fan, H. Luo, Y. Gao, and R. Jain, "Incorporating concept ontology for hierarchical video classification, annotation, and visualization," IEEE Trans. Multimedia, vol. 9, no. 5, pp. 939-957, 2007.
D. Vallet, P. Castells, M. Fernandez, P. Mylonas, and Y. Avrithis, "Personalized content retrieval in context using ontological knowledge," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 3, pp. 336-345, Mar. 2007.
A. Anjulan and N. Canagarajah, "A unified framework for object retrieval and mining," IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 1, pp. 63-76, Jan. 2009.
X. B. Gao, J. Li, and Y. Shi, "A video shot boundary detection algorithm based on feature tracking," in Proc. Int. Conf. Rough Sets Knowl. Technol., (Lect. Notes Comput. Sci.), 4062, 2006, pp. 651-658.
Y. Chang, D. J. Lee, Y. Hong, and J. Archibald, "Unsupervised video shot detection using clustering ensemble with a color global scale-invariant feature transform descriptor," EURASIP J. Image Video Process., vol. 2008, pp. 1-10, 2008.
G. C. Chavez, F. Precioso, M. Cord, S. P.-Foliguet, and A. de A. Araujo, "Shot boundary detection at TRECVID 2006," in Proc. TREC Video Retrieval Eval., 2006. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/dokuz.pdf
Z.-C. Zhao and A.-N. Cai, "Shot boundary detection algorithm in compressed domain based on adaboost and fuzzy theory," in Proc. Int. Conf. Nat. Comput., 2006, pp. 617-626.
J. Sivic and A. Zisserman, "Video data mining using configurations of viewpoint invariant regions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2004, vol. 1, pp. I-488-I-495.
C. H. Hoi, L. S. Wong, and A. Lyu, "Chinese university of Hong Kong at TRECVID 2006: Shot boundary detection and video search," in Proc. TREC Video Retrieval Eval., 2006. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv6.papers/chinese_uhk.pdf
H. L. Wang and L.-F. Cheong, "Affective understanding in film," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 6, pp. 689-704, Jun. 2006.
S. Fischer, R. Lienhart, and W. Effelsberg, "Automatic recognition of film genres," in Proc. ACM Int. Conf. Multimedia, 1995, pp. 367-368.
B. T. Truong, C. Dorai, and S. Venkatesh, "Automatic genre identification for content-based video categorization," in Proc. IEEE Int. Conf. Pattern Recog., vol. 4, Barcelona, Spain, 2000, pp. 230-233.
N. Dimitrova, L. Agnihotri, and G. Wei, "Video classification based on HMM using text and faces," in Proc. Eur. Signal Process. Conf., Tampere, Finland, 2000, pp. 1373-1376.
G. Y. Hong, B. Fong, and A. Fong, "An intelligent video categorization engine," Kybernetes, vol. 34, no. 6, pp. 784-802, 2005.
C. G. M. Snoek, M. Worring, J.-M. Geusebroek, D. C. Koelma, F. J. Seinstra, and A. W. M. Smeulders, "The semantic pathfinder: Using an authoring metaphor for generic multimedia indexing," IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 10, pp. 1678-1689, Oct. 2006.
G. Xu, Y.-F. Ma, H.-J. Zhang, and Sh.-Q. Yang, "An HMM-based framework for video semantic analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 11, pp. 1422-1433, Nov. 2005.
Y. T. Zhuang, C. M. Wu, F. Wu, and X. Liu, "Improving web-based learning: Automatic annotation of multimedia semantics and cross-media indexing," in Proc. Adv. Web-Based Learning (ICWL), (Lect. Notes Comput. Sci.), 3143, 2004, pp. 255-262.
Y. X. Peng and C.-W. Ngo, "Hot event detection and summarization by graph modeling and matching," in Proc. Int. Conf. Image Video Retrieval, Singapore, Jul. 2005, pp. 257-266.
G.-J. Qi, Y. Song, X.-S. Hua, H.-J. Zhang, and L.-R. Dai, "Video annotation by active learning and cluster tuning," in Proc. IEEE Conf. Comput. Vis. Pattern Recog. Workshop, Jun. 2006, pp. 114-121.
J. P. Fan, A. K. Elmagarmid, X. Q. Zhu, W. G. Aref, and L. D. Wu, "ClassView: Hierarchical video shot classification, indexing, and accessing," IEEE Trans. Multimedia, vol. 6, no. 1, pp. 70-86, Feb. 2004.
C. G. M. Snoek, M. Worring, D. C. Koelma, and A. W. M. Smeulders, "A learned lexicon-driven paradigm for interactive video retrieval," IEEE Trans. Multimedia, vol. 9, no. 2, pp. 280-292, Feb. 2007.
X. Q. Zhu, X. D. Wu, A. K. Elmagarmid, Z. Feng, and L. D. Wu, "Video data mining: Semantic indexing and event detection from the association perspective," IEEE Trans. Knowl. Data Eng., vol. 17, no. 5, pp. 665-677, May 2005.
V. Kules, V. A. Petrushin, and I. K. Sethi, "The perseus project: Creating personalized multimedia news portal," in Proc. Int. Workshop Multimedia Data Mining, 2001, pp. 1-37.
J. Y. Pan and C. Faloutsos, "GeoPlot: Spatial data mining on video libraries," in Proc. Int. Conf. Inform. Knowl. Manag., 2002, pp. 405-412.
Y.-X. Xie, X.-D. Luan, S.-Y. Lao, L.-D. Wu, X. Peng, and Z.-G. Han, "A news video mining method based on statistical analysis and visualization," in Proc. Int. Conf. Image Video Retrieval, Jul. 2004, pp. 115-122.
J.-H. Oh and B. Bandi, "Multimedia data mining framework for raw video sequences," in Proc. ACM Int. Workshop Multimedia Data Mining, Edmonton, AB, Canada, 2002, pp. 18-35.
M. C. Burl, "Mining patterns of activity from video data," in Proc. SIAM Int. Conf. Data Mining, Apr. 2004, pp. 532-536.
M. Roach, J. Mason, L.-Q. Xu, and F. Stentiford, "Recent trends in video analysis: A taxonomy of video classification problems," in Proc. Int. Assoc. Sci. Technol. Develop. Int. Conf. Internet Multimedia Syst. Appl., Honolulu, HI, Aug. 2002, pp. 348-354.
W. S. Zhou, A. Vellaikal, and C.-C. J. Kuo, "Rule-based video classification system for basketball video indexing," in Proc. ACM Workshops Multimedia, 2000, pp. 213-216.
M. J. Roach, J. D. Mason, and M. Pawlewski, "Video genre classification using dynamics," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2001, vol. 3, pp. 1557-1560.
Y. Chen and E. K. Wong, "A knowledge-based approach to video content classification," in Proc. SPIE Vol. 4315: Storage and Retrieval for Media Databases, Jan. 2001, pp. 292-300.
W. S. Zhou, S. Dao, and C. C. J. Kuo, "On-line knowledge- and rule-based video classification system for video indexing and dissemination," Inform. Syst., vol. 27, no. 8, pp. 559-586, Dec. 2002.
P. Chang, M. Han, and Y. Gong, "Extract highlights from baseball game video with hidden Markov models," in Proc. IEEE Int. Conf. Image Process., 2002, vol. 1, pp. 609-612.
A. Mittal and L. F. Cheong, "Addressing the problems of Bayesian network classification of video using high dimensional features," IEEE Trans. Knowl. Data Eng., vol. 16, no. 2, pp. 230-244, Feb. 2004.
S. Chantamunee and Y. Gotoh, "University of Sheffield at TRECVID 2007: Shot boundary detection and rushes summarisation," in Proc. TREC Video Retrieval Eval., 2007. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv7.papers/sheffield_university.pdf
H. Pan, P. Van Beek, and M. I. Sezan, "Detection of slow-motion replay segments in sports video for highlights generation," in Proc. Int. Conf. Acoust., Speech, Signal Process., May 2001, pp. 1649-1652.
X. Yu, C. Xu, H. W. Leong, Q. Tian, Q. Tang, and K. Wan, "Trajectory-based ball detection and tracking with applications to semantic analysis of broadcast soccer video," in Proc. ACM Int. Conf. Multimedia, Berkeley, CA, 2003, pp. 11-20.
L. Y. Duan, M. Xu, Q. Tian, and C. Xu, "A unified framework for semantic shot classification in sports video," IEEE Trans. Multimedia, vol. 7, no. 6, pp. 1066-1083, Dec. 2005.
C. S. Xu, J. J. Wang, H. Q. Lu, and Y. F. Zhang, "A novel framework for semantic annotation and personalized retrieval of sports video," IEEE Trans. Multimedia, vol. 10, no. 3, pp. 421-436, Apr. 2008.
K. X. Dai, D. F. Wu, C. J. Fu, G. H. Li, and H. J. Li, "Video mining: A survey," J. Image Graph., vol. 11, no. 4, pp. 451-457, Apr. 2006.
M. Osadchy and D. Keren, "A rejection-based method for event detection in video," IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 4, pp. 534-541, Apr. 2004.
U. Park, H. Chen, and A. K. Jain, "3D model-assisted face recognition in video," in Proc. Workshop Face Process. Video, Victoria, BC, Canada, May 2005, pp. 322-329.
J. S. Boreczky and L. D. Wilcox, "A hidden Markov model framework for video segmentation using audio and image features," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1998, vol. 6, pp. 3741-3744.
M. J. Roach, J. S. D. Mason, and M. Pawlewski, "Motion-based classification of cartoons," in Proc. Int. Symp. Intell. Multimedia, 2001, pp. 146-149.
Z. Rasheed, Y. Sheikh, and M. Shah, "On the use of computable features for film classification," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 1, pp. 52-64, Jan. 2005.
I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld, "Learning realistic human actions from movies," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2008, pp. 1-8.
Y. Ke, R. Sukthankar, and M. Hebert, "Event detection in crowded videos," in Proc. IEEE Int. Conf. Comput. Vis., 2007, pp. 1-8.
H.-Y. Liu, T. T. He, and Z. Hui, "Event detection in sports video based on multiple feature fusion," in Proc. Int. Conf. Fuzzy Syst. Knowl. Discovery, 2007, vol. 2, pp. 446-450.
Y. F. Zhang, C. S. Xu, Y. Rui, J. Q. Wang, and H. Q. Lu, "Semantic event extraction from basketball games using multi-modal analysis," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2007, pp. 2190-2193.
X. K. Li and F. M. Porikli, "A hidden Markov model framework for traffic event detection using video features," in Proc. IEEE Int. Conf. Image Process., Oct. 2004, vol. 5, pp. 2901-2904.
L. Xie, Q. Wu, X. M. Chu, J. Wang, and P. Cao, "Traffic jam detection based on corner feature of background scene in video-based ITS," in Proc. IEEE Int. Conf. Netw., Sens. Control, Apr. 2008, pp. 614-619.
S. V. Nath, "Crime pattern detection using data mining," in Proc. IEEE/WIC/ACM Int. Conf. Web Intell. Intell. Agent Technol. Workshops, 2006, pp. 41-44.
H.-W. Yoo, H.-J. Ryoo, and D.-S. Jang, "Gradual shot boundary detection using localized edge blocks," Multimedia Tools Appl., vol. 28, no. 3, pp. 283-300, Mar. 2006.
Y. Zhai and M. Shah, "Video scene segmentation using Markov chain Monte Carlo," IEEE Trans. Multimedia, vol. 8, no. 4, pp. 686-697, Aug. 2006.
Z. Rasheed and M. Shah, "Scene detection in Hollywood movies and TV shows," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2003, vol. 2, pp. 343-350.
L. Zhao, W. Qi, Y.-J. Wang, S.-Q. Yang, and H.-J. Zhang, "Video shot grouping using best first model merging," in Proc. Storage Retrieval Media Database, 2001, pp. 262-269.
M. Wang, X.-S. Hua, X. Yuan, Y. Song, and L. R. Dai, "Optimizing multi-graph learning: Towards a unified video annotation scheme," in Proc. ACM Int. Conf. Multimedia, Augsburg, Germany, 2007, pp. 862-871.
Z. Rasheed and M. Shah, "Detection and representation of scenes in videos," IEEE Trans. Multimedia, vol. 7, no. 6, pp. 1097-1105, Dec. 2005.
Y.-P. Tan and H. Lu, "Model-based clustering and analysis of video scenes," in Proc. IEEE Int. Conf. Image Process., Sep. 2002, vol. 1, pp. 617-620.
W. Tavanapong and J. Zhou, "Shot clustering techniques for story browsing," IEEE Trans. Multimedia, vol. 6, no. 4, pp. 517-527, Aug. 2004.
L.-H. Chen, Y.-C. Lai, and H.-Y. M. Liao, "Movie scene segmentation using background information," Pattern Recognit., vol. 41, no. 3, pp. 1056-1065, Mar. 2008.
A. Hanjalic, R. L. Lagendijk, and J. Biemond, "Automated high-level movie segmentation for advanced video-retrieval systems," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 4, pp. 580-588, Jun. 1999.
M. Wang, X.-S. Hua, Y. Song, X. Yuan, S. Li, and H.-J. Zhang, "Automatic video annotation by semi-supervised learning with kernel density estimation," in Proc. ACM Int. Conf. Multimedia, Santa Barbara, CA, 2006, pp. 967-976.
Y. Rui, T. S. Huang, and S. Mehrotra, "Constructing table-of-content for video," Multimedia Syst., vol. 7, no. 5, pp. 359-368, 1999.
X. Yuan, X.-S. Hua, M. Wang, and X. Wu, "Manifold-ranking based video concept detection on large database and feature pool," in Proc. ACM Int. Conf. Multimedia, Santa Barbara, CA, 2006, pp. 623-626.
C.-W. Ngo, T.-C. Pong, H.-J. Zhang, and R. T. Chin, "Motion-based video representation for scene change detection," Int. J. Comput. Vis., vol. 50, no. 2, pp. 127-142, 2002.
B. T. Truong, S. Venkatesh, and C. Dorai, "Scene extraction in motion pictures," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 5-15, Jan. 2003.
R. Yan and M. Naphade, "Semi-supervised cross feature learning for semantic concept detection in videos," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jul. 2005, vol. 1, pp. 657-663.
H. Sundaram and S.-F. Chang, "Video scene segmentation using video and audio features," in Proc. IEEE Int. Conf. Multimedia Expo., New York, 2000, pp. 1145-1148.
N. Goela, K. Wilson, F. Niu, A. Divakaran, and I. Otsuka, "An SVM framework for genre-independent scene change detection," in Proc. IEEE Int. Conf. Multimedia Expo., vol. 3, New York, Jul. 2007, pp. 532-535.
Z. W. Gu, T. Mei, X. S. Hua, X. Q. Wu, and S. P. Li, "EMS: Energy minimization based video scene segmentation," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2007, pp. 520-523.
Y. Ariki, M. Kumano, and K. Tsukada, "Highlight scene extraction in real time from baseball live video," in Proc. ACM Int. Workshop Multimedia Inform. Retrieval, Berkeley, CA, Nov. 2003, pp. 209-214.
L. Xie, P. Xu, S.-F. Chang, A. Divakaran, and H. Sun, "Structure analysis of soccer video with domain knowledge and hidden Markov models," Pattern Recognit. Lett., vol. 25, no. 7, pp. 767-775, 2004.
Y. Zhai, A. Yilmaz, and M. Shah, "Story segmentation in news using visual and text cues," in Proc. Int. Conf. Image Video Retrieval, Singapore, Jul. 2005, pp. 92-102.
W. H.-M. Hsu and S.-F. Chang, "Generative, discriminative, and ensemble learning on multi-modal perceptual fusion toward news video story segmentation," in Proc. IEEE Int. Conf. Multimedia Expo., Jun. 2004, vol. 2, pp. 1091-1094.
J. Wu, X.-S. Hua, and H.-J. Zhang, "An online-optimized incremental learning framework for video semantic classification," in Proc. ACM Int. Conf. Multimedia, New York, Oct. 2004, pp. 320-323.
J. Yuan, J. Li, and B. Zhang, "Learning concepts from large scale imbalanced data sets using support cluster machines," in Proc. ACM Int. Conf. Multimedia, Santa Barbara, CA, 2006, pp. 441-450.
I. Otsuka, K. Nakane, A. Divakaran, K. Hatanaka, and M. Ogawa, "A highlight scene detection and video summarization system using audio feature for a personal video recorder," IEEE Trans. Consum. Electron., vol. 51, no. 1, pp. 112-116, Feb. 2005.
K. Wan, X. Yan, and C. Xu, "Automatic mobile sports highlights," in Proc. IEEE Int. Conf. Multimedia Expo., 2005, pp. 638-641.
R.-G. Xiao, Y.-Y. Wang, H. Pan, and F. Wu, "Automatic video summarization by spatio-temporal analysis and non-trivial repeating pattern detection," in Proc. Congr. Image Signal Process., May 2008, vol. 4, pp. 555-559.
Y. H. Gong, "Summarizing audio-visual contents of a video program," EURASIP J. Appl. Signal Process., Special Issue Unstructured Inform. Manag. Multimedia Data Sources, vol. 2003, no. 2, pp. 160-169, Feb. 2003.
C.-W. Ngo, Y.-F. Ma, and H.-J. Zhang, "Video summarization and scene detection by graph modeling," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 2, pp. 296-305, Feb. 2005.
M. Guironnet, D. Pellerin, N. Guyader, and P. Ladret, "Video summarization based on camera motion and a subjective evaluation method," EURASIP J. Image Video Process., vol. 2007, pp. 1-12, 2007.
J. Y. You, G. Z. Liu, L. Sun, and H. L. Li, "A multiple visual models based perceptive analysis framework for multilevel video summarization," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 3, pp. 273-285, Mar. 2007.
S. V. Porter, "Video segmentation and indexing using motion estimation," Ph.D. dissertation, Dept. Comput. Sci., Univ. Bristol, Bristol, U.K., 2004.
Z. Li, G. Schuster, and A. Katsaggelos, "Minmax optimal video summarization," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 10, pp. 1245-1256, Oct. 2005.
A. Ekin, A. M. Tekalp, and R. Mehrotra, "Automatic soccer video analysis and summarization," IEEE Trans. Image Process., vol. 12, no. 7, pp. 796-807, Jul. 2003.
P. M. Fonseca and F. Pereira, "Automatic video summarization based on MPEG-7 descriptions," Signal Process.: Image Commun., vol. 19, no. 8, pp. 685-699, Sep. 2004.
R. Ewerth and B. Freisleben, "Semi-supervised learning for semantic video retrieval," in Proc. ACM Int. Conf. Image Video Retrieval, Amsterdam, The Netherlands, Jul. 2007, pp. 154-161.
W.-N. Lie and K.-C. Hsu, "Video summarization based on semantic feature analysis and user preference," in Proc. IEEE Int. Conf. Sens. Netw., Ubiquitous Trustworthy Comput., Jun. 2008, pp. 486-491.
X.-N. Xie and F. Wu, "Automatic video summarization by affinity propagation clustering and semantic content mining," in Proc. Int. Symp. Electron. Commerce Security, Aug. 2008, pp. 203-208.
D. Besiris, F. Fotopoulou, N. Laskaris, and G. Economou, "Key frame extraction in video sequences: A vantage points approach," in Proc. IEEE Workshop Multimedia Signal Process., Athens, Greece, Oct. 2007, pp. 434-437.
G. Ciocca and R. Schettini, "Supervised and unsupervised classification post-processing for visual video summaries," IEEE Trans. Consum. Electron., vol. 52, no. 2, pp. 630-638, May 2006.
Z. Li, G. M. Schuster, A. K. Katsaggelos, and B. Gandhi, "Rate-distortion optimal video summary generation," IEEE Trans. Image Process., vol. 14, no. 10, pp. 1550-1560, Oct. 2005.
I. Otsuka, K. Nakane, and A. Divakaran, "A highlight scene detection and video summarization system using audio feature for a personal video recorder," IEEE Trans. Consum. Electron., vol. 51, no. 1, pp. 112-116, 2005.
Y. Gao, W.-B. Wang, and J.-H. Yong, "A video summarization tool using two-level redundancy detection for personal video recorders," IEEE Trans. Consum. Electron., vol. 54, no. 2, pp. 521-526, May 2008.
J. Calic, D. Gibson, and N. Campbell, "Efficient layout of comic-like video summaries," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 7, pp. 931-936, Jul. 2007.
F. Wang and C.-W. Ngo, "Rushes video summarization by object and event understanding," in Proc. Int. Workshop TREC Video Retrieval Eval. Video Summarization, Augsburg, Bavaria, Germany, 2007, pp. 25-29.
V. Valdes and J. M. Martinez, "On-line video summarization based on signature-based junk and redundancy filtering," in Proc. Int. Workshop Image Anal. Multimedia Interactive Services, 2008, pp. 88-91.
J. Kleban, A. Sarkar, E. Moxley, S. Mangiat, S. Joshi, T. Kuo, and B. S. Manjunath, "Feature fusion and redundancy pruning for rush video summarization," in Proc. Int. Workshop TREC Video Retrieval Eval. Video Summarization, Augsburg, Bavaria, Germany, 2007, pp. 84-88.
P. Over, A. F. Smeaton, and P. Kelly, "The TRECVID BBC rushes summarization evaluation pilot," in Proc. Int. Workshop TREC-VID Video Summarization, Augsburg, Bavaria, Germany, Sep. 2007.
Z. Cernekova, I. Pitas, and C. Nikou, "Information theory-based shot cut/fade detection and video summarization," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 1, pp. 82-90, Jan. 2006.
Y. Li, S.-H. Lee, C.-H. Yeh, and C.-C. J. Kuo, "Techniques for movie content analysis and skimming: Tutorial and overview on video abstraction techniques," IEEE Signal Process. Mag., vol. 23, no. 2, pp. 79-89, Mar. 2006.
P. Mundur, Y. Rao, and Y. Yesha, "Keyframe-based video summarization using Delaunay clustering," Int. J. Digital Libraries, vol. 6, no. 2, pp. 219-232, Apr. 2006.
Z. Xiong, X. S. Zhou, Q. Tian, Y. Rui, and T. S. Huang, "Semantic retrieval of video: Review of research on video retrieval in meetings, movies and broadcast news, and sports," IEEE Signal Process. Mag., vol. 23, no. 2, pp. 18-27, Mar. 2006.
C. M. Taskiran, Z. Pizlo, A. Amir, D. Ponceleon, and E. Delp, "Automated video program summarization using speech transcripts," IEEE Trans. Multimedia, vol. 8, no. 4, pp. 775-790, Aug. 2006.
C. Taskiran, J.-Y. Chen, A. Albiol, L. Torres, C. A. Bouman, and E. J. Delp, "Vibe: A compressed video database structured for active browsing and search," IEEE Trans. Multimedia, vol. 6, no. 1, pp. 103-118, Feb. 2004.
A. Aner, L. Tang, and J. R. Kender, "A method and browser for cross referenced video summaries," in Proc. IEEE Int. Conf. Multimedia Expo., Lausanne, Switzerland, Aug. 2002, vol. 2, pp. 237-240.
Y. L. Geng, D. Xu, and S. H. Feng, "Hierarchical video summarization based on video structure and highlight," in Lecture Notes in Computer Science, vol. 4109. Berlin, Germany: Springer, 2006, pp. 226-234.
K. A. Peker, I. Otsuka, and A. Divakaran, "Broadcast video program summarization using face tracks," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2006, pp. 1053-1056.
G. Ciocca and R. Schettini, "An innovative algorithm for key frame extraction in video summarization," J. Real-Time Image Process., vol. 1, no. 1, pp. 69-88, Oct. 2006.
C. Choudary and T. C. Liu, "Summarization of visual content in instructional videos," IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1443-1455, Nov. 2007.
M. Cooper, T. Liu, and E. Rieffel, "Video segmentation via temporal pattern classification," IEEE Trans. Multimedia, vol. 9, no. 3, pp. 610-618, Apr. 2007.
H.-W. Kang and X.-S. Hua, "To learn representativeness of video frames," in Proc. ACM Int. Conf. Multimedia, Singapore, 2005, pp. 423-426.
D. P. Mukherjee, S. K. Das, and S. Saha, "Key frame estimation in video using randomness measure of feature point pattern," IEEE Trans. Circuits Syst. Video Technol., vol. 17, no. 5, pp. 612-620, May 2007.
L. J. Liu and G. L. Fan, "Combined key-frame extraction and object-based video segmentation," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 7, pp. 869-884, Jul. 2005.
X. M. Song and G. L. Fan, "Joint key-frame extraction and object segmentation for content-based video analysis," IEEE Trans. Circuits Syst. Video Technol., vol. 16, no. 7, pp. 904-914, Jul. 2006.
J. Calic and B. Thomas, "Spatial analysis in key-frame extraction using video segmentation," in Proc. Workshop Image Anal. Multimedia Interactive Services, Lisbon, Portugal, Apr. 2004.
C. Kim and J. Hwang, "Object-based video abstraction using cluster analysis," in Proc. IEEE Int. Conf. Image Process., Oct. 2001, vol. 2, pp. 657-660.
R. Yan and A. G. Hauptmann, "Probabilistic latent query analysis for combining multiple retrieval sources," in Proc. Ann. Int. ACM SIGIR Conf. Inform. Retrieval, Seattle, WA, 2006, pp. 324-331.
A. Girgensohn and J. Boreczky, "Time-constrained keyframe selection technique," Multimedia Tools Appl., vol. 11, no. 3, pp. 347-358, 2000.
X. D. Yu, L. Wang, Q. Tian, and P. Xue, "Multilevel video representation with application to keyframe extraction," in Proc. Int. Multimedia Modelling Conf., 2004, pp. 117-123.
D. Gibson, N. Campbell, and B. Thomas, "Visual abstraction of wildlife footage using Gaussian mixture models and the minimum description length criterion," in Proc. IEEE Int. Conf. Pattern Recog., Dec. 2002, vol. 2, pp. 814-817.
T. Wang, Y. Wu, and L. Chen, "An approach to video key-frame extraction based on rough set," in Proc. Int. Conf. Multimedia Ubiquitous Eng., 2007.
T. M. Liu, H.-J. Zhang, and F. H. Qi, "A novel video key-frame-extraction algorithm based on perceived motion energy model," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 10, pp. 1006-1013, Oct. 2003.
A. M. Ferman and A. M. Tekalp, "Two-stage hierarchical video summary extraction to match low-level user browsing preferences," IEEE Trans. Multimedia, vol. 5, no. 2, pp. 244-256, Jun. 2003.
Z. H. Sun, K. B. Jia, and H. X. Chen, "Video key frame extraction based on spatial-temporal color distribution," in Proc. Int. Conf. Intell. Inform. Hiding Multimedia Signal Process., 2008, pp. 196-199.
R. Narasimha, A. Savakis, R. M. Rao, and R. De Queiroz, "Key frame extraction using MPEG-7 motion descriptors," in Proc. Asilomar Conf. Signals, Syst. Comput., Nov. 2003, vol. 2, pp. 1575-1579.
D. Xia, X. Deng, and Q. Zeng, "Shot boundary detection based on difference sequences of mutual information," in Proc. Int. Conf. Image Graph., Aug. 2007, pp. 389-394.
B. Fauvet, P. Bouthemy, P. Gros, and F. Spindler, "A geometrical key-frame selection method exploiting dominant motion estimation in video," in Proc. Int. Conf. Image Video Retrieval, Jul. 2004, pp. 419-427.
H. J. Zhang, J. Wu, D. Zhong, and S. W. Smoliar, "An integrated system for content-based video retrieval and browsing," Pattern Recognit., vol. 30, no. 4, pp. 643-658, 1997.
X.-D. Zhang, T.-Y. Liu, K.-T. Lo, and J. Feng, "Dynamic selection and effective compression of key frames for video abstraction," Pattern Recognit. Lett., vol. 24, no. 9-10, pp. 1523-1532, Jun. 2003.
A. Divakaran, R. Radhakrishnan, and K. A. Peker, "Motion activity-based extraction of key-frames from video shots," in Proc. IEEE Int. Conf. Image Process., Rochester, NY, 2002, vol. 1, pp. 932-935.
J. Rong, W. Jin, and L. Wu, "Key frame extraction using inter-shot information," in Proc. IEEE Int. Conf. Multimedia Expo., Jun. 2004, pp. 571-574.
M. Cooper and J. Foote, "Discriminative techniques for keyframe selection," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2005, pp. 502-505.
H. S. Chang, S. Sull, and S. U. Lee, "Efficient video indexing scheme for content-based retrieval," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 8, pp. 1269-1279, Dec. 1999.
S. V. Porter, M. Mirmehdi, and B. T. Thomas, "A shortest path representation for video summarization," in Proc. Int. Conf. Image Anal. Process., Sep. 2003, pp. 460-465.
H.-C. Lee and S.-D. Kim, "Iterative key frame selection in the rate-constraint environment," Signal Process. Image Commun., vol. 18, no. 1, pp. 1-15, 2003.
T. Liu, X. Zhang, J. Feng, and K. Lo, "Shot reconstruction degree: A novel criterion for key frame selection," Pattern Recognit. Lett., vol. 25, no. 12, pp. 1451-1457, Sep. 2004.
J. Calic and E. Izquierdo, "Efficient key-frame extraction and video analysis," in Proc. Int. Conf. Inf. Technol.: Coding Comput., Apr. 2002, pp. 28-33.
J. Yuan, L. Xiao, D. Wang, D. Ding, Y. Zuo, Z. Tong, X. Liu, S. Xu, W. Zheng, X. Li, Z. Si, J. Li, F. Lin, and B. Zhang, "Tsinghua University at TRECVID 2005," in Proc. TREC Video Retrieval Eval., Gaithersburg, MD, 2005. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv5.papers/tsinghua.pdf
S.-H. Han and I.-S. Kweon, "Scalable temporal interest points for abstraction and classification of video events," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2005, pp. 1-4.
S. Y. Neo, J. Zhao, M. Y. Kan, and T. S. Chua, "Video retrieval using high level features: Exploiting query matching and confidence-based weighting," in Proc. Conf. Image Video Retrieval, Singapore, 2006, pp. 370-379.
A. Amir, W. Hsu, G. Iyengar, C. Y. Lin, M. Naphade, A. Natsev, C. Neti, H. J. Nock, J. R. Smith, B. L. Tseng, Y. Wu, and D. Zhang, "IBM research TRECVID-2003 video retrieval system," in Proc. TREC Video Retrieval Eval., Gaithersburg, MD, 2003. Available: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers03/ibm.smith.paper.final2.pdf
A. G. Hauptmann, R. Baron, M. Y. Chen, M. Christel, P. Duygulu, C. Huang, R. Jin, W. H. Lin, T. Ng, N. Moraveji, N. Papernick, C. Snoek, G. Tzanetakis, J. Yang, R. Yan, and H. Wactlar, "Informedia at TRECVID 2003: Analyzing and searching broadcast news video," in Proc. TREC Video Retrieval Eval., Gaithersburg, MD, 2003. Available: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers03/cmu.final.paper.pdf
C. Foley, C. Gurrin, G. Jones, H. Lee, S. McGivney, N. E. O'Connor, S. Sav, A. F. Smeaton, and P. Wilkins, "TRECVID 2005 experiments at Dublin City University," in Proc. TREC Video Retrieval Eval., Gaithersburg, MD, 2005. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv5.papers/dcu.pdf
E. Cooke, P. Ferguson, G. Gaughan, C. Gurrin, G. Jones, H. L. Borgue, H. Lee, S. Marlow, K. McDonald, M. McHugh, N. Murphy, N. O'Connor, N. O'Hare, S. Rothwell, A. Smeaton, and P. Wilkins, "TRECVID 2004 experiments in Dublin City University," in Proc. TREC Video Retrieval Eval., Gaithersburg, MD, 2004. Available: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/dcu.pdf
J. Adcock, A. Girgensohn, M. Cooper, T. Liu, L. Wilcox, and E. Rieffel, "FXPAL experiments for TRECVID 2004," in Proc. TREC Video Retrieval Eval., Gaithersburg, MD, 2004. Available: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/fxpal.pdf
T. Volkmer and A. Natsev, "Exploring automatic query refinement for text-based video retrieval," in Proc. IEEE Int. Conf. Multimedia Expo., Toronto, 2006, pp. 765-768.
A. Hauptmann, M. Y. Chen, M. Christel, C. Huang, W. H. Lin, T. Ng, N. Papernick, A. Velivelli, J. Yang, R. Yan, H. Yang, and H. D. Wactlar, "Confounded expectations: Informedia at TRECVID 2004," in Proc. TREC Video Retrieval Eval., Gaithersburg, MD, 2004. Available: http://www-nlpir.nist.gov/projects/tvpubs/tvpapers04/cmu.pdf
R. Yan and A. G. Hauptmann, "A review of text and image retrieval approaches for broadcast news video," Inform. Retrieval, vol. 10, pp. 445-484, 2007.
L. X. Xie, H. Sundaram, and M. Campbell, "Event mining in multimedia streams," Proc. IEEE, vol. 96, no. 4, pp. 623-646, Apr. 2008.
K.-H. Liu, M.-F. Weng, C.-Y. Tseng, Y.-Y. Chuang, and M.-S. Chen, "Association and temporal rule mining for post-filtering of semantic concept detection in video," IEEE Trans. Multimedia, vol. 10, no. 2, pp. 240-251, Feb. 2008.
M.-L. Shyu, Z. Xie, M. Chen, and S.-C. Chen, "Video semantic event/concept detection using a subspace-based multimedia data mining framework," IEEE Trans. Multimedia, vol. 10, no. 2, pp. 252-259, Feb. 2008.
R. Fablet, P. Bouthemy, and P. Perez, "Nonparametric motion characterization using causal probabilistic models for video indexing and retrieval," IEEE Trans. Image Process., vol. 11, no. 4, pp. 393-407, Apr. 2002.
Y.-F. Ma and H.-J. Zhang, "Motion texture: A new motion based video representation," in Proc. Int. Conf. Pattern Recog., Aug. 2002, vol. 2, pp. 548-551.
A. D. Bimbo, E. Vicario, and D. Zingoni, "Symbolic description and visual querying of image sequences using spatiotemporal logic," IEEE Trans. Knowl. Data Eng., vol. 7, pp. 609-622, Aug. 1995.
S. F. Chang, W. Chen, H. J. Meng, H. Sundaram, and D. Zhong, "A fully automated content-based video search engine supporting spatiotemporal queries," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 5, pp. 602-615, Sep. 1998.
F. I. Bashir, A. A. Khokhar, and D. Schonfeld, "Real-time motion trajectory-based indexing and retrieval of video sequences," IEEE Trans. Multimedia, vol. 9, no. 1, pp. 58-65, Jan. 2007.
W. Chen and S. F. Chang, "Motion trajectory matching of video objects," in Proc. SPIE vol. 3972: Storage and Retrieval for Media Databases, Jan. 2000, pp. 544-553.
L. Yang, J. Liu, X. Yang, and X. Hua, "Multi-modality web video categorization," in Proc. ACM SIGMM Int. Workshop Multimedia Inform. Retrieval, Augsburg, Germany, Sep. 2007, pp. 265-274.
X. Yuan, W. Lai, T. Mei, X.-S. Hua, and X.-Q. Wu, "Automatic video genre categorization using hierarchical SVM," in Proc. IEEE Int. Conf. Image Process., Atlanta, GA, Oct. 2006, pp. 2905-2908.
C. G. M. Snoek, M. Worring, J. C. van Gemert, J. M. Geusebroek, and A. W. M. Smeulders, "The challenge problem for automated detection of 101 semantic concepts in multimedia," in Proc. ACM Int. Conf. Multimedia, Santa Barbara, CA, 2006, pp. 421-430.
C. G. M. Snoek, B. Huurnink, L. Hollink, M. de Rijke, G. Schreiber, and M. Worring, "Adding semantics to detectors for video retrieval," IEEE Trans. Multimedia, vol. 9, no. 5, pp. 975-985, Aug. 2007.
A. Hauptmann, R. Yan, W.-H. Lin, M. Christel, and H. Wactlar, "Can high-level concepts fill the semantic gap in video retrieval? A case study with broadcast news," IEEE Trans. Multimedia, vol. 9, no. 5, pp. 958-966, Aug. 2007.
G.-J. Qi, X.-S. Hua, Y. Rui, J. H. Tang, T. Mei, and H.-J. Zhang, "Correlative multi-label video annotation," in Proc. ACM Int. Conf. Multimedia, Augsburg, Germany, 2007, pp. 17-26.
D. Brezeale and D. J. Cook, "Automatic video classification: A survey of the literature," IEEE Trans. Syst., Man, Cybern., C, Appl. Rev., vol. 38, no. 3, pp. 416-430, May 2008.
P. Xu, L. Xie, S.-F. Chang, A. Divakaran, A. Vetro, and H. Sun, "Algorithms and system for segmentation and structure analysis in soccer video," in Proc. IEEE Int. Conf. Multimedia Expo., Tokyo, Japan, 2001, pp. 928-931.
Y. P. Tan, D. D. Saur, S. R. Kulkarni, and P. J. Ramadge, "Rapid estimation of camera motion from compressed video with applications to video annotation," IEEE Trans. Circuits Syst. Video Technol., vol. 10, no. 1, pp. 133-146, Feb. 2000.
Y. Wu, B. L. Tseng, and J. R. Smith, "Ontology-based multi-classification learning for video concept detection," in Proc. IEEE Int. Conf. Multimedia Expo., Jun. 2004, vol. 2, pp. 1003-1006.
J. R. Smith and M. Naphade, "Multimedia semantic indexing using model vectors," in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2003, vol. 2, pp. 445-448.
W. Jiang, S.-F. Chang, and A. Loui, "Active concept-based concept fusion with partial user labels," in Proc. IEEE Int. Conf. Image Process., Oct. 2006, pp. 2917-2920.
M. Bertini, A. Del Bimbo, and C. Torniai, "Automatic video annotation using ontologies extended with visual information," in Proc. ACM Int. Conf. Multimedia, Singapore, Nov. 2005, pp. 395-398.
A. G. Hauptmann, M. Christel, and R. Yan, "Video retrieval based on semantic concepts," Proc. IEEE, vol. 96, no. 4, pp. 602-622, Apr. 2008.
J. Fan, H. Luo, and A. K. Elmagarmid, "Concept-oriented indexing of video databases: Towards semantic sensitive retrieval and browsing," IEEE Trans. Image Process., vol. 13, no. 7, pp. 974-992, Jul. 2004.
F. Pereira, A. Vetro, and T. Sikora, "Multimedia retrieval and delivery: Essential metadata challenges and standards," Proc. IEEE, vol. 96, no. 4, pp. 721-744, Apr. 2008.
Y. Aytar, M. Shah, and J. B. Luo, "Utilizing semantic word similarity measures for video retrieval," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2008, pp. 1-8.
C. G. M. Snoek and M. Worring, "Multimodal video indexing: A review of the state-of-the-art," Multimedia Tools Appl., vol. 25, no. 1, pp. 5-35, Jan. 2005.
S.-F. Chang, W.-Y. Ma, and A. Smeulders, "Recent advances and challenges of semantic image/video search," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2007, vol. 4, pp. IV-1205-IV-1208.
R. Yan, J. Yang, and A. G. Hauptmann, "Learning query-class dependent weights in automatic video retrieval," in Proc. ACM Int. Conf. Multimedia, New York, Oct. 2004, pp. 548-555.
L. Kennedy, P. Natsev, and S.-F. Chang, "Automatic discovery of query class dependent models for multimodal search," in Proc. ACM Int. Conf. Multimedia, Singapore, Nov. 2005, pp. 882-891.
P. Over, G. Awad, J. Fiscus, and A. F. Smeaton. (2010). "TRECVID 2009-Goals, tasks, data, evaluation mechanisms and metrics," [Online]. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html
K. Schoeffmann, F. Hopfgartner, O. Marques, L. Boeszoermenyi, and J. M. Jose, "Video browsing interfaces and applications: A review," SPIE Rev., vol. 1, no. 1, pp. 018004.1-018004.35, May 2010.
C. G. M. Snoek and M. Worring, "Concept-based video retrieval," Foundations Trends Inform. Retrieval, vol. 2, no. 4, pp. 215-322, 2009.
A. F. Smeaton, P. Over, and A. R. Doherty, "Video shot boundary detection: Seven years of TRECVid activity," Comput. Vis. Image Understanding, vol. 114, no. 4, pp. 411-418, 2010.
G. Quenot, D. Moraru, and L. Besacier. (2003). "CLIPS at TRECVID: Shot boundary detection and feature detection," in Proc. TREC Video Retrieval Eval. Workshop Notebook Papers [Online]. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html#2003.
P. Over, T. Ianeva, W. Kraaij, and A. F. Smeaton. (2005). "TRECVID 2005-An overview," in Proc. TREC Video Retrieval Eval. Workshop. [Online]. Available: http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html#2005.
A. F. Smeaton, P. Over, and W. Kraaij, "High-level feature detection from video in TRECVID: A 5-year retrospective of achievements," in Multimedia Content Analysis: Theory and Applications (Springer Series on Signals and Communication Technology). Berlin, Germany: Springer, 2009, pp. 151-174.
J. Sivic and A. Zisserman, "Video Google: Efficient visual search of videos," in Toward Category-Level Object Recognition. Berlin, Germany: Springer, 2006, pp. 127-144.
M. Chen, M. Christel, A. Hauptmann, and H. Wactlar, "Putting active learning into multimedia applications: Dynamic definition and refinement of concept classifiers," in Proc. ACM Int. Conf. Multimedia, 2005, pp. 902-911.
H. B. Luan, S. Y. Neo, H. K. Goh, Y. D. Zhang, S. X. Lin, and T. S. Chua, "Segregated feedback with performance-based adaptive sampling for interactive news video retrieval," in Proc. ACM Int. Conf. Multimedia, 2007, pp. 293-296.
G. P. Nguyen, M. Worring, and A. W. M. Smeulders, "Interactive search by direct manipulation of dissimilarity space," IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1404-1415, Jun. 2007.
E. Bruno, N. Moenne-Loccoz, and S. Marchand-Maillet, "Design of multimodal dissimilarity spaces for retrieval of video documents," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 9, pp. 1520-1533, Sep. 2008.
P. Over, A. F. Smeaton, and G. Awad, "The TRECVid 2008 BBC rushes summarization evaluation," in Proc. 2nd ACM TREC Video Retrieval Eval. Video Summarization Workshop, 2008, pp. 1-20.
W. Bailer and G. Thallinger, "Comparison of content selection methods for skimming rushes video," in Proc. 2nd ACM TREC Video Retrieval Eval. Video Summarization Workshop, 2008, pp. 85-89.
Z. Liu, E. Zavesky, B. Shahraray, D. Gibbon, and A. Basso, "Brief and high-interest video summary generation: Evaluating the AT&T labs rushes summarizations," in Proc. 2nd ACM TREC Video Retrieval Eval. Video Summarization Workshop, 2008, pp. 21-25.
V. Chasanis, A. Likas, and N. Galatsanos, "Video rushes summarization using spectral clustering and sequence alignment," in Proc. 2nd ACM TREC Video Retrieval Eval. Video Summarization Workshop, 2008, pp. 75-79.
M. G. Christel, A. G. Hauptmann, W.-H. Lin, M.-Y. Chen, B. Maher, and R. V. Baron, "Exploring the utility of fast-forward surrogates for BBC rushes," in Proc. 2nd ACM TREC Video Retrieval Eval. Video Summarization Workshop, 2008, pp. 35-39.
S. Naci, U. Damnjanovic, B. Mansencal, J. Benois-Pineau, C. Kaes, and M. Corvaglia, "The COST292 experimental framework for RUSHES task in TRECVID 2008," in Proc. 2nd ACM TREC Video Retrieval Eval. Video Summarization Workshop, 2008, pp. 40-44.
W. Ren, S. Singh, M. Singh, and Y. S. Zhu, "State-of-the-art on spatio-temporal information-based video retrieval," Pattern Recognit., vol. 42, no. 2, pp. 267-282, Feb. 2009.
M. Wang, X. S. Hua, J. Tang, and R. Hong, "Beyond distance measurement: Constructing neighborhood similarity for video annotation," IEEE Trans. Multimedia, vol. 11, no. 3, pp. 465-476, Apr. 2009.