Dynamic Multimodal Fusion in Video Search

2007 IEEE International Conference on Multimedia and Expo (ICME), 2007

https://doi.org/10.1109/ICME.2007.4284946

Abstract

We propose effective multimodal fusion strategies for video search. Multimodal search is a widely applicable information-retrieval problem, and fusion strategies are essential for exploiting all available retrieval experts and boosting performance. Prior work has focused on hard and soft modeling of query classes and on learning fusion weights for each class, but the class partition is either manually defined or learned from data, and in both cases it remains insensitive to the test query. We propose a query-dependent fusion strategy that dynamically generates a class from the training queries closest to the test query, based on lightweight query features derived from semantic analysis of the query text. A set of optimal weights is then learned on the dynamic class, which aims to model both co-occurring query features and unusual test queries. Used in conjunction with the rest of our multimodal retrieval system, dynamic query classes compare favorably with hard and soft query classes, and the system improves upon the best automatic search runs of TRECVID05 and TRECVID06 by 34% and 8%, respectively.
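The query-dependent fusion idea can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the class size, or the weight-learning procedure, so the following are all assumptions made for illustration: cosine similarity over binary query features, a k-nearest-neighbor dynamic class, and an exhaustive grid search over fusion weights that maximizes mean average precision. Every function name, the toy feature vectors, and the toy expert scores are hypothetical and are not taken from the paper.

```python
from itertools import product
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two query-feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def nearest_queries(test_feat, train_feats, k):
    """Indices of the k training queries closest to the test query."""
    order = sorted(range(len(train_feats)),
                   key=lambda i: cosine(test_feat, train_feats[i]),
                   reverse=True)
    return order[:k]


def fuse(expert_scores, weights):
    """Weighted linear combination of per-expert document scores."""
    n_docs = len(expert_scores[0])
    return [sum(w * s[d] for w, s in zip(weights, expert_scores))
            for d in range(n_docs)]


def average_precision(scores, relevant):
    """Non-interpolated average precision of the ranking induced by scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if relevant[i]:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def weight_grid(n_experts, steps):
    """All weight vectors on a regular grid that sum to 1."""
    for combo in product(range(steps + 1), repeat=n_experts):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def learn_weights(class_queries, n_experts, steps=4):
    """Grid search (an assumed procedure) for weights maximizing mean AP."""
    best_w, best_map = None, -1.0
    for w in weight_grid(n_experts, steps):
        mean_ap = sum(average_precision(fuse(q["scores"], w), q["relevant"])
                      for q in class_queries) / len(class_queries)
        if mean_ap > best_map:
            best_w, best_map = w, mean_ap
    return best_w, best_map


# Toy training set: hypothetical features (person, sports, scene) and two
# hypothetical experts (text, visual), each scoring four documents.
train = [
    {"feat": [1, 0, 0], "relevant": [1, 0, 1, 0],
     "scores": [[0.9, 0.1, 0.8, 0.2], [0.1, 0.9, 0.2, 0.8]]},
    {"feat": [1, 0, 0], "relevant": [0, 1, 0, 1],
     "scores": [[0.2, 0.9, 0.1, 0.8], [0.9, 0.2, 0.8, 0.1]]},
    {"feat": [0, 1, 1], "relevant": [1, 0, 1, 0],
     "scores": [[0.1, 0.9, 0.2, 0.8], [0.9, 0.1, 0.8, 0.2]]},
    {"feat": [0, 1, 0], "relevant": [0, 1, 0, 1],
     "scores": [[0.9, 0.2, 0.8, 0.1], [0.2, 0.9, 0.1, 0.8]]},
]

test_feat = [0, 1, 1]  # features of the incoming test query
dynamic_class = nearest_queries(test_feat, [q["feat"] for q in train], k=2)
weights, mean_ap = learn_weights([train[i] for i in dynamic_class], n_experts=2)
```

In this toy setup the dynamic class consists of the two training queries that share features with the test query, and the learned weights favor the visual expert because it ranks their relevant documents first. A real system would replace the grid search with whatever optimizer and evaluation metric the retrieval task calls for.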
