On linear mixture of expert approaches to information retrieval

2006, Decision Support Systems

https://doi.org/10.1016/J.DSS.2004.11.014

Abstract

Knowledge-intensive organizations hold a vast array of information in large document repositories. With the advent of e-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document collections, it has become increasingly difficult to retrieve relevant information from them. Information retrieval systems help users identify documents relevant to their information needs. Matching functions compare the information in documents with the information users request through queries, and produce a set of documents to present to the users. It is well known that no single matching function produces the best retrieval results for all contexts (documents and queries). In this paper we combine the results obtained from well-known matching functions in the literature. We employ genetic algorithms to perform the combination and test our method on a large, well-known document dataset. We observe that our method produces better retrieval results for both the consensus search and the routing tasks in information retrieval.
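
The idea in the abstract, a weighted linear combination of several matching functions with weights tuned by a genetic algorithm, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fitness measure (average precision), the GA operators (truncation selection, uniform crossover, Gaussian mutation), and the toy scores are all assumptions chosen for illustration.

```python
import random

def combine(scores, weights):
    """Linear mixture: weighted sum of the experts' scores for each document.

    scores[d][k] is the (hypothetical) score expert k gives document d.
    """
    return [sum(w * s for w, s in zip(weights, doc)) for doc in scores]

def average_precision(combined, relevant):
    """Average precision of the ranking induced by the combined scores."""
    ranking = sorted(range(len(combined)), key=lambda i: -combined[i])
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def evolve_weights(scores, relevant, pop_size=20, generations=30, seed=0):
    """Toy real-coded GA searching for mixture weights in [0, 1]."""
    rng = random.Random(seed)
    n_experts = len(scores[0])

    def fitness(weights):
        return average_precision(combine(scores, weights), relevant)

    pop = [[rng.random() for _ in range(n_experts)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_experts)  # mutate one weight, clamped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Toy data (assumed): five documents scored by two experts; documents 0 and 1 are relevant.
toy_scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.9], [0.2, 0.8], [0.1, 0.7]]
toy_relevant = {0, 1}
best = evolve_weights(toy_scores, toy_relevant)
print(best, average_precision(combine(toy_scores, best), toy_relevant))
```

In this toy setup the first expert ranks the relevant documents highly and the second does not, so the search should favor weight vectors that emphasize the first expert. A real experiment would train the weights on one set of queries and evaluate on held-out queries.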
