Modeling score distributions in information retrieval

Stephen Robertson; Jaap Kamps; Avi Arampatzis

doi:10.1007/S10791-010-9145-5

2011, Information Retrieval

12 pages

Abstract

We review the history of modeling score distributions, focusing on the normal-exponential mixture and investigating both the theoretical and the empirical evidence supporting its use. We discuss previously suggested conditions that valid binary mixture models should satisfy, such as the Recall-Fallout Convexity Hypothesis, and formulate two new hypotheses concerning the component distributions under some limiting conditions of parameter values. Of all the mixtures suggested in the past, the current theoretical argument points to the two-gamma mixture as the most likely universal model, with the normal-exponential being a usable approximation. Beyond the theoretical contribution, we provide new experimental evidence showing that vector space or geometric models, and BM25, are "friendly" to the normal-exponential, and that the non-convexity problem of this mixture is not severe in practice.
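The non-convexity issue mentioned in the abstract can be illustrated with a small numerical sketch. It assumes relevant scores follow a normal density and non-relevant scores an exponential density, and computes the probability of relevance given a score. Because the normal tail decays faster than the exponential tail, this probability rises to a peak at `mu + lam * sigma**2` and then falls, so it is not monotonic in the score. The parameter values (`MU`, `SIGMA`, `LAM`, `G`) and all function names are hypothetical choices made for illustration; they are not taken from the paper.

```python
import math


def normal_pdf(s, mu, sigma):
    """Density of the assumed relevant-score distribution."""
    z = (s - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def exponential_pdf(s, lam):
    """Density of the assumed non-relevant-score distribution (s >= 0)."""
    return lam * math.exp(-lam * s) if s >= 0 else 0.0


def posterior_relevance(s, mu, sigma, lam, g):
    """P(relevant | score = s) under the two-component mixture.

    g is the mixing weight, i.e. the prior fraction of relevant documents.
    """
    rel = g * normal_pdf(s, mu, sigma)
    non = (1.0 - g) * exponential_pdf(s, lam)
    total = rel + non
    return rel / total if total > 0 else 0.0


def posterior_peak(mu, sigma, lam):
    """Score at which the normal/exponential likelihood ratio is maximal.

    Setting the derivative of log N(s) - log E(s) to zero gives
    -(s - mu) / sigma**2 + lam = 0.
    """
    return mu + lam * sigma ** 2


# Hypothetical parameters, chosen only to make the effect visible.
MU, SIGMA, LAM, G = 3.0, 1.0, 1.0, 0.1

peak = posterior_peak(MU, SIGMA, LAM)
for s in (peak - 2.0, peak, peak + 2.0):
    p = posterior_relevance(s, MU, SIGMA, LAM, G)
    print(f"score={s:.1f}  P(rel|score)={p:.4f}")
```

With these assumed parameters, documents scoring above the peak are estimated as less likely to be relevant than documents scoring at the peak. This is the sense in which the mixture conflicts with a convex recall-fallout curve; how far into the tail the peak lies in real data depends on the fitted parameters.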

