Faster Adaptive Set Intersections for Text Searching
2006, Lecture Notes in Computer Science
https://doi.org/10.1007/11764298_13
Abstract
Intersecting large ordered sets is a common problem when evaluating boolean queries to a search engine. In this paper we engineer a better algorithm for this task, which improves over those proposed by Demaine, Munro and López-Ortiz [SODA 2000/ALENEX 2001], by using a variant of interpolation search. More specifically, our contributions are threefold. First, we corroborate and complete the practical study from Demaine et al. on comparison-based intersection algorithms. Second, we show that in practice replacing binary search and galloping (one-sided binary) search [4] by interpolation search improves the performance of each of the main intersection algorithms. Third, we introduce and test variants of interpolation search, which results in an even better intersection algorithm.
Topics. Evaluation of Algorithms for Realistic Environments, Implementation, Testing, Evaluation and Fine-tuning of Algorithms, Information Retrieval.
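To make the second contribution concrete, the following is a minimal sketch of intersecting two sorted sets where each lookup uses interpolation search instead of binary or galloping search. It is not the authors' implementation and does not reproduce the interpolation variants tested in the paper. The function names `interpolation_lower_bound` and `intersect` are hypothetical, and the sketch assumes strictly increasing integer arrays (for example, sorted document identifiers).

```python
from typing import List


def interpolation_lower_bound(a: List[int], target: int, lo: int, hi: int) -> int:
    """Return the smallest index i in [lo, hi] with a[i] >= target, or hi + 1.

    Assumes a[lo..hi] is strictly increasing (a set stored as a sorted array).
    """
    if lo > hi:
        return lo
    while True:
        if target <= a[lo]:
            return lo
        if target > a[hi]:
            return hi + 1
        # Here a[lo] < target <= a[hi], so hi > lo and the divisor is positive.
        # Guess the position by assuming values are spread evenly over the range.
        pos = lo + (target - a[lo]) * (hi - lo) // (a[hi] - a[lo])
        if a[pos] < target:
            lo = pos + 1          # answer lies strictly to the right of pos
        elif a[pos] > target:
            hi = pos - 1          # answer is at most pos; checked via hi + 1 above
        else:
            return pos


def intersect(small: List[int], large: List[int]) -> List[int]:
    """Intersect two strictly increasing arrays, searching `large` for each
    element of `small` and never revisiting positions already passed."""
    result = []
    lo, hi = 0, len(large) - 1
    for x in small:
        lo = interpolation_lower_bound(large, x, lo, hi)
        if lo > hi:               # every remaining element of large is < x
            break
        if large[lo] == x:
            result.append(x)
            lo += 1
    return result


if __name__ == "__main__":
    print(intersect([3, 8, 21, 40], [1, 3, 5, 8, 13, 21, 34, 55, 89]))
```

Because the search always resumes from the last position found in the larger array, the sketch keeps the forward-only scanning pattern of the galloping-based algorithms; only the way each position is guessed changes.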
References (16)
- Ricardo A. Baeza-Yates. A Fast Set Intersection Algorithm for Sorted Sequences. In Proceedings of 15th Annual Symposium on Combinatorial Pattern Matching (CPM), 400-408, 2004.
- Ricardo A. Baeza-Yates, Alejandro Salinger. Experimental Analysis of a Fast Intersection Algorithm for Sorted Sequences. In Proceedings of 12th International Conference on String Processing and Information Retrieval (SPIRE), 13-24, 2005.
- Jérémy Barbay and Claire Kenyon. Adaptive Intersection and t-Threshold Problems. In Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 390-399, 2002.
- Jon Louis Bentley and Andrew Chi-Chih Yao. An almost optimal algorithm for unbounded searching. Information Processing Letters, 5(3):82-87, 1976.
- Daniel K. Blandford and Guy E. Blelloch. Compact Representations of Ordered Sets. ACM/SIAM Symposium on Discrete Algorithms (SODA), 11-19, 2004.
- Erik D. Demaine, Thouis R. Jones, Mihai Patrascu. Interpolation search for non-independent data. In Proceedings of the 15th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 529-530, 2004.
- Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. Adaptive set intersections, unions, and differences. In Proceedings of the 11th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 743-752, 2000.
- Erik D. Demaine, Alejandro López-Ortiz, and J. Ian Munro. Experiments on Adaptive set intersections for text retrieval systems. In Proceedings of the 3rd Workshop on Algorithm Engineering and Experiments (ALENEX), 91-104, 2001.
- V. Estivill-Castro and Derick Wood. A survey of adaptive sorting algorithms. ACM Computing Surveys, 24(4):441-476, 1992.
- W. Frakes and R. Baeza-Yates. Information Retrieval. Prentice Hall, 1992.
- G. Gonnet, L. Rogers, and G. George. An algorithmic and complexity analysis of interpolation search. Acta Informatica, 13(1):39-52, 1980.
- Frank K. Hwang, Shen Lin. Optimal Merging of 2 Elements with n Elements. Acta Informatica, v.1, 145-158, 1971.
- Frank K. Hwang, Shen Lin. A Simple Algorithm for Merging Two Disjoint Linearly-Ordered Sets. SIAM Journal of Computing, v.1, 31-39, 1972.
- Frank K. Hwang. Optimal Merging of 3 Elements with n Elements. SIAM Journal of Computing, v.9, 298-320, 1980.
- U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. In Proceedings of the 1st Symposium on Discrete Algorithms (SODA), 319-327, 1990.
- Y. Perl, A. Itai, and H. Avni. Interpolation search-A log log n search. CACM, 21(7) 550-554, 1978.