Arabic user search query correction and expansion
2003, Proceedings of the 1st …
Abstract
This paper describes correction and expansion techniques for multilingual search queries submitted to the Arabic-centred search engine Barq. Key features of the correction technique are the use of (1) Arabic language morphology, (2) the most common pronunciation and spelling mistakes Arab speakers make in Arabic, (3) the most common spelling mistakes Arab speakers make in transliterated English, and (4) the spelling and pronunciation mistakes of learners of Arabic as a second language (L2). The query expansion mechanism also uses Arabic word roots and thesauri built automatically by a categorization engine in order to achieve better recall. The preliminary results obtained for query correction show that 92% of submitted misspelled query terms are corrected. The preliminary results obtained for expansion show a promising 75% increase in recall, though qualitative evaluation of the recall improvement is still underway.
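To make the two ideas in the abstract concrete, the following is a minimal sketch and not the Barq implementation, which the abstract does not detail. It assumes correction can be approximated by an edit distance in which substitutions between commonly confused letters are cheaper, and that expansion can be approximated by looking up words that share a root. The confusion pairs, the lexicon, the root table, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical confusion pairs (an assumption, not taken from the paper):
# letters an Arabic speaker might swap when typing English words.
CONFUSABLE: FrozenSet[FrozenSet[str]] = frozenset(
    {frozenset("pb"), frozenset("vf"), frozenset("ei")}
)


def sub_cost(a: str, b: str) -> float:
    """Substitution cost: 0 if equal, 0.5 for a confusable pair, else 1."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in CONFUSABLE else 1.0


def weighted_distance(s: str, t: str) -> float:
    """Optimal-string-alignment distance (insert, delete, substitute,
    adjacent transposition) with cheaper confusable substitutions."""
    d = [[0.0] * (len(t) + 1) for _ in range(len(s) + 1)]
    for i in range(len(s) + 1):
        d[i][0] = float(i)
    for j in range(len(t) + 1):
        d[0][j] = float(j)
    for i in range(1, len(s) + 1):
        for j in range(1, len(t) + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,                               # deletion
                d[i][j - 1] + 1.0,                               # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1]),  # substitution
            )
            if i > 1 and j > 1 and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1.0)    # transposition
    return d[len(s)][len(t)]


def correct(term: str, lexicon: Iterable[str], max_distance: float = 1.0) -> str:
    """Return the closest lexicon word within max_distance, else the term itself."""
    words = sorted(set(lexicon))
    if not words or term in words:
        return term
    best = min(words, key=lambda w: (weighted_distance(term, w), w))
    return best if weighted_distance(term, best) <= max_distance else term


# Toy root table (hypothetical data): words derived from the root k-t-b.
ROOT_TO_WORDS: Dict[str, List[str]] = {"كتب": ["كتاب", "كاتب", "مكتبة"]}
WORD_TO_ROOT: Dict[str, str] = {
    w: root for root, ws in ROOT_TO_WORDS.items() for w in ws
}


def expand(term: str) -> List[str]:
    """Return the term followed by other words sharing its root, if known."""
    root = WORD_TO_ROOT.get(term)
    if root is None:
        return [term]
    return [term] + sorted(w for w in ROOT_TO_WORDS[root] if w != term)


if __name__ == "__main__":
    print(correct("baper", ["paper", "baker", "search"]))
    print(expand("كتاب"))
```

In this sketch the root lookup is a plain table; a real system would need a root extraction step and a much larger lexicon, neither of which is shown here.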