Using Word Embeddings for Information Retrieval
Proceedings of the 27th ACM International Conference on Information and Knowledge Management
https://doi.org/10.1145/3269206.3269277
Abstract
Neural word embedding approaches, owing to their ability to capture the semantic meanings of vocabulary terms, have recently gained the attention of the information retrieval (IR) community and have shown promising results in improving ad hoc retrieval performance. It has been observed that these approaches are sensitive to various choices made while learning and using word embeddings, which often leads to poor reproducibility. We study the effect of varying two parameters, namely (i) term normalization and (ii) the choice of training collection, on ad hoc retrieval performance with word2vec and fastText embeddings. We present quantitative estimates of the similarity of word vectors obtained under different settings, and use an embedding-based query expansion task to understand the effects of these parameters on IR effectiveness.
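As a rough illustration of what an embedding-based query expansion step can look like, the following minimal Python sketch adds the nearest neighbours of the query centroid to the query. It is not the procedure used in the paper: the toy vectors, the lowercase-only `normalize_term` helper, and the `expand_query` function are all hypothetical and stand in for trained word2vec or fastText vectors and whatever normalization scheme is being compared.

```python
import math

# Toy 3-dimensional vectors; hypothetical values for illustration only.
# In practice these would come from a trained word2vec or fastText model.
TOY_EMBEDDINGS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "vehicle": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.9, 0.3],
    "fruit": [0.1, 0.85, 0.35],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def normalize_term(term):
    """Assumed normalization: lowercasing only (a stand-in, not the paper's choice)."""
    return term.lower()


def expand_query(query_terms, embeddings, k=2):
    """Append the k vocabulary terms closest to the centroid of the query vectors."""
    terms = [normalize_term(t) for t in query_terms]
    known = [t for t in terms if t in embeddings]
    if not known:
        return terms  # nothing to expand from
    dim = len(embeddings[known[0]])
    centroid = [sum(embeddings[t][i] for t in known) / len(known) for i in range(dim)]
    candidates = [
        (cosine(centroid, vec), word)
        for word, vec in embeddings.items()
        if word not in terms
    ]
    candidates.sort(reverse=True)
    return terms + [word for _, word in candidates[:k]]


expanded = expand_query(["Car"], TOY_EMBEDDINGS, k=2)
print(expanded)
```

Because the expansion terms depend both on how query and vocabulary terms are normalized and on the collection the vectors were trained on, changing either setting can change the expanded query, which is the kind of sensitivity the abstract describes.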