GloVe: Global Vectors for Word Representation
Abstract
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
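For concreteness, the model described above amounts to a weighted least-squares problem over the nonzero co-occurrence counts X_ij: each word vector w_i and context vector w̃_j is fit so that w_i·w̃_j + b_i + b̃_j ≈ log X_ij, with a weighting function f(X_ij) that down-weights rare pairs and caps very frequent ones. The following is a minimal sketch of that objective on a toy vocabulary; the constants x_max = 100 and α = 3/4 follow the paper's suggested values, while the toy counts, dimensions, and initialization are purely illustrative.

```python
import numpy as np

# Sketch of the GloVe weighted least-squares cost
#   J = sum over nonzero (i, j) of f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
# Only the nonzero entries of the co-occurrence matrix are stored and visited.

def weight(x, x_max=100.0, alpha=0.75):
    """Weighting f(x): (x / x_max)^alpha for x < x_max, and 1 otherwise."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def glove_loss(W, W_tilde, b, b_tilde, cooc):
    """cooc maps (i, j) -> X_ij over the *nonzero* co-occurrence counts only."""
    loss = 0.0
    for (i, j), x_ij in cooc.items():
        diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(x_ij)
        loss += weight(x_ij) * diff ** 2
    return loss

# Toy usage: 5-word vocabulary, 3-dimensional vectors (illustrative values).
rng = np.random.default_rng(0)
V, d = 5, 3
W, W_tilde = rng.normal(size=(V, d)) * 0.1, rng.normal(size=(V, d)) * 0.1
b, b_tilde = np.zeros(V), np.zeros(V)
cooc = {(0, 1): 4.0, (1, 2): 1.0, (3, 4): 12.0}  # sparse co-occurrence counts
print(glove_loss(W, W_tilde, b, b_tilde, cooc))
```

Because the sum ranges only over nonzero entries, the training cost scales with the number of observed word pairs rather than with the size of the full V × V matrix, which is what makes the global approach tractable on large corpora.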