GloVe: Global Vectors for Word Representation

Abstract

Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
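
The training scheme the abstract describes can be made concrete with a short sketch. The weighting function below, f(x) = (x / x_max)^alpha with x_max = 100 and alpha = 0.75, is the one the paper proposes; everything else here is illustrative rather than the reference implementation: the function names, the learning rate, and the plain-SGD update (the paper actually trains with AdaGrad) are assumptions made for brevity.

```python
import numpy as np

# Sketch of the GloVe cost: for every nonzero co-occurrence count X_ij,
# fit  w_i . w~_j + b_i + b~_j  ≈  log X_ij,  weighted by f(X_ij) so
# that very frequent pairs do not dominate the objective.

def weighting(x, x_max=100.0, alpha=0.75):
    # f(x) from the paper: grows as (x / x_max)**alpha, saturates at 1.
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_update(W, W_ctx, b, b_ctx, i, j, x_ij, lr=0.05):
    # One stochastic update on a single nonzero entry X_ij.
    # (Plain SGD shown for brevity; the paper uses AdaGrad. The constant
    # factor of 2 from the squared-error gradient is folded into lr.)
    diff = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(x_ij)
    g = weighting(x_ij) * diff
    # Simultaneous update: right-hand sides use the pre-update vectors.
    W[i], W_ctx[j] = W[i] - lr * g * W_ctx[j], W_ctx[j] - lr * g * W[i]
    b[i] -= lr * g
    b_ctx[j] -= lr * g
    return weighting(x_ij) * diff ** 2  # this pair's contribution to the cost
```

Because only nonzero entries contribute, one training epoch is a pass over the stored (i, j, X_ij) triples of the sparse co-occurrence matrix, not over the full |V| x |V| matrix or the raw corpus.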
