Hash2Vec: Feature Hashing for Word Embeddings

Abstract

In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in time linear in the size of the data. The results show that this algorithm, which requires no training, is able to capture the semantic meaning of words. We compare the results against GloVe and show that they are similar. As far as we know, this is the first application of feature hashing to the word embeddings problem, and the results indicate it is a scalable technique with practical value for NLP applications.
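The abstract does not spell out the construction, but the core idea, hashing each context word into a fixed-dimensional vector in a single pass over the corpus, can be sketched as below. This is a minimal illustration, not the authors' exact method: the window size, the 1/distance weighting, the signed hash, and all names (e.g. hash2vec) are assumptions made for the sake of the example.

```python
# Sketch of feature hashing ("the hashing trick") applied to word embeddings.
# Assumptions not taken from the paper: window size, 1/distance weighting,
# signed hash, and the MD5-based hash function.
import hashlib
from collections import defaultdict


def _hash(token: str, seed: str, modulo: int) -> int:
    """Deterministic hash of a token into [0, modulo)."""
    digest = hashlib.md5((seed + token).encode("utf-8")).hexdigest()
    return int(digest, 16) % modulo


def hash2vec(tokens, dim=128, window=2):
    """Build one fixed-size vector per word in a single pass over the corpus.

    Each context word is hashed to a dimension (and to a +/-1 sign, as in the
    hashing trick) and added to the target word's vector, weighted by its
    distance to the target word. No training step is involved, so the cost is
    linear in the number of tokens.
    """
    vectors = defaultdict(lambda: [0.0] * dim)
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            context = tokens[j]
            index = _hash(context, "index", dim)
            sign = 1.0 if _hash(context, "sign", 2) == 0 else -1.0
            weight = 1.0 / abs(i - j)  # closer context words count more
            vectors[target][index] += sign * weight
    return dict(vectors)


# Usage on a toy corpus; a real corpus would be streamed the same way.
corpus = "the cat sat on the mat and the dog sat on the rug".split()
emb = hash2vec(corpus, dim=16, window=2)
print(emb["cat"])
```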

References

  1. Hill, F., Cho, K., Korhonen, A., Bengio, Y.: Learning to understand phrases by embedding the dictionary. arXiv preprint (to be published)
  2. Mikolov, T., Yih, W. T., Zweig, G.: Linguistic Regularities in Continuous Space Word Representations. In HLT-NAACL, 746-751 (2013)
  3. Pennington, J., Socher, R., Manning, C. D.: GloVe: Global Vectors for Word Representation. In EMNLP, Vol. 14, 1532-1543 (2014)
  4. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., Dean, J.: Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, 3111-3119 (2013)
  5. Shi, Q., Petterson, J., Dror, G., Langford, J., Strehl, A. L., Smola, A. J., Vishwanathan, S. V. N.: Hash kernels. In International Conference on Artificial Intelligence and Statistics, 496-503 (2009)
  6. Attenberg, J., Weinberger, K., Dasgupta, A., Smola, A., Zinkevich, M.: Collaborative Email-Spam Filtering with the Hashing Trick. In Proceedings of the Sixth Conference on Email and Anti-Spam (2009)
  7. Weinberger, K., Dasgupta, A., Langford, J., Smola, A., Attenberg, J.: Feature hashing for large scale multitask learning. In Proceedings of the 26th Annual International Conference on Machine Learning, 1113-1120 (2009)
  8. Johnson, W. B., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 1, 189-206 (1984)
  9. Achlioptas, D.: Database-friendly random projections: Johnson-Lindenstrauss with binary coins. Journal of Computer and System Sciences, 66, 4, 671-687 (2003)
  10. Huang, E. H., Socher, R., Manning, C. D., Ng, A. Y.: Improving word representations via global context and multiple word prototypes. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, 873-882 (2012)
  11. Li, Y., Xu, L., Tian, F., Jiang, L., Zhong, X., Chen, E.: Word embedding revisited: A new representation learning and explicit matrix factorization perspective. In Proceedings of the 24th International Joint Conference on Artificial Intelligence, 3650-3656 (2015)
  12. Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., Ruppin, E.: Placing search in context: The concept revisited. In Proceedings of the 10th International Conference on World Wide Web, 406-414 (2001)
  13. Radinsky, K., Agichtein, E., Gabrilovich, E., Markovitch, S.: A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of the 20th International Conference on World Wide Web, 337-346 (2011)
  14. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. Journal of Machine Learning Research, 2579-2605 (2008)
  15. Paice, C. D.: Another stemmer. In ACM SIGIR Forum, 24, 3, 56-61 (1990)
  16. Levy, O., Goldberg, Y.: Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL, 171-180 (2014)