Task-dependent Optimal Weight Combinations for Static Embeddings
Northern European Journal of Language Technology
https://doi.org/10.3384/NEJLT.2000-1533.2022.4438

Abstract
A variety of NLP applications use word2vec skip-gram, GloVe, and fastText word embeddings. These models learn two sets of embedding vectors, but most practitioners use only one of them or, alternatively, an unweighted sum of both. This is the first study to systematically explore a range of linear combinations between the first and second embedding sets. We evaluate these combinations on a suite of six NLP benchmarks, including IR, POS-tagging, and sentence similarity, and show that the default embedding combinations are often suboptimal, demonstrating improvements of 1.0-8.0%. Notably, GloVe’s default unweighted sum is its least effective combination across tasks. We provide a theoretical basis for weighting one set of embeddings more heavily than the other according to the algorithm and task, and we apply our findings to improve accuracy by up to 15.2% in applications involving cross-lingual alignment and navigational knowledge.
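As a minimal sketch of the kind of linear combination the abstract describes (the function name `combine_embeddings`, the weight `alpha`, and the use of gensim's internal `syn1neg` attribute are illustrative assumptions, not details taken from the paper):

```python
import numpy as np
from gensim.models import Word2Vec

def combine_embeddings(first: np.ndarray, second: np.ndarray, alpha: float) -> np.ndarray:
    """Weighted linear combination of the two learned embedding sets.

    alpha = 1.0 keeps only the first (target/input) vectors,
    alpha = 0.0 keeps only the second (context/output) vectors,
    and alpha = 0.5 is proportional to the unweighted sum.
    """
    return alpha * first + (1.0 - alpha) * second

# Toy corpus; in practice the model would be trained on a large text collection.
sentences = [["the", "quick", "brown", "fox"], ["jumps", "over", "the", "lazy", "dog"]]
model = Word2Vec(sentences, vector_size=50, sg=1, negative=5, min_count=1)

# gensim exposes the target vectors via wv.vectors; with negative sampling,
# the context vectors live in the internal attribute syn1neg (version-dependent).
blended = combine_embeddings(model.wv.vectors, model.syn1neg, alpha=0.7)
```

Here `alpha = 0.5` corresponds (up to scaling) to the unweighted sum GloVe ships by default, while `alpha = 1.0` recovers the common single-set convention; the finding reported above is that the best weight varies with the algorithm and the task.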
References
- Akhtar, Syed Sarfaraz, Arihant Gupta, Avijit Vajpayee, Arjit Srivastava, and Manish Shrivastava. 2017. Word similarity datasets for Indian languages: Annotation and baseline systems. In Proceedings of the 11th Linguistic Annotation Workshop, pages 91-94, Valencia, Spain. Association for Computational Linguistics.
- Andrus, Berkeley and Nancy Fulda. 2020. Immersive gameplay via improved natural language understanding. In Foundations of Digital Games 2020.
- Attardi, Giuseppe. 2015. Wikiextractor. https://github.com/attardi/wikiextractor.
- Bandy, Jack and Nicholas Vincent. 2021. Addressing "documentation debt" in machine learning: A retrospective datasheet for BookCorpus. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, volume 1.
- Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT '21, pages 610-623, New York, NY, USA. Association for Computing Machinery.
- Bojanowski, Piotr, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135-146.
- Brown, Zachary, Nathaniel Robinson, David Wingate, and Nancy Fulda. 2020. Towards neural programming interfaces. Advances in Neural Information Processing Systems, 33:17416-17428.
- Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171-4186, Minneapolis, Minnesota. Association for Computational Linguistics.
- Dufter, Philipp, Nora Kassner, and Hinrich Schütze. 2021. Static embeddings as efficient knowledge bases? In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2353-2363.
- Fulda, Nancy, Daniel Ricks, Ben Murdoch, and David Wingate. 2017a. What can you do with a rock? Affordance extraction via word embeddings. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pages 1039-1045.
- Fulda, Nancy and Nathaniel Robinson. 2021. Improved word representations via summed target and context embeddings. In 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI).
- Fulda, Nancy, Nathan Tibbetts, Zachary Brown, and David Wingate. 2017b. Harvesting common-sense navigational knowledge for robotics from uncurated text corpora. In Proceedings of the First Conference on Robot Learning.
- Gupta, Piyush, Inika Roy, Gunnika Batra, and Arun Kumar Dubey. 2021. Decoding emotions in text using GloVe embeddings. In 2021 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), pages 36-40.
- Hamilton, William L., Jure Leskovec, and Dan Jurafsky. 2016. Diachronic word embeddings reveal statistical laws of semantic change. CoRR, abs/1605.09096.
- Jain, Vaibhav. 2020. GloVeInit at SemEval-2020 task 1: Using GloVe vector initialization for unsupervised lexical semantic change detection. In Proceedings of the Fourteenth Workshop on Semantic Evaluation, pages 208-213, Barcelona (online). International Committee for Computational Linguistics.
- Jansen, Stefan. 2017. Word and phrase translation with word2vec. CoRR, abs/1705.03127.
- Khalid, Usama, Aizaz Hussain, Muhammad Umair Arshad, Waseem Shahzad, and Mirza Omer Beg. 2021. Co-occurrences using fastText embeddings for word similarity tasks in Urdu. CoRR, abs/2102.10957.
- Khatri, Akshay and Pranav P. 2020. Sarcasm detection in tweets with BERT and GloVe embeddings. In Proceedings of the Second Workshop on Figurative Language Processing, pages 56-60, Online. Association for Computational Linguistics.
- Khatua, Aparup, Apalak Khatua, and Erik Cambria. 2019. A tale of two epidemics: Contextual word2vec for classifying Twitter streams during outbreaks. Information Processing & Management, 56(1):247-257.
- Kingma, Diederik P and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR (Poster).
- Kocmi, Tom and Ondřej Bojar. 2017. An exploration of word embedding initialization in deep-learning tasks. In Proceedings of the 14th International Conference on Natural Language Processing (ICON-2017), pages 56-64, Kolkata, India. NLP Association of India.
- Lahiri, Shibamouli. 2014. Complexity of word collocation networks: A preliminary structural analysis. In Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 96-105, Gothenburg, Sweden. Association for Computational Linguistics.
- Lample, Guillaume, Alexis Conneau, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2018. Word translation without parallel data. In International Conference on Learning Representations.
- Lewis, Mike, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871-7880, Online. Association for Computational Linguistics.
- Lin, Zhou, Qifeng Zhou, and Langcai Cao. 2021. Two-stage encoder for pointer-generator network with pretrained embeddings. In 2021 16th International Conference on Computer Science & Education (ICCSE), pages 524-529. IEEE.
- Liu, Nelson F., Matt Gardner, Yonatan Belinkov, Matthew E. Peters, and Noah A. Smith. 2019a. Linguistic knowledge and transferability of contextual representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1073-1094, Minneapolis, Minnesota. Association for Computational Linguistics.
- Liu, Yinhan, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019b. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.
- Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
- Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013b. Distributed representations of words and phrases and their compositionality. In C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, editors, NIPS, pages 3111-3119. Curran Associates, Inc.
- Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. 2013c. Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 746-751, Atlanta, Georgia. Association for Computational Linguistics.
- Minaee, Shervin, Nal Kalchbrenner, Erik Cambria, Narjes Nikzad, Meysam Chenaghlu, and Jianfeng Gao. 2021. Deep learning-based text classification: A comprehensive review. ACM Computing Surveys, 54(3).
- Mitra, Bhaskar, Eric Nalisnick, Nick Craswell, and Rich Caruana. 2016. A dual embedding space model for document ranking. This paper is an extended evaluation and analysis of the model proposed in a poster to appear in WWW'16, April 11-15, 2016, Montreal, Canada.
- Nalisnick, Eric, Bhaskar Mitra, Nick Craswell, and Rich Caruana. 2016. Improving document ranking with dual word embeddings. In WWW '16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, pages 83-84.
- Ni, Chien-Chun. 2015. Multiple choice question (MCQ) dataset. https://www3.cs.stonybrook.edu/~chni/post/mcq-dataset/.
- Pennington, Jeffrey, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, October 25-29, 2014, Doha, Qatar, A meeting of SIGDAT, a Special Interest Group of the ACL, pages 1532-1543.
- Peters, Matthew, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 2227-2237, New Orleans, Louisiana. Association for Computational Linguistics.
- Peterson, Joshua C. 2019. OpenWebText. https://github.com/jcpeterson/openwebtext.
- Pickard, Thomas. 2020. Comparing word2vec and GloVe for automatic measurement of MWE compositionality. In Proceedings of the Joint Workshop on Multiword Expressions and Electronic Lexicons, pages 95-100, online. Association for Computational Linguistics.
- Pourdamghani, Nima, Marjan Ghazvininejad, and Kevin Knight. 2018. Using word vectors to improve word alignments for low resource machine translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 524-528, New Orleans, Louisiana. Association for Computational Linguistics.
- Premjith, B. 2019. Part of speech tagging machine learning deep learning word2vec fastText. https://github.com/premjithb/Part-of-Speech-Tagging-Machine-Learning-Deep-Learning-Word2vec-fasttext.
- Qi, Ye, Devendra Sachan, Matthieu Felix, Sarguna Padmanabhan, and Graham Neubig. 2018. When and why are pre-trained word embeddings useful for neural machine translation? In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pages 529-535, New Orleans, Louisiana. Association for Computational Linguistics.
- Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. 2019. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9.
- Robinson, Nathaniel, Zachary Brown, Timothy Sitze, and Nancy Fulda. 2021. Text classifications learned from language model hidden layers. In 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), pages 000207-000210. IEEE.
- Sabet, Masoud Jalili, Philipp Dufter, and Hinrich Schütze. 2020. SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. CoRR, abs/2004.08728.
- Speer, Robyn, Joshua Chin, and Catherine Havasi. 2017. ConceptNet 5.5: An open multilingual graph of general knowledge. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI'17, pages 4444-4451. AAAI Press.
- Strubell, Emma, Ananya Ganesh, and Andrew McCallum. 2019. Energy and policy considerations for deep learning in NLP. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3645-3650, Florence, Italy. Association for Computational Linguistics.
- Turney, Peter D. 2006. Similarity of semantic relations. Computational Linguistics, 32(3):379-416.
- Wilson, Theresa, Zornitsa Kozareva, Preslav Nakov, Alan Ritter, Sara Rosenthal, and Veselin Stoyanov. 2013. Sentiment analysis in Twitter. http://www.cs.york.ac.uk/semeval-2013/task2/.
- Yuan, Weizhe, Graham Neubig, and Pengfei Liu. 2021. BARTScore: Evaluating generated text as text generation. Advances in Neural Information Processing Systems, 34:27263-27277.
- Zhang, Tianyi, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. In International Conference on Learning Representations.
- Zhu, Yudong, Di Zhou, Jinghui Xiao, Xin Jiang, Xiao Chen, and Qun Liu. 2020. HyperText: Endowing FastText with hyperbolic geometry. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1166-1171, Online. Association for Computational Linguistics.
- Zhu, Yukun, Ryan Kiros, Richard S. Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. CoRR, abs/1506.06724.