Enriching Word Vectors with Subword Information
Abstract
Continuous word representations, trained on large unlabeled corpora, are useful for many natural language processing tasks. Many popular models that learn such representations ignore the morphology of words by assigning a distinct vector to each word. This is a limitation, especially for morphologically rich languages with large vocabularies and many rare words. In this paper, we propose a new approach based on the skip-gram model, in which each word is represented as a bag of character n-grams. A vector representation is associated with each character n-gram, and words are represented as the sum of these representations. Our method is fast, allowing models to be trained on large corpora quickly. We evaluate the resulting word representations on five different languages, on word similarity and analogy tasks.
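To make the construction concrete, the sketch below shows how a word vector could be assembled from character n-gram vectors. It is a minimal illustration, not the reference implementation: the n-gram range of 3 to 6 characters, the `<` and `>` boundary markers, and the `ngram_vectors` lookup table of learned n-gram representations are all assumptions made for the example, as the abstract does not fix these details.

```python
import numpy as np

def char_ngrams(word, n_min=3, n_max=6):
    """Extract the character n-grams of a word, with '<' and '>'
    marking word boundaries (assumed range: 3 to 6 characters)."""
    wrapped = f"<{word}>"
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.add(wrapped[i:i + n])
    grams.add(wrapped)  # also keep the full word as its own unit
    return grams

def word_vector(word, ngram_vectors, dim=100):
    """Represent a word as the sum of its character n-gram vectors."""
    vec = np.zeros(dim)
    for gram in char_ngrams(word):
        # Unseen n-grams contribute nothing in this toy sketch.
        vec += ngram_vectors.get(gram, np.zeros(dim))
    return vec

# Toy usage: random vectors stand in for learned n-gram representations.
rng = np.random.default_rng(0)
ngram_vectors = {g: rng.normal(size=100) for g in char_ngrams("where")}
v = word_vector("where", ngram_vectors)
```

Because rare or unseen words share n-grams with frequent ones, such a sum can still produce a meaningful vector for words that never appear in the training corpus.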