
Graph Convolutional Networks based Word Embeddings

2018, arXiv

Abstract

Recently, word embeddings have been widely adopted across several NLP applications. However, most word embedding methods rely solely on linear (sequential) context and do not provide a principled framework for incorporating word relationships such as hypernymy or dependency relations like nmod. In this paper, we propose WordGCN, a Graph Convolution based word representation learning approach which provides a framework for exploiting multiple types of word relationships. WordGCN operates at both the sentence and corpus levels and incorporates dependency parse based context efficiently, without increasing the vocabulary size. To the best of our knowledge, this is the first approach which effectively incorporates word relationships via Graph Convolutional Networks for learning word representations. Through extensive experiments on various intrinsic and extrinsic tasks, we demonstrate WordGCN's effectiveness over existing word embedding approaches. We make WordGCN's source code available to encourage reproducible research.
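To make the core idea concrete, below is a minimal sketch (not the paper's implementation) of a single graph-convolution update over a dependency parse. The sentence, edge list, embedding size, and the `gcn_layer` helper are all illustrative assumptions; the paper's actual architecture, edge labeling, and training objective may differ.

```python
import numpy as np

# Minimal sketch: one graph-convolution update over a dependency parse.
# All names (EMB_DIM, edges, gcn_layer) are illustrative, not from the paper.

EMB_DIM = 4
TOKENS = ["she", "likes", "green", "tea"]
rng = np.random.default_rng(0)

# Initial word embeddings, one row per token in the sentence.
H = rng.normal(size=(len(TOKENS), EMB_DIM))

# Dependency edges for "she likes green tea":
#   likes -nsubj-> she, likes -obj-> tea, tea -amod-> green.
# Stored as (head, dependent) index pairs; inverse edges and self-loops
# are added, a common choice in GCN encoders over parse trees.
edges = [(1, 0), (1, 3), (3, 2)]
edges += [(d, h) for h, d in edges]            # inverse edges
edges += [(i, i) for i in range(len(TOKENS))]  # self-loops

W = rng.normal(size=(EMB_DIM, EMB_DIM))        # shared layer weight

def gcn_layer(H, edges, W):
    """One GCN layer: each word averages transformed neighbor vectors."""
    out = np.zeros_like(H)
    deg = np.zeros(len(H))
    for u, v in edges:
        out[v] += H[u] @ W   # message from neighbor u to word v
        deg[v] += 1
    out /= deg[:, None]      # mean over incoming edges
    return np.maximum(out, 0.0)  # ReLU non-linearity

H1 = gcn_layer(H, edges, W)
print(H1.shape)  # (4, 4): syntax-aware vectors; vocabulary size unchanged
```

Because the update aggregates over parse edges rather than a fixed linear window, stacking such layers lets information flow along multi-hop syntactic paths while the embedding matrix itself stays one vector per word, which is how dependency context can be used without inflating the vocabulary.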
