CogniVal: A Framework for Cognitive Word Embedding Evaluation
2019, Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)
https://doi.org/10.18653/V1/K19-1050
Abstract
An interesting way to evaluate word representations is to measure how well they reflect the semantic representations in the human brain. However, most, if not all, previous work focuses on small datasets and a single modality. In this paper, we present the first multimodal framework for evaluating English word representations based on cognitive lexical semantics. Six types of word embeddings are evaluated by fitting them to 15 datasets of eye-tracking, EEG and fMRI signals recorded during language processing. To obtain a global score over all evaluation hypotheses, we apply statistical significance testing that accounts for the multiple comparisons problem. The framework is available and easily extensible to include other intrinsic and extrinsic evaluation methods. We find strong correlations between the results on different cognitive datasets, across recording modalities, and with the embeddings' performance on extrinsic NLP tasks.
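The significance-testing step can be illustrated with a minimal sketch. The abstract does not say which correction is applied, so the Bonferroni correction used here is an assumption, as are the function names `bonferroni` and `global_score`, the hypothesis labels, and the p-values, which are placeholders rather than results from the paper.

```python
def bonferroni(p_values, alpha=0.05):
    """Apply a Bonferroni correction to a dict of {hypothesis: p-value}.

    Assumption: one hypothesis per (embedding, cognitive dataset) pair.
    Returns the corrected threshold and a dict of accept/reject decisions.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one hypothesis is required")
    corrected_alpha = alpha / m  # stricter threshold for m comparisons
    decisions = {name: p < corrected_alpha for name, p in p_values.items()}
    return corrected_alpha, decisions


def global_score(decisions):
    """Summarize the decisions as (significant hypotheses, total hypotheses)."""
    return sum(decisions.values()), len(decisions)


# Placeholder p-values for illustration only (not taken from the paper).
example_p_values = {
    "eye_tracking_dataset": 0.001,
    "eeg_dataset": 0.02,
    "fmri_dataset": 0.04,
}

corrected_alpha, decisions = bonferroni(example_p_values)
score = global_score(decisions)
print(f"corrected alpha: {corrected_alpha:.4f}")
print(f"significant: {score[0]} of {score[1]} hypotheses")
```

With three hypotheses the threshold drops from 0.05 to roughly 0.0167, so a p-value of 0.02 that would pass an uncorrected test no longer counts as significant. Reporting the ratio of accepted hypotheses is one simple way to aggregate many per-dataset tests into a single score.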