Papers by Guy Emerson
Across languages, multiple consecutive adjectives modifying a noun (e.g. "the big red dog") follow certain unmarked ordering rules. While explanatory accounts have been put forward, much of the work done in this area has relied primarily on the intuitive judgment of native speakers, rather than on corpus data. We present the first purely corpus-driven model of multi-lingual adjective ordering in the form of a latent-variable model that can accurately order adjectives across 24 different languages, even when the training and testing languages are different. We utilize this novel statistical model to provide strong converging evidence for the existence of universal, cross-linguistic, hierarchical adjective ordering tendencies.
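The paper's actual model is not reproduced here, but a minimal sketch can illustrate the general shape of a latent-variable account of adjective ordering: each adjective is softly assigned to a small set of latent classes, each class has a preferred distance from the noun, and an ordering is produced by sorting adjectives by expected distance. The classes, probabilities and positions below are hypothetical toy values, not learned parameters.

```python
# Minimal sketch of a latent-class account of adjective ordering.
# All values are hypothetical; a real model would learn them from corpora.

# Preferred distance of each latent class from the noun (larger = further away).
CLASS_POSITION = {"size": 3.0, "colour": 2.0, "origin": 1.0}

# Soft assignment of each adjective to the latent classes.
ADJ_CLASS_PROBS = {
    "big":    {"size": 0.90, "colour": 0.05, "origin": 0.05},
    "red":    {"size": 0.05, "colour": 0.90, "origin": 0.05},
    "French": {"size": 0.10, "colour": 0.10, "origin": 0.80},
}

def expected_distance(adj):
    """Expected distance from the noun, marginalising over the latent class."""
    return sum(p * CLASS_POSITION[c] for c, p in ADJ_CLASS_PROBS[adj].items())

def order_adjectives(adjectives):
    """Place adjectives with larger expected distance further from the noun."""
    return sorted(adjectives, key=expected_distance, reverse=True)

print(order_adjectives(["red", "French", "big"]))  # ['big', 'red', 'French']
```

Because the ordering depends only on the latent classes and their positions, the same machinery can in principle be shared across languages, which is the kind of cross-linguistic generalisation the paper tests.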

A broad-coverage corpus such as the Human Language Project envisioned by Abney and Bird (2010) would be a powerful resource for the study of endangered languages. Existing corpora are limited in the range of languages covered, in standardisation, or in machine-readability. In this paper we present SeedLing, a seed corpus for the Human Language Project. We first survey existing efforts to compile cross-linguistic resources, then describe our own approach. To build the foundation text for a Universal Corpus, we crawl and clean texts from several web sources that contain data from a large number of languages, and convert them into a standardised form consistent with the guidelines of Abney and Bird (2011). The resulting corpus is more easily accessible and machine-readable than any of the underlying data sources, and, with data from 1451 languages covering 105 language families, represents a significant base corpus for researchers to draw on and add to in the future. To demons...
Functional Distributional Semantics provides a linguistically interpretable framework for distributional semantics, by representing the meaning of a word as a function (a binary classifier), instead of a vector. However, the large number of latent variables means that inference is computationally expensive, and training a model is therefore slow to converge. In this paper, I introduce the Pixie Autoencoder, which augments the generative model of Functional Distributional Semantics with a graph-convolutional neural network to perform amortised variational inference. This allows the model to be trained more effectively, achieving better results on two tasks (semantic similarity in context and semantic composition), and outperforming BERT, a large pre-trained language model.
arXiv, 2020
Functional Distributional Semantics provides a computationally tractable framework for learning truth-conditional semantics from a corpus. Previous work in this framework has provided a probabilistic version of first-order logic, recasting quantification as Bayesian inference. In this paper, I show how the previous formulation gives trivial truth values when a precise quantifier is used with vague predicates. I propose an improved account, avoiding this problem by treating a vague predicate as a distribution over precise predicates. I connect this account to recent work in the Rational Speech Acts framework on modelling generic quantification, and I extend this to modelling donkey sentences. Finally, I explain how the generic quantifier can be both pragmatically complex and yet computationally simpler than precise quantifiers.
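As a toy illustration of the proposed treatment (not the paper's formulation), a vague predicate such as "tall" can be modelled as a distribution over precise threshold predicates: the probability that "tall" holds of an entity is the probability that a sampled threshold falls below the entity's height. The normal threshold prior and its parameters below are assumptions made purely for the example.

```python
from scipy.stats import norm

# "tall" as a distribution over precise predicates height > theta,
# with theta drawn from a normal prior (hypothetical parameters).
THRESHOLD_MEAN = 180.0  # cm
THRESHOLD_SD = 5.0      # cm

def prob_tall(height_cm):
    """P(tall(x)) = P(theta < height(x)) under the threshold prior."""
    return norm.cdf(height_cm, loc=THRESHOLD_MEAN, scale=THRESHOLD_SD)

def prob_everyone_tall(heights):
    """P(every x is tall), sharing a single threshold draw across all entities:
    the shared threshold must fall below the smallest height."""
    return norm.cdf(min(heights), loc=THRESHOLD_MEAN, scale=THRESHOLD_SD)

for h in (170, 180, 190):
    print(h, round(prob_tall(h), 3))               # 0.023, 0.5, 0.977
print(round(prob_everyone_tall([175, 182, 190]), 3))  # 0.159, driven by the shortest person
```

The point of the second function is that a precise quantifier ("every") applied to a vague predicate now yields a graded, non-trivial truth value rather than collapsing to 0 or 1.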
Computational linguistics and grammar engineering
We discuss the relevance of HPSG for computational linguistics, and the relevance of computational linguistics for HPSG, including: the theoretical and computational infrastructure required to carry out computational studies with HPSG; computational resources developed within HPSG; how those resources are deployed, for both practical applications and linguistic research; and finally, a sampling of linguistic insights achieved through HPSG-based computational linguistic research.

Accurate parse ranking requires semantic information, since a sentence may have many candidate parses involving common syntactic constructions. In this paper, we propose a probabilistic framework for incorporating distributional semantic information into a maximum entropy parser. Furthermore, to better deal with sparse data, we use a modified version of Latent Dirichlet Allocation to smooth the probability estimates. This LDA model generates pairs of lemmas, representing the two arguments of a semantic relation, and can be trained, in an unsupervised manner, on a corpus annotated with semantic dependencies. To evaluate our framework in isolation from the rest of a parser, we consider the special case of prepositional phrase attachment ambiguity. The results show that our semantically-motivated feature is effective in this case, and moreover, the LDA smoothing both produces semantically interpretable topics, and also improves performance over raw co-occurrence frequencies, demonstrat...
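A small numerical sketch (not the paper's model) shows why smoothing through latent topics helps with sparse counts: instead of estimating P(dependent | head) directly from co-occurrences, the estimate is marginalised through topics, so probability mass is shared among semantically related lemmas. The counts and topic distributions below are hypothetical.

```python
import numpy as np

heads = ["eat", "drink"]
dependents = ["pizza", "soup", "water"]

# Raw co-occurrence counts (hypothetical and deliberately sparse):
# "drink water" was never observed, so its raw estimate is zero.
counts = np.array([
    [5.0, 3.0, 0.0],   # eat:   pizza, soup, water
    [0.0, 2.0, 0.0],   # drink: pizza, soup, water
])
raw = counts / counts.sum(axis=1, keepdims=True)

# Hypothetical topic model: P(topic | head) and P(dependent | topic).
p_topic_given_head = np.array([
    [0.9, 0.1],          # eat:   mostly a "food" topic
    [0.2, 0.8],          # drink: mostly a "liquid" topic
])
p_dep_given_topic = np.array([
    [0.60, 0.30, 0.10],  # food topic
    [0.05, 0.35, 0.60],  # liquid topic
])

# Smoothed estimate: P(dep | head) = sum_t P(t | head) * P(dep | t)
smoothed = p_topic_given_head @ p_dep_given_topic

print("raw      P(water | drink) =", raw[1, 2])       # 0.0
print("smoothed P(water | drink) =", smoothed[1, 2])  # 0.5, via the liquid topic
```

A PP-attachment decision can then compare such smoothed probabilities for the verb and the noun as candidate attachment sites, which is the setting the paper evaluates.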
SentiMerge: Official release

The aim of distributional semantics is to design computational techniques that can automatically learn the meanings of words from a body of text. As McNally (2017) and Lenci (2008, 2018) have argued, distributional representations can be used as surrogates for conceptual representations – but crucially, they can be calculated concretely. Used in this way, distributional data allows us to develop and test linguistic theories. The twin challenges are: how do we represent meaning, and how do we learn these representations? The current state of the art is to represent meanings as vectors – but vectors do not correspond to any traditional notion of meaning. In particular, there is no way to talk about truth, a crucial concept in logic and formal semantics. In this thesis, I develop a framework for distributional semantics which answers this challenge. The meaning of a word is not represented as a vector, but as a function, mapping entities (objects in the world) to probabilities of truth...
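The contrast between the two kinds of representation can be made concrete with a toy sketch (my own illustration, not the thesis model): instead of assigning a word a vector, we assign it a function, here a simple binary classifier with hypothetical weights, that maps an entity representation to a probability of truth.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Entities as feature vectors (hypothetical features: [has_fur, barks, height_m]).
fido   = np.array([1.0, 1.0, 0.5])
eiffel = np.array([0.0, 0.0, 300.0])

def make_predicate(weights, bias):
    """The meaning of a predicate: a function from an entity to a probability of truth."""
    return lambda entity: sigmoid(weights @ entity + bias)

dog = make_predicate(np.array([2.0, 3.0, -0.5]), -2.0)

print("P(dog(fido))   =", round(float(dog(fido)), 3))    # close to 1
print("P(dog(eiffel)) =", round(float(dog(eiffel)), 3))  # close to 0
```

Because the representation is a function over entities, notions such as truth and entailment can be stated directly, which a bare vector does not support.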
We present an analysis of multiple question fronting in a restricted variant of the HPSG formalism (DELPH-IN) where unification is the only natively defined operation. Analysing multiple fronting in this formalism is challenging, because it requires carefully handling list appends, something that HPSG analyses of question fronting heavily rely on. Our analysis uses the append list type to address this challenge. We focus the testing of our analysis on Russian, although we also integrate it into the Grammar Matrix customization system where it serves as a basis for cross-linguistic modeling. In this context, we discuss the relationship of our analysis to lexical threading and conclude that, while lexical threading has its advantages, modeling multiple extraction cross-linguistically is easier without the lexical threading assumption.
Incremental Beam Manipulation for Natural Language Generation
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
Investigating Cross-Linguistic Adjective Ordering Tendencies with a Latent-Variable Model
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Autoencoding Pixies: Amortised Variational Inference with Graph Convolutions for Functional Distributional Semantics
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Leveraging Sentence Similarity in Natural Language Generation: Improving Beam Search using Range Voting
Proceedings of the Fourth Workshop on Neural Generation and Translation
What are the Goals of Distributional Semantics?
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Proceedings of the 2nd Workshop on Deep Learning Approaches for Low-Resource NLP (DeepLo 2019)
Word embeddings are an essential component in a wide range of natural language processing applications. However, distributional semantic models are known to struggle when only a small number of context sentences are available. Several methods have been proposed to obtain higher-quality vectors for these words, leveraging both this context information and sometimes the word forms themselves through a hybrid approach. We show that the current tasks do not suffice to evaluate models that use word-form information, as such models can easily leverage word forms in the training data that are related to word forms in the test data. We introduce 3 new tasks, allowing for a more balanced comparison between models. Furthermore, we show that hyperparameters that have largely been ignored in previous work can consistently improve the performance of both baseline and advanced models, achieving a new state of the art on 4 out of 6 tasks.
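As a rough sketch of the kind of hybrid approach discussed (not any specific model evaluated in the paper), a rare word's vector can interpolate an estimate from its few context sentences with an estimate from its word form, e.g. character n-grams; the interpolation weight is exactly the sort of hyperparameter whose tuning the paper highlights. All lookup tables and vectors below are toy placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Toy lookup tables: vectors for known context words and character trigrams.
context_vecs = {w: rng.normal(size=DIM) for w in ["the", "cat", "sat", "mat"]}
ngram_vecs = {g: rng.normal(size=DIM)
              for g in ["<un", "unh", "nha", "hap", "app", "ppy", "py>"]}

def context_estimate(context_words):
    """Average of the vectors of the observed context words."""
    return np.mean([context_vecs[w] for w in context_words], axis=0)

def form_estimate(word, n=3):
    """Average of the character n-gram vectors of the word form."""
    padded = f"<{word}>"
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return np.mean([ngram_vecs[g] for g in grams if g in ngram_vecs], axis=0)

def hybrid_embedding(word, context_words, alpha=0.5):
    """Interpolate the two estimates; alpha is a tunable hyperparameter."""
    return alpha * context_estimate(context_words) + (1 - alpha) * form_estimate(word)

print(hybrid_embedding("unhappy", ["the", "cat", "sat"]).shape)  # (8,)
```

The evaluation concern raised in the abstract is that such form-based estimates can exploit word forms shared between training and test data, which is what motivates the new tasks.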
Proceedings of the 13th International Conference on Computational Semantics - Long Papers
Distributional Semantic Models (DSMs) construct vector representations of word meanings based on their contexts. Typically, the contexts of a word are defined as its closest neighbours, but they can also be retrieved from its syntactic dependency relations. In this work, we propose a new dependency-based DSM. The novelty of our model lies in associating an independent meaning representation, a matrix, with each dependency label. This allows it to capture specifics of the relations between words and contexts, leading to good performance on both intrinsic and extrinsic evaluation tasks. In addition to that, our model has an inherent ability to represent dependency chains as products of matrices which provides a straightforward way of handling further contexts of a word.
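A small sketch of the core mechanism (with toy dimensions and random placeholder matrices, not the trained model): each dependency label is associated with a matrix, a word seen through a dependency relation is obtained by applying that label's matrix to the word vector, and a dependency chain is handled by multiplying the matrices along the chain.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 4

# Toy word vectors and one matrix per dependency label (random placeholders).
word_vec = {w: rng.normal(size=DIM) for w in ["dog", "bark", "loud"]}
label_matrix = {lab: rng.normal(size=(DIM, DIM)) for lab in ["nsubj", "advmod"]}

def contextualise(word, label):
    """Represent a word as seen through a single dependency relation."""
    return label_matrix[label] @ word_vec[word]

def chain(word, labels):
    """A dependency chain is the product of the label matrices applied to the word vector."""
    vec = word_vec[word]
    for lab in labels:
        vec = label_matrix[lab] @ vec
    return vec

print(contextualise("dog", "nsubj"))       # "dog" as a subject
print(chain("loud", ["advmod", "nsubj"]))  # "loud" reached through advmod then nsubj
```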
SeedLing: Building and Using a Seed corpus for the Human Language Project
Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, 2014
Conference Presentations by Guy Emerson

Semantic composition remains an open problem for vector space models of semantics. In this paper, we explain how the probabilistic graphical model used in the framework of Functional Distributional Semantics can be interpreted as a probabilistic version of model theory. Building on this, we explain how various semantic phenomena can be recast in terms of conditional probabilities in the graphical model. This connection between formal semantics and machine learning is helpful in both directions: it gives us an explicit mechanism for modelling context-dependent meanings (a challenge for formal semantics), and also gives us well-motivated techniques for composing distributed representations (a challenge for distributional semantics). We present results on two datasets that go beyond word similarity, showing how these semantically-motivated techniques improve on the performance of vector models.
Functional Distributional Semantics is a framework that aims to learn, from text, semantic representations which can be interpreted in terms of truth. Here we make two contributions to this framework. The first is to show how a type of logical inference can be performed by evaluating conditional probabilities. The second is to make these calculations tractable by means of a variational approximation. This approximation also enables faster convergence during training, allowing us to close the gap with state-of-the-art vector space models when evaluating on semantic similarity. We demonstrate promising performance on two tasks.
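A toy Monte Carlo sketch (my own illustration, not the paper's variational method) of the first contribution: with a prior over entities and predicates represented as probabilistic classifiers, an inference such as "how likely is an entity to bark, given that it is a dog?" can be answered by estimating a conditional probability. The entity prior, classifier weights and independence assumption below are all assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Prior over entities: feature vectors from a standard normal (toy choice).
entities = rng.normal(size=(100_000, 3))

def prob_true(weights, bias):
    """Probability that a predicate holds of each sampled entity."""
    return sigmoid(entities @ weights + bias)

p_dog  = prob_true(np.array([2.0, 1.0, 0.0]), -1.0)
p_bark = prob_true(np.array([1.5, 0.5, 0.2]), -0.5)

# P(bark(x) | dog(x)) = E[p_dog * p_bark] / E[p_dog], assuming the two predicates
# are conditionally independent given the entity.
conditional = (p_dog * p_bark).mean() / p_dog.mean()
print("P(bark | dog) ~", round(float(conditional), 3))
print("P(bark)       ~", round(float(p_bark.mean()), 3))  # lower than the conditional
```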