Papers by Slav Orlinov Petrov
A system that can observe events, learn, and participate just as a child would in an unknown scenario is the Holy Grail of AI.
The intersection of tree transducer-based translation models with n-gram language models results in huge dynamic programs for machine translation decoding. We propose a multipass, coarse-to-fine approach in which the language model complexity is incrementally introduced. In contrast to previous order-based bigram-to-trigram approaches, we focus on encoding-based methods, which use a clustered encoding of the target language. Across various encoding schemes, and for multiple language pairs, we show speed-ups of up to 50 times over single-pass decoding while improving BLEU score. Moreover, our entire decoding cascade for trigram language models is faster than the corresponding bigram pass alone of a bigram-to-trigram decoder.
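As a rough illustration of the encoding-based idea, the sketch below truncates hypothetical hierarchical bit codes for target-language words; shorter prefixes collapse more n-gram contexts into a single coarse state, which is what makes the early passes cheap. The vocabulary, codes, and function names are all invented for this sketch, not taken from the paper.

```python
# Sketch of encoding-based coarse-to-fine language model projection.
# Words carry hierarchical bit codes (hand-assigned here; the paper learns
# a clustered encoding). Truncating codes to k bits collapses many n-gram
# contexts into one, giving a cheaper "coarse" LM for early decoding passes.

bit_code = {            # hypothetical 3-bit codes for a toy vocabulary
    "the": "000", "a": "001",
    "cat": "100", "dog": "101",
    "runs": "110", "sleeps": "111",
}

def project_context(context, k):
    """Map a tuple of words to a tuple of k-bit code prefixes."""
    return tuple(bit_code[w][:k] for w in context)

# A bigram context distinguished at 3 bits merges with others at 1 bit:
print(project_context(("the", "cat"), 3))  # ('000', '100')
print(project_context(("the", "cat"), 1))  # ('0', '1')
print(project_context(("a", "dog"), 1))    # ('0', '1') -- same coarse state
```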
We present a discriminative, latent variable approach to syntactic parsing in which rules exist at multiple scales of refinement. The model is formally a latent variable CRF grammar over trees, learned by iteratively splitting grammar productions (not categories). Different regions of the grammar are refined to different degrees, yielding grammars which are three orders of magnitude smaller than the single-scale baseline and 20 times smaller than the split-and-merge grammars of Petrov et al. (2006). In addition, our discriminative approach integrally admits features beyond local tree configurations. We present a multiscale training method along with an efficient CKY-style dynamic program. On a variety of domains and languages, this method produces the best published parsing accuracies with the smallest reported grammars.
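The abstract mentions a CKY-style dynamic program; below is a minimal sketch of the standard CKY Viterbi chart such parsers build on, using an invented toy grammar in Chomsky normal form rather than a multiscale grammar.

```python
# Standard CKY Viterbi dynamic program for a PCFG in Chomsky normal form;
# the multiscale parser uses this same chart structure. The toy grammar
# and sentence below are invented for illustration.
from collections import defaultdict

lexicon = {("NP", "dogs"): 0.5, ("NP", "cats"): 0.5, ("V", "chase"): 1.0}
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}

def cky(words):
    n = len(words)
    chart = defaultdict(dict)            # chart[(i, j)][symbol] = best prob
    for i, w in enumerate(words):        # fill spans of length 1
        for (sym, word), p in lexicon.items():
            if word == w:
                chart[(i, i + 1)][sym] = p
    for span in range(2, n + 1):         # combine adjacent spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, l, r), p in binary.items():
                    if l in chart[(i, k)] and r in chart[(k, j)]:
                        score = p * chart[(i, k)][l] * chart[(k, j)][r]
                        if score > chart[(i, j)].get(parent, 0.0):
                            chart[(i, j)][parent] = score
    return chart[(0, n)].get("S", 0.0)

print(cky(["dogs", "chase", "cats"]))    # 0.25
```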

To enable downstream language processing, automatic speech recognition output must be segmented into its individual sentences. Previous sentence segmentation systems have typically been very local, using low-level prosodic and lexical features to independently decide whether or not to segment at each word boundary position. In this work, we leverage global syntactic information from a syntactic parser, which is better able to capture long distance dependencies. While some previous work has included syntactic features, ours is the first to do so in a tractable, lattice-based way, which is crucial for scaling up to long-sentence contexts. Specifically, an initial hypothesis lattice is constructed using local features. Candidate sentences are then assigned syntactic language model scores. These global syntactic scores are combined with local low-level scores in a log-linear model. The resulting system significantly outperforms the most popular long-span model for sentence segmentation (the hidden event language model) on both reference text and automatic speech recognizer output from news broadcasts.
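A minimal sketch of the log-linear combination step, assuming two toy candidate segmentations with made-up local and syntactic log-scores; in the paper the weights would be tuned on held-out data rather than fixed by hand.

```python
# Sketch of the log-linear combination: each candidate segmentation gets a
# weighted sum of log-scores. Weights and scores below are invented.

def combined_score(local_logscore, syntactic_logscore, w_local=1.0, w_syn=0.8):
    return w_local * local_logscore + w_syn * syntactic_logscore

candidates = {  # hypothetical segmentations: (local, syntactic) log-scores
    "we won | the game was close": (-4.1, -7.2),
    "we won the game | was close": (-3.9, -12.5),
}
best = max(candidates, key=lambda c: combined_score(*candidates[c]))
print(best)  # the syntactic score penalizes the ungrammatical candidate
```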
We describe experiments on learning latent variable grammars for various German treebanks, using a language-agnostic statistical approach. In our method, a minimal initial grammar is hierarchically refined using an adaptive split-and-merge EM procedure, giving compact, accurate grammars. The learning procedure directly maximizes the likelihood of the training treebank, without the use of any language specific or linguistically constrained features. Nonetheless, the resulting grammars encode many linguistically interpretable patterns and give the best published parsing accuracies on three German treebanks.
We demonstrate that log-linear grammars with latent variables can be practically trained using discriminative methods. Central to efficient discriminative training is a hierarchical pruning procedure which allows feature expectations to be efficiently approximated in a gradient-based procedure. We compare L1 and L2 regularization and show that L1 regularization is superior, requiring fewer iterations to converge, and yielding sparser solutions. On full-scale treebank parsing experiments, the discriminative latent models outperform both the comparable generative latent models and the discriminative non-latent baselines.
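To see why L1 regularization yields sparser solutions, the sketch below compares a single L2-penalized gradient step against an L1 proximal (soft-thresholding) step on toy weights; all values, step sizes, and function names are invented for illustration.

```python
# Why L1 gives sparser solutions than L2, in one regularized gradient step:
# an L2 penalty shrinks every weight proportionally, while the L1 proximal
# step (soft-thresholding) snaps small weights exactly to zero.

def l2_step(w, grad, lr=0.1, lam=1.0):
    return w - lr * (grad + lam * w)

def l1_step(w, grad, lr=0.1, lam=1.0):
    w = w - lr * grad
    # soft-threshold: move toward zero by lr*lam, clamping at zero
    return max(0.0, abs(w) - lr * lam) * (1 if w > 0 else -1)

weights = [0.05, -0.04, 0.9]
grads = [0.0, 0.0, 0.0]
print([round(l2_step(w, g), 3) for w, g in zip(weights, grads)])  # all shrink
print([round(l1_step(w, g), 3) for w, g in zip(weights, grads)])  # small -> 0
```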
Repeatedly split each category in two and retrain the grammar, initializing with the previous grammar. [Figure: parsing performance (F1-score), plotted on an axis running from 75% to 91%.]
We present a maximally streamlined approach to learning HMM-based acoustic models for automatic speech recognition. In our approach, an initial monophone HMM is iteratively refined using a split-merge EM procedure which makes no assumptions about subphone structure or context-dependent structure, and which uses only a single Gaussian per HMM state. Despite the much simplified training process, our acoustic model achieves state-of-the-art results on phone classification (where it outperforms almost all other methods) and competitive performance on phone recognition (where it outperforms standard CD triphone / subphone / GMM approaches). We also present an analysis of what is and is not learned by our system.
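A minimal sketch of the split step under the single-Gaussian-per-state assumption: each state's Gaussian is cloned with its mean perturbed in opposite directions, so a subsequent EM pass can pull the two halves apart. The perturbation scheme here is illustrative, not the paper's exact recipe.

```python
# Sketch of splitting a single-Gaussian HMM state: clone the state and
# nudge the two means apart by a fraction of a standard deviation, then
# let EM re-estimate. The perturbation size is a made-up choice.

def split_state(mean, var, eps=0.1):
    offset = eps * var ** 0.5
    return (mean - offset, var), (mean + offset, var)

state = (0.0, 1.0)                 # (mean, variance) of one HMM state
left, right = split_state(*state)
print(left, right)                 # (-0.1, 1.0) (0.1, 1.0)
```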
We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity of the grammar to grow as more training data is available. In addition to presenting a fully Bayesian model for the PCFG, we also develop an efficient variational inference procedure. On synthetic data, we recover the correct grammar without having to specify its complexity in advance. We also show that our techniques can be applied to full-scale parsing applications by demonstrating their effectiveness in learning state-split grammars.
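The HDP builds on the Dirichlet process's stick-breaking construction; the sketch below draws truncated stick-breaking weights, showing how the concentration parameter controls how many subsymbols receive appreciable mass. The truncation level and parameter values are arbitrary choices for illustration.

```python
import random

# Truncated stick-breaking construction underlying the (H)DP: weights over
# a potentially unbounded set of subsymbols, with concentration alpha
# controlling how quickly the stick is used up. Truncation is an
# approximation choice, as in truncated variational inference.

def stick_breaking(alpha, truncation=10, rng=random.Random(0)):
    weights, remaining = [], 1.0
    for _ in range(truncation):
        beta = rng.betavariate(1.0, alpha)   # Beta(1, alpha) stick fraction
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    return weights

print([round(w, 3) for w in stick_breaking(alpha=1.0)])
print([round(w, 3) for w in stick_breaking(alpha=5.0)])  # mass spreads out
```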
Treebank parsing can be seen as the search for an optimally refined grammar consistent with a coarse training treebank. We describe a method in which a minimal grammar is hierarchically refined using EM to give accurate, compact grammars. The resulting grammars are extremely compact compared to other high-performance parsers, yet the parser gives the best published accuracies on several languages, as well as the best generative parsing numbers in English. In addition, we give an associated coarse-to-fine inference scheme which vastly improves inference time with no loss in test set accuracy.
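A sketch of the splitting step in this hierarchical refinement, assuming a toy binary PCFG: each nonterminal is split in two, and rule probabilities are copied with small random jitter so a following EM pass can break the symmetry between subsymbols. The rule encoding and noise level are invented for the sketch.

```python
import itertools, random

# One grammar-splitting step: every nonterminal X becomes X_0 and X_1,
# each binary rule is expanded to all subsymbol combinations, and
# probabilities are copied with tiny random noise to break symmetry.

rng = random.Random(0)
rules = {("S", "NP", "VP"): 1.0, ("NP", "DT", "NN"): 1.0}   # toy binary PCFG

def split_grammar(rules, noise=0.01):
    new_rules = {}
    for (a, b, c), p in rules.items():
        for i, j, k in itertools.product((0, 1), repeat=3):
            jitter = 1.0 + rng.uniform(-noise, noise)
            # each parent subsymbol's mass is spread over 4 child combos
            new_rules[(f"{a}_{i}", f"{b}_{j}", f"{c}_{k}")] = p / 4 * jitter
    return new_rules

split = split_grammar(rules)
print(len(split))   # 16 rules after splitting the 2 original rules
```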
We present several improvements to unlexicalized parsing with hierarchically state-split PCFGs. First, we present a novel coarse-to-fine method in which a grammar's own hierarchical projections are used for incremental pruning, including a method for efficiently computing projections of a grammar without a treebank. In our experiments, hierarchical pruning greatly accelerates parsing with no loss in empirical accuracy. Second, we compare various inference procedures for state-split PCFGs from the standpoint of risk minimization, paying particular attention to their practical tradeoffs. Finally, we present multilingual experiments which show that parsing with hierarchical state-splitting is fast and accurate in multiple languages and domains, even without any language-specific tuning.
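A minimal sketch of how projections drive pruning: a refined item is explored only if the posterior of its coarse projection over the same span clears a threshold. The posteriors, the naming scheme (NP_3 projecting to NP), and the threshold below are all invented.

```python
# Sketch of coarse-to-fine pruning: a refined item like NP_3 over a span
# is only explored if its coarse projection NP has sufficient posterior
# there. Posteriors and threshold are toy values.

coarse_posterior = {        # posterior of coarse symbol over (i, j) spans
    ("NP", 0, 2): 0.92,
    ("VP", 0, 2): 0.0001,
}

def allowed(fine_symbol, i, j, threshold=1e-3):
    coarse = fine_symbol.split("_")[0]      # NP_3 projects to NP
    return coarse_posterior.get((coarse, i, j), 0.0) > threshold

print(allowed("NP_3", 0, 2))   # True: explore refined NP items here
print(allowed("VP_1", 0, 2))   # False: prune all refined VP items here
```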
We present an automatic approach to tree annotation in which basic nonterminal symbols are alternately split and merged to maximize the likelihood of a training treebank. Starting with a simple X-bar grammar, we learn a new grammar whose nonterminals are subsymbols of the original nonterminals. In contrast with previous work, we are able to split various terminals to different degrees, as appropriate to the actual complexity in the data. Our grammars automatically learn the kinds of linguistic distinctions exhibited in previous work on manual tree annotation. On the other hand, our grammars are much more compact and substantially more accurate than previous work on automatic annotation. Despite its simplicity, our best grammar achieves an F1 of 90.2% on the Penn Treebank, higher than fully lexicalized systems.
While most work on parsing with PCFGs has focused on local correlations between tree configurations, we attempt to model non-local correlations using a finite mixture of PCFGs. A mixture grammar fit with the EM algorithm shows improvement over a single PCFG, both in parsing accuracy and in test data likelihood. We argue that this improvement comes from the learning of specialized grammars that capture non-local correlations.
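A sketch of one EM step for the mixture weights, assuming per-tree likelihoods under each component PCFG have already been computed (the numbers below are made up): the E-step assigns responsibilities, and the M-step re-estimates the mixture prior.

```python
# One EM step for the mixture weights: the E-step computes each grammar's
# responsibility for each tree from its likelihood; the M-step averages
# responsibilities to re-estimate the prior. Likelihoods are toy values.

priors = [0.5, 0.5]
likelihoods = [    # P(tree | grammar_k) for three trees, two grammars
    [1e-4, 9e-4],
    [2e-4, 1e-5],
    [5e-4, 4e-4],
]

def em_step(priors, likelihoods):
    resp = []
    for like in likelihoods:
        joint = [p * l for p, l in zip(priors, like)]
        z = sum(joint)
        resp.append([j / z for j in joint])
    new_priors = [sum(r[k] for r in resp) / len(resp)
                  for k in range(len(priors))]
    return new_priors, resp

print(em_step(priors, likelihoods)[0])   # updated mixture prior
```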
Slav Petrov, Leon Barrett and Dan Klein, Non-Local Modeling with a Mixture of PCFGs. [Slide figure: repeated parse trees of the verb phrase "increased 11 % to # 2.5 billion from # 2.25 billion", given as empirical motivation; the slide notes that verb phrase expansion can be captured with lexicalization.]
Computers fail to track these in fast video, but sleight of hand fools humans as well: what happens too quickly we just cannot see. We show a 3D tracker for these types of motions that relies on the recognition of familiar configurations in 2D images (classification), and fills the gaps in-between (interpolation). We illustrate this idea with experiments on hand motions similar to finger spelling. The penalty for a recognition failure is often small: if two configurations are confused, they are often similar to each other, and the illusion works well enough, for instance, to drive a graphics animation of the moving hand. We contribute advances in both feature design and classifier training: our image features are invariant to image scale, translation, and rotation, and we propose a classification method that combines VQPCA with discrimination trees.
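A sketch of the classify-then-interpolate idea, with poses reduced to single numbers for brevity: confidently recognized frames anchor known configurations, and the frames between anchors are filled in by linear interpolation. All data and names here are toy inventions, not the paper's representation.

```python
# Sketch of classify-then-interpolate tracking: recognized frames anchor
# known poses; poses for the frames in between are linearly interpolated.
# Poses are 1-D numbers here purely for brevity.

recognized = {0: 0.0, 4: 2.0}      # frame index -> recognized "pose"

def interpolate(recognized, num_frames):
    anchors = sorted(recognized)
    poses = []
    for t in range(num_frames):
        lo = max(a for a in anchors if a <= t)
        hi = min(a for a in anchors if a >= t)
        if lo == hi:
            poses.append(recognized[lo])
        else:
            frac = (t - lo) / (hi - lo)
            poses.append((1 - frac) * recognized[lo] + frac * recognized[hi])
    return poses

print(interpolate(recognized, 5))   # [0.0, 0.5, 1.0, 1.5, 2.0]
```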
"Computer vision, sensor fusion, and behavior control for soccer playing robots"
Recent research has demonstrated that PCFGs with latent annotations are an effective way to provide automated increases in parsing accuracy. We feel that they have more potential than the literature has so far demonstrated, and we further speculate that they could also provide transfer between different corpora. In this paper, we describe our efforts and show that they do indeed provide a significant level of transfer.
In the context of the annual TRECVID challenge, this paper presents a comprehensive statistical framework for classification of video shots. We first design and analyze a broad set of language and video features. While most teams in this challenge have not been able to improve their baseline language performance by incorporating visual information, we show that a strong vision system can be a major asset. By leveraging techniques that have been developed for visual object recognition we can detect categories for which language alone does not contain any information.