"Learning and Inference for Hierarchically Split PCFGs"
Abstract
Treebank parsing can be seen as the search for an optimally refined grammar consistent with a coarse training treebank. We describe a method in which a minimal grammar is hierarchically refined using EM to give accurate, compact grammars. The resulting grammars are extremely compact compared to those of other high-performance parsers, yet the parser gives the best published accuracies on several languages, as well as the best generative parsing numbers in English. In addition, we give an associated coarse-to-fine inference scheme which vastly reduces inference time with no loss in test set accuracy.
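To make the idea of hierarchical refinement concrete, the following is a minimal sketch of a single split step, under assumptions the abstract does not spell out: each nonterminal is divided into two subsymbols, each rule's probability is shared evenly among its refined variants, and a small random perturbation breaks the symmetry so that a later EM pass could pull the subsymbols apart. The function name `split_grammar`, the dictionary rule format, the `noise` parameter, and the toy grammar are all hypothetical; the EM re-estimation itself is not shown, and unlike a real system this sketch also splits the root symbol.

```python
import itertools
import random
from collections import defaultdict


def split_grammar(rules, nonterminals, noise=0.01, seed=0):
    """Split every nonterminal X into X_0 and X_1 (illustrative only).

    rules maps (parent, children_tuple) -> probability.
    Symbols not listed in nonterminals are treated as terminals and kept as-is.
    """
    rng = random.Random(seed)

    def variants(symbol):
        # Terminals have a single variant; nonterminals get two subsymbols.
        if symbol in nonterminals:
            return [f"{symbol}_0", f"{symbol}_1"]
        return [symbol]

    new_rules = {}
    for (parent, children), prob in rules.items():
        child_combos = list(itertools.product(*(variants(c) for c in children)))
        for new_parent in variants(parent):
            for combo in child_combos:
                # Share the mass evenly, then jitter it slightly (assumed scheme).
                jitter = 1.0 + noise * (2.0 * rng.random() - 1.0)
                new_rules[(new_parent, combo)] = prob / len(child_combos) * jitter

    # Renormalize so each refined parent's rules sum to one.
    totals = defaultdict(float)
    for (parent, _), prob in new_rules.items():
        totals[parent] += prob
    return {key: prob / totals[key[0]] for key, prob in new_rules.items()}


# Hypothetical toy grammar, not taken from any treebank.
toy_rules = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("dog",)): 0.6,
    ("NP", ("cat",)): 0.4,
    ("VP", ("barks",)): 1.0,
}
toy_nonterminals = {"S", "NP", "VP"}
split_rules = split_grammar(toy_rules, toy_nonterminals)
```

Applying such a step repeatedly, with EM re-estimation between steps, would yield a hierarchy of increasingly refined grammars. The coarser levels of that hierarchy are the kind of object a coarse-to-fine inference scheme could use for pruning.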