Fast Statistical Grammar Induction
Abstract
The statistical induction of context-free grammars from bracketed corpora with the Inside-Outside algorithm has often inspired researchers, but its computational complexity has made it impossible to induce a large-scale grammar. The method we propose achieves the same results as earlier research at a much lower cost in computation time. We explain the required modifications to the algorithm, report experimental results, and compare them with results reported in the literature.