Unsupervised Statistical Learning of Context-free Grammar
2020, Proceedings of the 12th International Conference on Agents and Artificial Intelligence
https://doi.org/10.5220/0009383604310438
Abstract
In this paper, we address the problem of inducing a weighted context-free grammar (WCFG) from given data. The induction is performed with a new model of grammatical inference, the weighted Grammar-based Classifier System (wGCS). wGCS derives from learning classifier systems and searches for the grammar structure using a genetic algorithm and covering. Rule weights are estimated with a novel Inside-Outside Contrastive Estimation algorithm. The proposed method employs direct negative evidence and learns a WCFG from both positive and negative samples. Experimental results on three synthetic context-free languages show that wGCS is competitive with other statistical methods for unsupervised CFG learning.
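To make the notion of a weighted context-free grammar concrete, the following minimal sketch computes the inside weight of a string, that is, the summed weight of all its parses, for a grammar in Chomsky normal form. This is the quantity that Inside-Outside style estimation builds on. It is not the wGCS implementation: the toy grammar for the language a^n b^n, its weights, and the names `inside_weight`, `LEXICAL`, and `BINARY` are assumptions chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical toy WCFG in Chomsky normal form for the language a^n b^n (n >= 1).
# The weights are arbitrary illustrative values, not taken from the paper.
LEXICAL = {("A", "a"): 1.0, ("B", "b"): 1.0}   # rules of the form A -> terminal
BINARY = {                                      # rules of the form A -> B C
    ("S", "A", "B"): 0.6,
    ("S", "A", "X"): 0.4,
    ("X", "S", "B"): 1.0,
}


def inside_weight(tokens, lexical, binary, start="S"):
    """Return the summed weight of all parses of `tokens` rooted at `start`."""
    n = len(tokens)
    if n == 0:
        return 0.0
    # chart[(i, j)][A] = total weight of deriving tokens[i:j] from nonterminal A
    chart = defaultdict(lambda: defaultdict(float))
    # Width-1 spans are filled from the lexical rules.
    for i, tok in enumerate(tokens):
        for (lhs, terminal), weight in lexical.items():
            if terminal == tok:
                chart[(i, i + 1)][lhs] += weight
    # Wider spans combine two adjacent sub-spans through a binary rule.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                left, right = chart[(i, k)], chart[(k, j)]
                for (lhs, b, c), weight in binary.items():
                    if b in left and c in right:
                        chart[(i, j)][lhs] += weight * left[b] * right[c]
    return chart[(0, n)].get(start, 0.0)


if __name__ == "__main__":
    for sentence in ("ab", "aabb", "abab"):
        print(sentence, inside_weight(list(sentence), LEXICAL, BINARY))
```

Under this toy grammar, strings outside the language (such as `abab`) receive weight zero, while strings inside it receive a positive weight. A learner that uses negative samples can exploit such a contrast, although how wGCS does so is specified in the paper itself and not reproduced here.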