BF++: a language for general-purpose neural program synthesis
2021, arXiv
Abstract
Most state-of-the-art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, in which it is often difficult to incorporate expert knowledge or let experts review and validate the learned decision mechanisms. Knowledge insertion and model review are important requirements in many applications involving human health and safety. One way to bridge the gap between data-driven and knowledge-driven systems is program synthesis: replacing a neural network that outputs decisions with one that generates decision-making code in some programming language. We propose a new programming language, BF++, designed specifically for neural program synthesis in a Partially Observable Markov Decision Process (POMDP) setting, and generate programs for a number of standard OpenAI Gym benchmarks. Source code is available at https://github.com/vadim0x60/cibi
Key takeaways
- BF++ enables program synthesis for decision-making in Partially Observable Markov Decision Processes (POMDPs).
- Incorporating expert knowledge improves the performance of programs generated through BF++.
- The language features 22 commands, including non-blocking action operators and a virtual comma for infinite loops.
- Experimental results demonstrate functional solutions comparable to deep learning methods in OpenAI Gym benchmarks.
- Future work includes developing translation mechanisms to other programming languages and applying BF++ in healthcare.
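BF++ extends Brainfuck, a minimalist Turing-complete language, with operators for acting in a POMDP. The sketch below illustrates the general idea with the classic 8-command Brainfuck core plus a hypothetical `a` command standing in for a non-blocking action operator; the command set, the `observe`/`act` callbacks, and their semantics here are illustrative assumptions, not the actual 22-command BF++ specification (see the paper and repository for that).

```python
def run(program, observe, act, tape_len=32, max_steps=10_000):
    """Interpret a Brainfuck-style program that reads observations
    and emits actions. 'a' is a hypothetical action operator."""
    tape = [0] * tape_len
    ptr = 0   # data pointer
    pc = 0    # program counter
    steps = 0
    # Precompute matching-bracket positions for [ and ]
    jumps, stack = {}, []
    for i, c in enumerate(program):
        if c == '[':
            stack.append(i)
        elif c == ']':
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    while pc < len(program) and steps < max_steps:
        c = program[pc]
        if c == '>':
            ptr = (ptr + 1) % tape_len
        elif c == '<':
            ptr = (ptr - 1) % tape_len
        elif c == '+':
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == '-':
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ',':
            tape[ptr] = observe()   # read an observation onto the tape
        elif c == 'a':
            act(tape[ptr])          # hypothetical non-blocking action op
        elif c == '[' and tape[ptr] == 0:
            pc = jumps[pc]          # skip loop body
        elif c == ']' and tape[ptr] != 0:
            pc = jumps[pc]          # repeat loop body
        pc += 1
        steps += 1
    return tape
```

For example, `run('+++a', observe=lambda: 0, act=actions.append)` increments the current cell three times and then emits the action `3`. In an RL setting, `observe` and `act` would be wired to an environment such as an OpenAI Gym `step` loop, and the `max_steps` budget plays the role the paper's virtual comma addresses: keeping non-terminating programs usable as policies.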