Dual Monte Carlo Tree Search
2021, arXiv
Abstract
AlphaZero, using a combination of Deep Neural Networks and Monte Carlo Tree Search (MCTS), has successfully trained reinforcement learning agents in a tabula rasa way. The neural MCTS algorithm has been successful in finding near-optimal strategies for games through self-play. However, the AlphaZero algorithm has a significant drawback: it takes a long time to converge and requires high computational power, owing to the complex neural networks needed to solve games like Chess, Go, and Shogi. As a result, it is very difficult to pursue neural MCTS research without cutting-edge hardware, which is a roadblock for many aspiring neural MCTS researchers. In this paper, we propose a new neural MCTS algorithm, called Dual MCTS, which helps overcome these drawbacks. Dual MCTS uses two different search trees, a single deep neural network, and a new update technique for the search trees that combines the PUCB, a sliding window, and the ε-greedy algorithm. This technique is applicable to any MCT...
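Since the abstract only names the components of the update technique, the following is a minimal Python sketch of how a PUCB-based selection step can be combined with ε-greedy exploration. The constants `C_PUCT` and `EPSILON` and the names `Node`, `pucb_score`, and `select_action` are illustrative assumptions, not the paper's API; the sliding-window component is omitted because its details are not given here.

```python
import math
import random

# Illustrative constants, not taken from the paper.
C_PUCT = 1.5   # PUCB exploration constant (assumed value)
EPSILON = 0.1  # epsilon-greedy exploration rate (assumed value)

class Node:
    """Per-state action statistics for PUCB-style selection (hypothetical helper)."""

    def __init__(self, priors):
        # priors: dict mapping action -> prior probability P(s, a),
        # as produced by the policy head of the shared neural network
        self.P = dict(priors)
        self.N = {a: 0 for a in priors}    # visit counts N(s, a)
        self.W = {a: 0.0 for a in priors}  # accumulated action values

    def Q(self, a):
        # Mean action value; 0 for unvisited actions.
        return self.W[a] / self.N[a] if self.N[a] > 0 else 0.0

def pucb_score(node, a):
    # PUCB (Rosin, 2011) as used in AlphaZero-style search:
    # Q(s,a) + c * P(s,a) * sqrt(sum_b N(s,b)) / (1 + N(s,a))
    total_visits = sum(node.N.values())
    u = C_PUCT * node.P[a] * math.sqrt(total_visits) / (1 + node.N[a])
    return node.Q(a) + u

def select_action(node):
    # Epsilon-greedy wrapper around PUCB: with probability EPSILON
    # explore uniformly at random, otherwise take the PUCB-maximizing action.
    actions = list(node.P)
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: pucb_score(node, a))
```

As a usage sketch, `select_action(Node({"a1": 0.7, "a2": 0.3}))` would pick the higher-prior action on most calls while still sampling the other action a fraction of the time, which is the exploration/exploitation trade-off the abstract's combination of PUCB and ε-greedy is aimed at.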