Adaptive Tug-of-war Model for Two-armed Bandit Problem

https://doi.org/10.34385/PROC.45.A3L-B1

Abstract

The "tug-of-war (TOW) model" proposed in our previous studies is a unique dynamical system inspired by the photoavoidance behavior of a single-celled amoeba of the true slime mold Physarum polycephalum. The TOW model has been applied to the "multi-armed bandit problem," the problem of finding the most rewarding of multiple options as accurately and quickly as possible. We showed that the model performs better than other well-known algorithms. However, to maximize its performance, the TOW model requires an optimized parameter w. In this study, we propose a new TOW model that adaptively produces the estimates needed to determine w on its own and thus has no free parameter. We show that in some asymmetric problems the new model is more efficient than the UCB1-tuned algorithm, which is regarded as the best-performing algorithm.
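For readers unfamiliar with the baseline, the following is a minimal Python sketch of the UCB1-tuned rule of Auer, Cesa-Bianchi, and Fischer on a two-armed Bernoulli bandit. It is not the TOW model, whose update rule is not given in this abstract. The function name `ucb1_tuned`, the reward probabilities, the step count, and the seed are illustrative assumptions, not values from the paper.

```python
import math
import random


def ucb1_tuned(probs, steps, seed=0):
    """Play a Bernoulli bandit with UCB1-tuned and return per-arm pull counts.

    probs: hypothetical reward probability of each arm (assumed, not from the paper).
    steps: total number of plays, assumed to be at least len(probs).
    """
    rng = random.Random(seed)
    k = len(probs)
    pulls = [0] * k        # number of times each arm was played
    total = [0.0] * k      # sum of rewards per arm
    total_sq = [0.0] * k   # sum of squared rewards per arm

    def play(arm):
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        total[arm] += reward
        total_sq[arm] += reward * reward

    def index(arm, n):
        mean = total[arm] / pulls[arm]
        # Upper bound on the arm's variance, capped at 1/4 (the maximum
        # variance of a Bernoulli variable).
        variance_bound = (total_sq[arm] / pulls[arm] - mean * mean
                          + math.sqrt(2.0 * math.log(n) / pulls[arm]))
        bonus = math.sqrt(math.log(n) / pulls[arm] * min(0.25, variance_bound))
        return mean + bonus

    # Initialization: play every arm once.
    for arm in range(k):
        play(arm)

    # Afterwards, always play the arm with the largest index.
    for _ in range(steps - k):
        n = sum(pulls)  # total plays so far
        best = max(range(k), key=lambda a: index(a, n))
        play(best)

    return pulls


if __name__ == "__main__":
    counts = ucb1_tuned([0.2, 0.8], steps=2000, seed=1)
    print("pulls per arm:", counts)
```

In this sketch the exploration bonus shrinks as an arm accumulates plays, so the better arm is selected increasingly often. The abstract's claim concerns asymmetric problems in which the adaptive TOW model is more efficient than this baseline.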

References

  1. S.-J. Kim, M. Aono, M. Hara, UC2009, LNCS 5715, Springer, p.289, 2009.
  2. S.-J. Kim, M. Aono, M. Hara, UC2010, LNCS 6079, Springer, pp.69-80, 2010.
  3. S.-J. Kim, M. Aono, M. Hara, BioSystems vol.101, pp.29-36, 2010.
  4. S.-J. Kim, M. Aono, M. Hara, Proc. of NOLTA2010, pp.520-523, 2010.
  5. P. Auer, N. Cesa-Bianchi, P. Fischer, Machine Learning vol.47, pp.235-256, 2002.
  6. M. Dorigo, L. M. Gambardella, Artificial Life vol.5 no.2, pp.137-172, 1999.
  7. D. Karaboga, Technical Report-TR06, Erciyes University, 2005.
  8. M. Aono, Y. Hirata, M. Hara, K. Aihara, New Generation Computing vol.27, pp.129-157, 2009.
  9. M. Aono, Y. Hirata, M. Hara, K. Aihara, UC2009, LNCS 5715, Springer, pp.56-69, 2009.
  10. L. Kocsis, C. Szepesvári, ECML2006, LNAI 4212, Springer, pp.282-293, 2006.
  11. S. Gelly, Y. Wang, R. Munos, O. Teytaud, RR-6062-INRIA, pp.1-19, 2006.
  12. L. Lai, H. Jiang, H. V. Poor, Proc. of IEEE 42nd Asilomar Conference on Signals, Systems and Computers, pp.98-102, 2008.
  13. S. Shinohara, M. Nakano, Proc. of Shinka Keizai Gakkai [in Japanese], 2007.
  14. E. Nameda, S.-J. Kim, M. Aono, M. Hara, "On constant-regret strategies in bandit problem," (in preparation).