An optimal algorithm for bandit convex optimization
2016, arXiv (Cornell University)
Abstract
We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first \(\tilde{O}(\sqrt{T})\)-regret algorithm for this setting based on a novel application of the ellipsoid method to online learning. This bound is known to be tight up to logarithmic factors. Our analysis introduces new tools in discrete convex geometry.
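For orientation, the regret in this setting is usually defined as below. The notation is the standard one from the online learning literature and is an assumption here, not taken from the abstract: \(\mathcal{K}\) is the convex decision set, \(f_t\) is the convex loss chosen by the adversary at round \(t\), and \(x_t \in \mathcal{K}\) is the learner's point, for which only the scalar value \(f_t(x_t)\) is observed.

\[
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x)
\]

Under this definition, an \(\tilde{O}(\sqrt{T})\) bound means the regret (typically taken in expectation over the learner's randomness) grows at most like \(\sqrt{T}\) times factors polylogarithmic in \(T\).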