
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits

Abstract

We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that enables us to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integrating the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinf...
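Two of the ingredients named above, limited feedback and martingale concentration, can be illustrated with standard tools. The sketch below is not the paper's PAC-Bayesian bound. It assumes a hypothetical K-armed Bernoulli bandit played by an epsilon-greedy policy, handles limited feedback with importance weighting (one common technique, assumed here for illustration), and applies the Hoeffding-Azuma inequality to the martingale formed by the estimation errors. The names `azuma_radius`, `importance_weighted` and `run`, the arm means, and the parameter `eps` are all hypothetical.

```python
import math
import random


def azuma_radius(range_bounds, delta):
    """Radius r with P(|M_n| >= r) <= delta, by the Hoeffding-Azuma inequality.

    Assumes each martingale difference X_t lies in an interval [A_t, A_t + c_t]
    with A_t known given the past and c_t a fixed constant. Then
    P(|M_n| >= r) <= 2 exp(-2 r^2 / sum_t c_t^2); setting the right-hand side
    to delta and solving for r gives the formula below.
    """
    total = sum(c * c for c in range_bounds)
    return math.sqrt(total * math.log(2.0 / delta) / 2.0)


def importance_weighted(arm, played, reward, prob_arm):
    """Importance-weighted reward estimate for `arm` (hypothetical helper).

    Only the played arm's reward is observed. Dividing by the probability
    of playing `arm` makes the estimate conditionally unbiased.
    """
    return reward / prob_arm if arm == played else 0.0


def run(means, n, eps, delta, target=0, seed=0):
    """Hypothetical epsilon-greedy run on a Bernoulli bandit.

    Returns (average importance-weighted estimate for `target`, Hoeffding-Azuma
    radius on that average). Every arm is played with probability at least
    eps / K, so each increment (estimate - true mean) lies in an interval of
    length at most K / eps, a deterministic range as Azuma requires.
    """
    rng = random.Random(seed)
    k = len(means)
    totals = [0.0] * k  # cumulative importance-weighted estimates per arm
    for _ in range(n):
        # The policy depends on past estimates, so rounds are dependent.
        greedy = max(range(k), key=lambda a: totals[a])
        probs = [eps / k + ((1.0 - eps) if a == greedy else 0.0) for a in range(k)]
        played = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < means[played] else 0.0
        for a in range(k):
            totals[a] += importance_weighted(a, played, reward, probs[a])
    radius = azuma_radius([k / eps] * n, delta) / n
    return totals[target] / n, radius


if __name__ == "__main__":
    est, rad = run(means=[0.7, 0.5, 0.4], n=5000, eps=0.3, delta=0.05)
    print(f"estimate {est:.3f}, true mean 0.700, radius {rad:.3f}")
```

Because the sampling probabilities depend on earlier observations, the increments are not independent, which is why a martingale inequality is used instead of plain Hoeffding. The worst-case range K / eps also shows how weaker exploration inflates the deviation bound for importance-weighted estimates.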
