On the Application of Multi-Armed and Contextual Bandits
2020
Abstract
The multi-armed bandit (MAB) field is currently experiencing a renaissance: novel problem settings and algorithms, motivated by a variety of practical applications, are being introduced on top of the classical bandit problem. This article aims to provide a comprehensive review of the most important recent developments in real-life applications of the multi-armed bandit. Specifically, we introduce a taxonomy of common MAB-based applications and summarize the state of the art in each of these domains. We also identify important current trends and offer new perspectives on the future of this burgeoning field.
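For readers new to the setting, the classical (stochastic) bandit problem that these applications build on can be illustrated with the well-known UCB1 index policy of Auer, Cesa-Bianchi and Fischer. This is a minimal sketch of our own and is not taken from the survey. It assumes Bernoulli rewards, and the arm probabilities, horizon, seed and the function name `ucb1` are hypothetical choices made only for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms; return per-arm pull counts and total reward.

    arm_means: hypothetical success probabilities, hidden from the learner.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms  # number of times each arm was pulled
    sums = [0.0] * n_arms  # cumulative reward collected from each arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull every arm once so each empirical mean is defined
        else:
            # Optimism under uncertainty: empirical mean plus bonus sqrt(2 ln t / n_i)
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return counts, total


# Usage with three hypothetical arms; the last arm is the best one.
counts, total = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts, total)
```

The exploration bonus shrinks as an arm is pulled more often. Over a long horizon, most pulls should therefore concentrate on the arm with the highest success probability. The exact counts depend on the random seed.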