Multi-facet Contextual Bandits
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining
https://doi.org/10.1145/3447548.3467299

Abstract
Contextual multi-armed bandits have been shown to be an effective tool in recommender systems. In this paper, we study a novel problem of multi-facet bandits involving a group of bandits, each characterizing the users' needs from one unique aspect. In each round, for the given user, we need to select one arm from each bandit, such that the combination of all arms maximizes the final reward. This problem has immediate applications in E-commerce, healthcare, etc. To address this problem, we propose a novel algorithm, named MuFasa, which utilizes an assembled neural network to jointly learn the underlying reward functions of multiple bandits. It estimates an Upper Confidence Bound (UCB) linked with the expected reward to balance between exploitation and exploration. Under mild assumptions, we provide the regret analysis of MuFasa. It can achieve the near-optimal O((K + 1)√T) regret bound, where K is the number of bandits and T is the number of played rounds. Furthermore, we conduct extensive experiments to show that MuFasa outperforms strong baselines on real-world data sets.
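The selection principle described above (pick one arm from each bandit so that the combination maximizes an upper confidence bound on the reward) can be illustrated with a simplified per-facet linear UCB rule. This is only a sketch under strong assumptions: MuFasa itself models the joint reward with an assembled neural network, so the function name `select_arms`, the linear reward model, and the per-facet independence below are illustrative simplifications, not the paper's algorithm.

```python
import numpy as np

def select_arms(facets, A_inv, theta, alpha=1.0):
    """Pick one arm per facet (bandit) with a per-facet linear UCB rule.

    facets : list of (n_arms, d) arrays, one row per arm's context vector
    A_inv  : list of (d, d) inverse covariance matrices, one per facet
    theta  : list of (d,) estimated reward-weight vectors, one per facet
    alpha  : exploration strength; larger values favor uncertain arms
    """
    chosen = []
    for X, Ainv, th in zip(facets, A_inv, theta):
        est = X @ th  # estimated reward of each arm (exploitation term)
        # exploration bonus: sqrt(x^T A^{-1} x) for each arm's context x
        bonus = alpha * np.sqrt(np.einsum('ij,jk,ik->i', X, Ainv, X))
        chosen.append(int(np.argmax(est + bonus)))  # UCB arm for this facet
    return chosen

# Example: two facets, two arms each; one arm index is returned per facet.
facets = [np.array([[1.0, 0.0], [0.0, 1.0]]),
          np.array([[0.0, 2.0], [1.0, 0.0]])]
A_inv = [np.eye(2), np.eye(2)]
theta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
print(select_arms(facets, A_inv, theta))
```

In the paper's setting, the per-facet estimates would instead come from a shared network trained on the joint reward of the selected arm combination; the sketch only conveys the select-one-arm-per-bandit UCB structure.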