muMAB: A Multi-Armed Bandit Model for Wireless Network Selection
2018, Algorithms
https://doi.org/10.3390/A11020013
Abstract
Multi-armed bandit (MAB) models are a viable approach to describe the problem of best wireless network selection by a multi-Radio Access Technology (multi-RAT) device, with the goal of maximizing the quality perceived by the final user. The classical MAB model, however, cannot properly describe this problem, in which a device typically performs a set of measurements to collect information on the available networks before a selection takes place. The classical model in fact foresees only one possible action for the player, the selection of one among different arms at each time step; existing arm selection algorithms thus differ mainly in the rule according to which a specific arm is selected. This work proposes a new MAB model, named measure-use-MAB (muMAB), which aims at providing higher flexibility, and thus better accuracy, in describing the network selection problem. The muMAB model extends the classical MAB model in two ways: first, it foresees two different actions, to measure and to use; second, it allows actions to span multiple time steps. Two new algorithms designed to take advantage of the higher flexibility provided by the muMAB model are also introduced. The first, referred to as measure-use-UCB1 (muUCB1), is derived from the well-known UCB1 algorithm, while the second, referred to as Measure with Logarithmic Interval (MLI), is designed specifically for the new model so as to take advantage of the new measure action while aggressively using the best arm. The new algorithms are compared against existing ones from the literature in the context of the muMAB model by means of computer simulations using both synthetic and captured data. Results show that the performance of the algorithms heavily depends on the Probability Density Function (PDF) of the reward received on each arm, with different algorithms leading to the best performance depending on the PDF. Results highlight, however, that as the ratio between the time required for using an arm and the time required to measure increases, the proposed algorithms guarantee the best performance, with muUCB1 emerging as the best candidate when the arms are characterized by similar mean rewards, and MLI prevailing when one arm is significantly more rewarding than the others. This calls for an adaptive approach capable of adjusting the behavior of the algorithm, or of switching algorithm altogether, depending on the knowledge acquired on the PDF of the reward on each arm.
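To make the measure/use distinction concrete, the following Python sketch simulates a toy bandit in which a measure action and a use action last different numbers of time steps. It is only an illustration under stated assumptions and is not the muUCB1 or MLI algorithm, whose rules are not given in the abstract. The function names, the Bernoulli reward model, the durations `t_measure` and `t_use`, and the choice to measure every arm once before using the arm with the highest standard UCB1 index are all hypothetical.

```python
import math
import random


def ucb1_index(mean, samples, total_samples):
    """Standard UCB1 index: empirical mean plus an exploration bonus."""
    if samples == 0:
        return math.inf  # arms never observed are tried first
    return mean + math.sqrt(2.0 * math.log(total_samples) / samples)


def run_measure_use(arm_means, horizon, t_measure=1, t_use=5, seed=0):
    """Toy measure/use bandit loop (hypothetical; not muUCB1 or MLI).

    Assumptions: Bernoulli rewards; "measure" updates the reward estimate
    without gaining reward; "use" gains reward and updates the estimate.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # reward samples observed per arm
    means = [0.0] * k   # empirical mean reward per arm
    gained = 0.0        # reward collected through "use" actions only
    elapsed = 0         # time steps consumed so far

    def observe(arm):
        # Draw one Bernoulli reward sample and update the running mean.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        return reward

    # Measure phase: one short measurement per arm, no reward gained.
    for arm in range(k):
        if elapsed + t_measure > horizon:
            break
        observe(arm)
        elapsed += t_measure

    # Use phase: each use lasts t_use steps on the arm with the top index.
    while elapsed + t_use <= horizon:
        total = sum(counts)
        arm = max(range(k), key=lambda a: ucb1_index(means[a], counts[a], total))
        gained += observe(arm)
        elapsed += t_use

    return gained, counts, elapsed


if __name__ == "__main__":
    print(run_measure_use([0.2, 0.5, 0.8], horizon=1000))
```

Raising `t_use` relative to `t_measure` in this sketch makes each wrong use more expensive in time than a measurement, which is the regime in which the abstract reports the proposed algorithms performing best.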