Multi-Armed Bandits and Their Applications in Wireless Networks

Speaker:  Zhu Han – Houston, TX, United States
Topic(s):  Applied Computing


In recent years, the multi-armed bandit (MAB), which can effectively balance the well-known exploitation-exploration (EE) dilemma in online sequential decision problems, has gained increasing attention in recommender systems, clinical trials, economics, and communications, driven by the growing storage capacity and computing power of devices. Compared with traditional machine learning (ML) techniques, MAB offers rigorous theoretical analysis (i.e., regret bounds), low computational complexity, and ease of application. This talk aims to establish an overall framework for MAB problems in the literature, consisting of bandit models, classical algorithms, regret bounds, and applications. Specifically, we first introduce the background of the MAB and define different types of MAB models, such as stochastic bandits, adversarial bandits, Markovian bandits, contextual bandits, Lipschitz bandits, and recent advances in multi-player bandits. Then, we present the pseudocode of classical algorithms (i.e., UCB, Thompson sampling, EXP3, EXP4, LinUCB, and the zooming algorithm) for each type of MAB model. Meanwhile, we analyze the corresponding regret bounds of these algorithms and unify them for comparison. In addition, we give five examples in wireless communications to illustrate how MAB applies to various settings. Finally, we conclude with the main features and limitations of MABs.
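As a flavor of the algorithms the talk covers, the following is a minimal sketch of UCB1 for the stochastic bandit model: play each arm once, then repeatedly pull the arm maximizing its empirical mean plus a confidence bonus. The function name, arm setup, and horizon below are illustrative choices, not taken from the talk itself.

```python
import math
import random

def ucb1(reward_fns, horizon):
    """UCB1: after one initial pull per arm, choose the arm maximizing
    empirical mean + sqrt(2 ln t / pulls). Rewards assumed in [0, 1]."""
    k = len(reward_fns)
    counts = [0] * k      # number of pulls per arm
    means = [0.0] * k     # empirical mean reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: play each arm once
        else:
            arm = max(range(k), key=lambda a:
                      means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = reward_fns[arm]()                          # observe stochastic reward
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
        total += r
    return total, counts

# Example: two Bernoulli arms with success probabilities 0.3 and 0.7
random.seed(0)
arms = [lambda: float(random.random() < 0.3),
        lambda: float(random.random() < 0.7)]
reward, counts = ucb1(arms, 2000)
```

Over the horizon, the better arm accumulates far more pulls, which is how UCB1 achieves its logarithmic regret bound for stochastic bandits.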

About this Lecture

Number of Slides:  50
Duration:  60 minutes
Languages Available:  Chinese (Simplified), English
Last Updated: 

Request this Lecture

To request this particular lecture, please complete this online form.

Request a Tour

To request a tour with this speaker, please complete this online form.

All requests will be sent to ACM headquarters for review.