arXiv:1803.04623 Abstract | arXiv Analytics

arXiv:1803.04623 [cs.LG]Abstract References Reviews Resources

Thompson Sampling for Combinatorial Semi-Bandits

Published 2018-03-13Version 1

We study the application of the Thompson Sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of $O(m\log T / \Delta_{\min}) $ for TS under general CMAB, where $m$ is the number of arms, $T$ is the time horizon, and $\Delta_{\min}$ is the minimum gap between the expected reward of the optimal solution and any non-optimal solution. We also show that one can not use an approximate oracle in TS algorithm for even MAB problems. Then we expand the analysis to matroid bandit, a special case of CMAB. Finally, we use some experiments to show the comparison of regrets of CUCB and CTS algorithms.

Categories: cs.LG

Keywords: thompson sampling, combinatorial semi-bandits, general cmab, first distribution-dependent regret bound, stochastic combinatorial multi-armed bandit

Related articles: Most relevant | Search more

arXiv:1706.00977 [cs.LG] (Published 2017-06-03)

Thompson Sampling for the MNL-Bandit

Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

arXiv:2006.06372 [cs.LG] (Published 2020-06-11)

TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation

Jackie Baek, Vivek F. Farias

arXiv:2001.02323 [cs.LG] (Published 2020-01-08)

On Thompson Sampling for Smoother-than-Lipschitz Bandits

James A. Grant, David S. Leslie