arXiv:2006.06372 [cs.LG]

TS-UCB: Improving on Thompson Sampling With Little to No Additional Computation

Jackie Baek, Vivek F. Farias

Published 2020-06-11 (Version 1)

Thompson sampling has become a ubiquitous approach to online decision problems with bandit feedback. The key algorithmic task for Thompson sampling is drawing a sample from the posterior of the optimal action. We propose an alternative arm selection rule, dubbed TS-UCB, that requires negligible additional computational effort yet provides significant performance improvements over Thompson sampling. At each step, TS-UCB computes a score for each arm using two ingredients: posterior sample(s) and upper confidence bounds. TS-UCB can be used in any setting where these two quantities are available, and it is flexible in the number of posterior samples it takes as input. This flexibility proves particularly valuable in heuristics for deep contextual bandits: we show that TS-UCB achieves materially lower regret on all problem instances in the deep bandit suite proposed in Riquelme et al. (2018). Finally, from a theoretical perspective, we establish optimal regret guarantees for TS-UCB for both the K-armed and linear bandit models.
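The abstract names the two ingredients of the score (posterior samples and upper confidence bounds) without stating the formula. Below is a minimal Python sketch of one natural instantiation for a Bernoulli K-armed bandit with Beta posteriors; the specific ratio score, the UCB1-style confidence width, and the function name `ts_ucb_arm` are illustrative assumptions, not the authors' exact construction.

```python
# A sketch of a TS-UCB-style arm selection rule: score each arm by the gap
# to an estimated optimal value divided by its confidence width, and pull
# the arm with the smallest score. Assumed details: Beta(1,1) priors,
# a UCB1-style width, and the ratio form of the score.
import numpy as np

def ts_ucb_arm(successes, failures, t, n_samples=1, rng=None):
    """Return the index of the arm chosen at round t (t >= 1).

    successes, failures: np.ndarray of per-arm Bernoulli outcome counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = successes + 1.0          # Beta posterior parameters
    beta = failures + 1.0
    mean = alpha / (alpha + beta)    # posterior means

    # Ingredient 1: posterior sample(s). Estimate the optimal arm's value
    # by averaging the per-sample maximum over arms; more samples give a
    # more stable estimate, matching the abstract's note that the rule is
    # flexible in the number of posterior samples.
    theta = rng.beta(alpha, beta, size=(n_samples, len(alpha)))
    theta_star = theta.max(axis=1).mean()

    # Ingredient 2: an upper confidence bound per arm (illustrative
    # UCB1-style width; any valid UCB could be plugged in here).
    pulls = successes + failures
    width = np.sqrt(2.0 * np.log(max(t, 2)) / np.maximum(pulls, 1.0))
    ucb = mean + width

    # Score: small when an arm's posterior mean is close to the estimated
    # optimum relative to its remaining uncertainty.
    score = (theta_star - mean) / np.maximum(ucb - mean, 1e-12)
    return int(np.argmin(score))
```

For example, calling `ts_ucb_arm(np.zeros(5), np.zeros(5), t=1)` scores five fresh arms under uniform priors. The computation adds only the per-arm score on top of the posterior sampling Thompson sampling already performs, consistent with the claimed "little to no additional computation."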

Related articles:
arXiv:2001.02323 [cs.LG] (Published 2020-01-08)
On Thompson Sampling for Smoother-than-Lipschitz Bandits
arXiv:1811.04471 [cs.LG] (Published 2018-11-11)
Thompson Sampling for Pursuit-Evasion Problems
arXiv:1708.04781 [cs.LG] (Published 2017-08-16)
Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors