arXiv:1706.00977 Abstract | arXiv Analytics

arXiv:1706.00977 [cs.LG]Abstract References Reviews Resources

Thompson Sampling for the MNL-Bandit

Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi

Published 2017-06-03Version 1

We consider a sequential subset selection problem under parameter uncertainty, where at each time step, the decision maker selects a subset of cardinality $K$ from $N$ possible items (arms), and observes a (bandit) feedback in the form of the index of one of the items in said subset, or none. Each item in the index set is ascribed a certain value (reward), and the feedback is governed by a Multinomial Logit (MNL) choice model whose parameters are a priori unknown. The objective of the decision maker is to maximize the expected cumulative rewards over a finite horizon $T$, or alternatively, minimize the regret relative to an oracle that knows the MNL parameters. We refer to this as the MNL-Bandit problem. This problem is representative of a larger family of exploration-exploitation problems that involve a combinatorial objective, and arise in several important application domains. We present an approach to adapt Thompson Sampling to this problem and show that it achieves near-optimal regret as well as attractive numerical performance.

Comments: Accepted for presentation at Conference on Learning Theory (COLT) 2017

Categories: cs.LG

Keywords: thompson sampling, mnl-bandit, sequential subset selection problem, achieves near-optimal regret, decision maker selects

Tags: conference paper

Related articles: Most relevant | Search more

arXiv:1708.04781 [cs.LG] (Published 2017-08-16)

Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors

Yichi Zhou, Jun Zhu, Jingwei Zhuo

arXiv:2007.00187 [cs.LG] (Published 2020-07-01)

Variable Selection via Thompson Sampling

Yi Liu, Veronika Rockova

arXiv:2306.17693 [cs.LG] (Published 2023-06-30)

Thompson sampling for improved exploration in GFlowNets

Jarrid Rector-Brooks et al.

arXiv Analytics

arXiv:1706.00977 [cs.LG]Abstract References Reviews Resources

Thompson Sampling for the MNL-Bandit

Links

Toolbox

arXiv:1706.00977 [cs.LG]AbstractReferencesReviewsResources

Thompson Sampling for the MNL-Bandit

Links

Toolbox

arXiv:1706.00977 [cs.LG]Abstract References Reviews Resources