arXiv Analytics

arXiv:1502.02763 [cs.LG]

Cascading Bandits

Branislav Kveton, Csaba Szepesvari, Zheng Wen, Azin Ashkan

Published 2015-02-10 (Version 1)

The cascade model is a well-established model of user interaction with content. In this work, we propose cascading bandits, a learning variant of the model where the objective is to learn the $K$ most attractive items out of $L$ ground items. We cast the problem as a stochastic combinatorial bandit with a non-linear reward function and partially observed weights of items. Both of these features are challenging in the context of combinatorial bandits. We propose two computationally efficient algorithms for our problem, CascadeUCB1 and CascadeKL-UCB, and prove gap-dependent upper bounds on their regret. We also derive a lower bound for cascading bandits and show that it matches the upper bound of CascadeKL-UCB up to a logarithmic factor. Finally, we evaluate our algorithms on synthetic problems. Our experiments demonstrate that the algorithms perform well and robustly even when our modeling assumptions are violated.
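To make the setting concrete, below is a minimal Python simulation sketch of a CascadeUCB1-style learner, assuming Bernoulli attraction weights, an initialization round that observes each item once, and a UCB1 confidence radius of the form sqrt(1.5 log t / s). This is an illustration based on the abstract's description (UCB indices over item attraction probabilities with cascade feedback), not the authors' reference implementation; the function name cascade_ucb1, the simulator details, and the example probabilities are hypothetical.

import numpy as np

def cascade_ucb1(attraction_probs, K, horizon, seed=0):
    """Sketch: CascadeUCB1-style learner on simulated Bernoulli attraction weights."""
    rng = np.random.default_rng(seed)
    attraction_probs = np.asarray(attraction_probs, dtype=float)  # hypothetical simulator input
    L = len(attraction_probs)
    # Initialization: observe each item once.
    pulls = np.ones(L)
    means = (rng.random(L) < attraction_probs).astype(float)

    total_clicks = 0
    for t in range(1, horizon + 1):
        # UCB1-style index: empirical mean plus an assumed sqrt(1.5 log t / s) radius.
        ucb = means + np.sqrt(1.5 * np.log(t + 1) / pulls)
        ranked = np.argsort(-ucb)[:K]  # recommend the K items with the highest indices
        # Cascade feedback: the user scans the list top-down and clicks the first
        # attractive item; only the scanned prefix of the list is observed.
        clicks = rng.random(K) < attraction_probs[ranked]
        examined = int(np.argmax(clicks)) + 1 if clicks.any() else K
        for pos in range(examined):
            e = ranked[pos]
            w = 1.0 if (clicks.any() and pos == examined - 1) else 0.0
            pulls[e] += 1
            means[e] += (w - means[e]) / pulls[e]  # incremental mean update
        total_clicks += int(clicks.any())
    return total_clicks

# Example: L = 8 items, recommend K = 3, horizon of 10,000 rounds (illustrative values).
print(cascade_ucb1([0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02, 0.02],
                   K=3, horizon=10_000))

The key point the sketch illustrates is the partial observability of the weights: in each round only the examined prefix of the recommended list yields feedback, which is what distinguishes cascading bandits from fully observed combinatorial semi-bandits.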

Related articles:
arXiv:1603.05359 [cs.LG] (Published 2016-03-17)
Cascading Bandits for Large-Scale Recommendation Problems
arXiv:2001.08655 [cs.LG] (Published 2020-01-23)
Best Arm Identification for Cascading Bandits in the Fixed Confidence Setting