arXiv Analytics

arXiv:1507.04208 [cs.LG]

Combinatorial Cascading Bandits

Branislav Kveton, Zheng Wen, Azin Ashkan, Csaba Szepesvari

Published 2015-07-15, Version 1

We consider learning to maximize reward in combinatorial cascading bandits, a new learning setting that unifies cascading and combinatorial bandits. The unification of these frameworks presents unique challenges in the analysis, but it allows for modeling a rich set of partial monitoring problems, such as learning to route in a communication network to minimize the probability of losing routed packets, and recommending diverse items. We propose CombCascade, a computationally efficient UCB-like algorithm for solving our problem, and derive gap-dependent and gap-free upper bounds on its regret. Our analysis builds on recent results in stochastic combinatorial semi-bandits but also addresses two novel challenges of our learning setting: a non-linear objective and partial observability. We evaluate CombCascade on two real-world problems and demonstrate that it performs well even when our modeling assumptions are violated. We also demonstrate that our setting requires new learning algorithms.
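The abstract's key ingredients, UCB-style optimism combined with cascade feedback (only items up to and including the first click are observed), can be illustrated with a minimal sketch. This is not the paper's CombCascade pseudocode: it handles only the simplest combinatorial constraint ("recommend any K of L items"), and the function name, confidence radius, and simulation setup are assumptions made for illustration.

```python
import math
import random

def cascade_ucb(weights, K, T, seed=0):
    """Illustrative UCB-style learner for a stochastic cascading bandit.

    Each round, recommend an ordered list of K items; a simulated user
    examines them top-down and clicks the first attractive one, so only
    items up to and including the first click yield feedback (partial
    observability, as in the abstract).
    """
    rng = random.Random(seed)
    L = len(weights)
    obs = [0] * L      # how many times each item's click outcome was observed
    clicks = [0] * L   # observed clicks per item
    total_reward = 0
    for t in range(1, T + 1):
        # Optimistic index: empirical mean plus a confidence radius
        # (unobserved items get +inf so they are explored first).
        def ucb(e):
            if obs[e] == 0:
                return float("inf")
            return clicks[e] / obs[e] + math.sqrt(1.5 * math.log(t) / obs[e])
        ranking = sorted(range(L), key=ucb, reverse=True)[:K]
        # Simulate cascade feedback from the true attraction probabilities.
        for e in ranking:
            clicked = rng.random() < weights[e]
            obs[e] += 1
            clicks[e] += int(clicked)
            if clicked:
                total_reward += 1
                break  # items below the first click are never observed
    return obs, clicks, total_reward
```

With a clear gap between good and bad items, the learner concentrates its observations on the best items, e.g. `cascade_ucb([0.8, 0.7, 0.1, 0.1, 0.1], K=2, T=2000)`.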
