arXiv Analytics

arXiv:2202.13603 [cs.LG]

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

Published 2022-02-28 (Version 1)

We consider learning a stochastic bandit model, where the reward function belongs to a general class of uniformly bounded functions and the additive noise can be heteroscedastic. Our model captures contextual linear bandits and generalized linear bandits as special cases. While previous works (Kirschner and Krause, 2018; Zhou et al., 2021) based on weighted ridge regression can deal with linear bandits with heteroscedastic noise, they are not directly applicable to our general model due to the curse of nonlinearity. To tackle this problem, we propose a multi-level learning framework for the general bandit model. The core idea of our framework is to partition the observed data into different levels according to the variance of their respective rewards and to perform online learning at each level collaboratively. Under our framework, we first design an algorithm that constructs a variance-aware confidence set based on empirical risk minimization, and we prove a variance-dependent regret bound. For generalized linear bandits, we further propose an algorithm based on a follow-the-regularized-leader (FTRL) subroutine and an online-to-confidence-set conversion, which achieves a tighter variance-dependent regret bound under certain conditions.
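The core partitioning idea can be illustrated with a minimal sketch. The dyadic bucketing rule, the `variance_level` helper, and the `MultiLevelBandit` class below are all hypothetical names chosen for illustration; the paper's exact partition, learners, and level interaction may differ.

```python
import math

def variance_level(sigma2, sigma2_min=1e-4):
    """Map a reward-variance estimate to a dyadic level index.

    Level l collects observations whose variance lies in
    (2^{-(l+1)}, 2^{-l}] -- an assumed bucketing for illustration.
    """
    sigma2 = max(sigma2, sigma2_min)  # clip tiny variances to a floor
    return max(0, math.floor(-math.log2(sigma2)))

class MultiLevelBandit:
    """Sketch: keep one dataset per variance level.

    In the framework described above, each level would additionally run
    its own online learner (e.g., building ERM-based confidence sets),
    and action selection would combine the levels collaboratively.
    """

    def __init__(self):
        self.levels = {}  # level index -> list of (context, action, reward)

    def observe(self, context, action, reward, sigma2):
        # Route each observation to the level matching its noise variance.
        l = variance_level(sigma2)
        self.levels.setdefault(l, []).append((context, action, reward))
```

For example, an observation with variance 0.25 lands in level 2, while one with variance close to 1 lands in level 0, so low-noise data is never drowned out by high-noise data in a single shared estimator.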

Related articles: Most relevant | Search more
arXiv:1802.05693 [cs.LG] (Published 2018-02-15)
Bandit Learning with Positive Externalities
arXiv:2210.11834 [cs.LG] (Published 2022-10-21)
Optimal Contextual Bandits with Knapsacks under Realizability via Regression Oracles
arXiv:1907.01287 [cs.LG] (Published 2019-07-02)
Bandit Learning Through Biased Maximum Likelihood Estimation