arXiv:2202.13603 Abstract | arXiv Analytics

arXiv:2202.13603 [cs.LG]Abstract References Reviews Resources

Bandit Learning with General Function Classes: Heteroscedastic Noise and Variance-dependent Regret Bounds

Heyang Zhao, Dongruo Zhou, Jiafan He, Quanquan Gu

Published 2022-02-28Version 1

We consider learning a stochastic bandit model, where the reward function belongs to a general class of uniformly bounded functions, and the additive noise can be heteroscedastic. Our model captures contextual linear bandits and generalized linear bandits as special cases. While previous works (Kirschner and Krause, 2018; Zhou et al., 2021) based on weighted ridge regression can deal with linear bandits with heteroscedastic noise, they are not directly applicable to our general model due to the curse of nonlinearity. In order to tackle this problem, we propose a multi-level learning framework for the general bandit model. The core idea of our framework is to partition the observed data into different levels according to the variance of their respective reward and perform online learning at each level collaboratively. Under our framework, we first design an algorithm that constructs the variance-aware confidence set based on empirical risk minimization and prove a variance-dependent regret bound. For generalized linear bandits, we further propose an algorithm based on follow-the-regularized-leader (FTRL) subroutine and online-to-confidence-set conversion, which can achieve a tighter variance-dependent regret under certain conditions.

Comments: 33 pages, 1 table

Categories: cs.LG, math.OC, stat.ML

Keywords: variance-dependent regret bound, general function classes, heteroscedastic noise, bandit learning, model captures contextual linear bandits

Related articles: Most relevant | Search more

arXiv:1802.05693 [cs.LG] (Published 2018-02-15)

Bandit Learning with Positive Externalities

Virag Shah, Jose Blanchet, Ramesh Johari

arXiv:2210.11834 [cs.LG] (Published 2022-10-21)

Optimal Contextual Bandits with Knapsacks under Realizibility via Regression Oracles

Yuxuan Han, Jialin Zeng, Yang Wang, Yang Xiang, Jiheng Zhang

arXiv:1907.01287 [cs.LG] (Published 2019-07-02)

Bandit Learning Through Biased Maximum Likelihood Estimation