arXiv Analytics

arXiv:1810.03024 [cs.LG]

Learning to Optimize under Non-Stationarity

Wang Chi Cheung, David Simchi-Levi, Ruihao Zhu

Published 2018-10-06 (Version 1)

We introduce algorithms that achieve state-of-the-art \emph{dynamic regret} bounds for the non-stationary linear stochastic bandit setting, which captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the (possibly adversarial) non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining $d$, $B_T$, and $T$ as the problem dimension, the \emph{variation budget}, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (\texttt{SW-UCB}) algorithm with optimal $\widetilde{O}(d^{2/3}(B_T+1)^{1/3}T^{2/3})$ dynamic regret, and the tuning-free bandits-over-bandits (\texttt{BOB}) framework, built on top of the \texttt{SW-UCB} algorithm, which attains the best-known $\widetilde{O}(d^{2/3}(B_T+1)^{1/4}T^{3/4})$ dynamic regret without prior knowledge of $B_T$.
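The abstract gives no pseudocode, but the core \texttt{SW-UCB} idea can be sketched: run a LinUCB-style optimistic policy whose least-squares estimate uses only a sliding window of recent observations, so data from a drifted environment is forgotten. The sketch below is a minimal illustration under assumptions, not the paper's exact algorithm; `arms_fn`, `reward_fn`, and the constants `window`, `beta`, and `lam` are hypothetical placeholders, whereas the paper tunes the window length as a function of $d$, $B_T$, and $T$ to obtain the stated regret bound.

```python
import numpy as np

def sw_ucb(arms_fn, reward_fn, T, d, window, lam=1.0, beta=1.0):
    """Sliding-window LinUCB-style sketch (illustrative, not the paper's
    tuned SW-UCB): the regularized least-squares estimate is formed from
    only the last `window` observations."""
    xs, ys = [], []  # sliding buffers of past action vectors and rewards
    for t in range(T):
        # Regularized design matrix and response, built from the window only.
        # (Rebuilt from scratch for clarity; incremental updates also work.)
        V = lam * np.eye(d)
        b = np.zeros(d)
        for x, y in zip(xs, ys):
            V += np.outer(x, x)
            b += y * x
        theta_hat = np.linalg.solve(V, b)
        V_inv = np.linalg.inv(V)
        # Optimistic action: estimated reward plus an exploration bonus
        # measured in the V^{-1} norm.
        arms = arms_fn(t)  # candidate action vectors at round t
        scores = [x @ theta_hat + beta * np.sqrt(x @ V_inv @ x) for x in arms]
        x_t = arms[int(np.argmax(scores))]
        y_t = reward_fn(t, x_t)  # noisy reward from the (drifting) environment
        xs.append(x_t); ys.append(y_t)
        if len(xs) > window:  # forget the oldest observation
            xs.pop(0); ys.pop(0)
    return theta_hat

# Toy usage with a slowly drifting true parameter (hypothetical setup).
rng = np.random.default_rng(0)
d, T = 2, 500
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
theta_star = lambda t: np.array([np.cos(t / 200), np.sin(t / 200)])
sw_ucb(lambda t: arms,
       lambda t, x: x @ theta_star(t) + 0.1 * rng.standard_normal(),
       T=T, d=d, window=100)
```

Roughly speaking, the \texttt{BOB} framework removes the need to know $B_T$ by running an adversarial bandit meta-algorithm over candidate window lengths, which is the "marriage" of stochastic and adversarial techniques the abstract refers to; sketching it would amount to wrapping the routine above in an EXP3-style selector.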

Related articles:
arXiv:1903.01461 [cs.LG] (Published 2019-03-04)
Hedging the Drift: Learning to Optimize under Non-Stationarity
arXiv:2006.05876 [cs.LG] (Published 2020-06-10)
Improved Analysis for Dynamic Regret of Strongly Convex and Smooth Functions
arXiv:1608.03933 [cs.LG] (Published 2016-08-13)
Improved dynamic regret for non-degeneracy functions