arXiv:1110.4322 Abstract | arXiv Analytics

arXiv:1110.4322 [cs.LG]Abstract References Reviews Resources

An Optimal Algorithm for Linear Bandits

Published 2011-10-19, updated 2012-02-14Version 3

We provide the first algorithm for online bandit linear optimization whose regret after T rounds is of order sqrt{Td ln N} on any finite class X of N actions in d dimensions, and of order d*sqrt{T} (up to log factors) when X is infinite. These bounds are not improvable in general. The basic idea utilizes tools from convex geometry to construct what is essentially an optimal exploration basis. We also present an application to a model of linear bandits with expert advice. Interestingly, these results show that bandit linear optimization with expert advice in d dimensions is no more difficult (in terms of the achievable regret) than the online d-armed bandit problem with expert advice (where EXP4 is optimal).

Comments: This paper is superseded by S. Bubeck, N. Cesa-Bianchi, and S.M. Kakade, "Towards minimax policies for online linear optimization with bandit feedback"

Categories: cs.LG, stat.ML

Keywords: linear bandits, optimal algorithm, expert advice, online bandit linear optimization, basic idea utilizes tools

Related articles: Most relevant | Search more

arXiv:1409.3040 [cs.LG] (Published 2014-09-10)

Towards Optimal Algorithms for Prediction with Expert Advice

Nick Gravin, Yuval Peres, Balasubramanian Sivan

arXiv:2307.15154 [cs.LG] (Published 2023-07-27)

A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Zhihan Xiong, Romain Camilleri, Maryam Fazel, Lalit Jain, Kevin Jamieson

arXiv:1301.1722 [cs.LG] (Published 2013-01-08)

Linear Bandits in High Dimension and Recommendation Systems

Yash Deshpande, Andrea Montanari