arXiv:1301.1722 Abstract | arXiv Analytics

arXiv:1301.1722 [cs.LG]Abstract References Reviews Resources

Linear Bandits in High Dimension and Recommendation Systems

Published 2013-01-08Version 1

A large number of online services provide automated recommendations to help users to navigate through a large collection of items. New items (products, videos, songs, advertisements) are suggested on the basis of the user's past history and --when available-- her demographic profile. Recommendations have to satisfy the dual goal of helping the user to explore the space of available items, while allowing the system to probe the user's preferences. We model this trade-off using linearly parametrized multi-armed bandits, propose a policy and prove upper and lower bounds on the cumulative "reward" that coincide up to constants in the data poor (high-dimensional) regime. Prior work on linear bandits has focused on the data rich (low-dimensional) regime and used cumulative "risk" as the figure of merit. For this data rich regime, we provide a simple modification for our policy that achieves near-optimal risk performance under more restrictive assumptions on the geometry of the problem. We test (a variation of) the scheme used for establishing achievability on the Netflix and MovieLens datasets and obtain good agreement with the qualitative predictions of the theory we develop.

Comments: 21 pages, 4 figures

Categories: cs.LG, stat.ML

Keywords: linear bandits, high dimension, recommendation systems, achieves near-optimal risk performance, users past history

Related articles: Most relevant | Search more

arXiv:1810.06313 [cs.LG] (Published 2018-10-15)

Regret vs. Bandwidth Trade-off for Recommendation Systems

Linqi Song, Christina Fragouli, Devavrat Shah

arXiv:1110.4322 [cs.LG] (Published 2011-10-19, updated 2012-02-14)

An Optimal Algorithm for Linear Bandits

Nicolò Cesa-Bianchi, Sham Kakade

arXiv:1811.12591 [cs.LG] (Published 2018-11-30)

Active Learning in Recommendation Systems with Multi-level User Preferences