arXiv Analytics


arXiv:1301.1722 [cs.LG]

Linear Bandits in High Dimension and Recommendation Systems

Yash Deshpande, Andrea Montanari

Published 2013-01-08 (Version 1)

A large number of online services provide automated recommendations to help users navigate through a large collection of items. New items (products, videos, songs, advertisements) are suggested on the basis of the user's past history and, when available, her demographic profile. Recommendations must satisfy the dual goal of helping the user explore the space of available items while allowing the system to probe the user's preferences. We model this trade-off using linearly parametrized multi-armed bandits, propose a policy, and prove upper and lower bounds on the cumulative "reward" that coincide up to constants in the data-poor (high-dimensional) regime. Prior work on linear bandits has focused on the data-rich (low-dimensional) regime and used cumulative "risk" as the figure of merit. For this data-rich regime, we provide a simple modification of our policy that achieves near-optimal risk performance under more restrictive assumptions on the geometry of the problem. We test (a variation of) the scheme used to establish achievability on the Netflix and MovieLens datasets and obtain good agreement with the qualitative predictions of the theory we develop.
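To make the linearly parametrized bandit setting concrete, here is a minimal sketch of a generic LinUCB-style policy: each arm is a feature vector x in R^d, the expected reward is the inner product of x with an unknown parameter theta, and the policy balances exploiting the current ridge-regression estimate of theta against exploring directions it is uncertain about. This is a standard textbook policy for illustration only, not the specific policy analyzed in the paper; all names, arm features, and parameter values below are invented for the example.

```python
import math
import random

def linucb(arms, theta_true, horizon, alpha=1.0, noise=0.1, seed=0):
    """Run a generic LinUCB policy; returns the total collected reward.

    arms: list of d-dimensional feature tuples.
    theta_true: unknown parameter used only to simulate noisy rewards.
    """
    rng = random.Random(seed)
    d = len(arms[0])
    # A_inv is the inverse of the ridge design matrix A = I + sum x x^T,
    # maintained directly via Sherman-Morrison rank-one updates.
    A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d  # accumulates reward-weighted features: sum r * x
    total = 0.0
    for _ in range(horizon):
        # Ridge estimate of the unknown parameter: theta_hat = A_inv b.
        theta_hat = [sum(A_inv[i][j] * b[j] for j in range(d)) for i in range(d)]

        def ucb(x):
            # Estimated mean reward plus an exploration bonus that is
            # large in directions the design matrix has not yet covered.
            mean = sum(t * xi for t, xi in zip(theta_hat, x))
            Ax = [sum(A_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
            width = math.sqrt(max(sum(xi * a for xi, a in zip(x, Ax)), 0.0))
            return mean + alpha * width

        x = max(arms, key=ucb)
        r = sum(t * xi for t, xi in zip(theta_true, x)) + rng.gauss(0.0, noise)
        total += r
        # Sherman-Morrison update: A_inv -= (A_inv x)(A_inv x)^T / (1 + x^T A_inv x).
        Ax = [sum(A_inv[i][j] * x[j] for j in range(d)) for i in range(d)]
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        for i in range(d):
            for j in range(d):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
        b = [bi + xi * r for bi, xi in zip(b, x)]
    return total
```

As a usage sketch, with arms (1, 0), (0, 1), and (0.7, 0.7) and true parameter (1.0, 0.2), the best arm has mean reward 1.0; over a few hundred rounds the policy should concentrate its pulls there, so the total reward approaches the horizon. The data-poor regime studied in the paper corresponds to the horizon being small relative to the dimension d, where such confidence-bound estimates are much less informative.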

Related articles: Most relevant | Search more
arXiv:1810.06313 [cs.LG] (Published 2018-10-15)
Regret vs. Bandwidth Trade-off for Recommendation Systems
arXiv:1110.4322 [cs.LG] (Published 2011-10-19, updated 2012-02-14)
An Optimal Algorithm for Linear Bandits
arXiv:1811.12591 [cs.LG] (Published 2018-11-30)
Active Learning in Recommendation Systems with Multi-level User Preferences