arXiv:2210.11692 [cs.LG]

Competing Bandits in Time Varying Matching Markets

Deepan Muthirayan, Chinmay Maheshwari, Pramod P. Khargonekar, Shankar Sastry

Published 2022-10-21Version 1

We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed and known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time varying. We propose the {\it Restart Competing Bandits (RCB)} algorithm, which combines a simple {\it restart strategy}, to handle the non-stationarity, with the {\it competing bandits} algorithm \citep{liu2020competing} designed for the stationary case. We show that, with the proposed algorithm, each player incurs a uniform sub-linear regret of {$\widetilde{\mathcal{O}}(L_T^{1/2} T^{1/2})$}, where $L_T$ is the number of changes in the underlying preferences of the agents. We also discuss extensions of this algorithm to the case where the number of changes need not be known a priori.
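To illustrate the restart idea in isolation: the sketch below is a minimal, hypothetical single-player example (not the paper's RCB algorithm, which additionally handles conflicts between competing players) showing how periodically resetting a UCB learner's statistics lets it track piecewise-stationary rewards. All names (`restart_ucb`, `rewards_fn`, `epoch_len`) are illustrative assumptions, and the fixed epoch length stands in for a schedule chosen from the horizon and number of changes.

```python
import math

def restart_ucb(rewards_fn, T, epoch_len, n_arms):
    """Run UCB over horizon T, resetting all statistics every epoch_len rounds.

    rewards_fn(t, arm) -> reward in [0, 1]; a restart discards the counts and
    empirical means, so stale estimates from before a preference change are
    forgotten and the learner can re-identify the currently best arm.
    """
    total = 0.0
    for start in range(0, T, epoch_len):
        counts = [0] * n_arms      # restart: fresh statistics each epoch
        means = [0.0] * n_arms
        for t in range(start, min(start + epoch_len, T)):
            s = t - start
            if s < n_arms:
                arm = s            # play each arm once after a restart
            else:
                # standard UCB index: empirical mean + exploration bonus
                arm = max(
                    range(n_arms),
                    key=lambda a: means[a]
                    + math.sqrt(2 * math.log(s + 1) / counts[a]),
                )
            r = rewards_fn(t, arm)
            counts[arm] += 1
            means[arm] += (r - means[arm]) / counts[arm]
            total += r
    return total

# Toy non-stationary environment: the best arm flips halfway through.
def drift(t, arm):
    best = 0 if t < 500 else 1
    return 1.0 if arm == best else 0.2

reward = restart_ucb(drift, T=1000, epoch_len=250, n_arms=2)
```

Because the epochs at rounds 500 and 750 start with fresh statistics, the learner quickly switches to arm 1 after the change, whereas a single no-restart run would keep trusting the stale mean of arm 0 for a long time.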

Related articles:
arXiv:1906.05363 [cs.LG] (Published 2019-06-12)
Competing Bandits in Matching Markets
arXiv:2205.03699 [cs.LG] (Published 2022-05-07)
Rate-Optimal Contextual Online Matching Bandit
arXiv:2403.11782 [cs.LG] (Published 2024-03-18, updated 2024-03-24)
A tutorial on learning from preferences and choices with Gaussian Processes