{ "id": "2102.06548", "version": "v1", "published": "2021-02-12T14:22:05.000Z", "updated": "2021-02-12T14:22:05.000Z", "title": "Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning", "authors": [ "Gen Li", "Changxiao Cai", "Yuxin Chen", "Yuantao Gu", "Yuting Wei", "Yuejie Chi" ], "categories": [ "stat.ML", "cs.IT", "cs.LG", "math.IT", "math.OC", "math.ST", "stat.TH" ], "abstract": "Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all state-action pairs are drawn from a generative model in each iteration), substantial progress has been made recently towards understanding the sample efficiency of Q-learning. To yield an entrywise $\\varepsilon$-accurate estimate of the optimal Q-function, state-of-the-art theory requires at least an order of $\\frac{|\\mathcal{S}||\\mathcal{A}|}{(1-\\gamma)^5\\varepsilon^{2}}$ samples for a $\\gamma$-discounted infinite-horizon MDP with state space $\\mathcal{S}$ and action space $\\mathcal{A}$. In this work, we sharpen the sample complexity of synchronous Q-learning to an order of $\\frac{|\\mathcal{S}||\\mathcal{A}|}{(1-\\gamma)^4\\varepsilon^2}$ (up to some logarithmic factor) for any $0<\\varepsilon <1$, leading to an order-wise improvement in terms of the effective horizon $\\frac{1}{1-\\gamma}$. Analogous results are derived for finite-horizon MDPs as well. Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage. A key ingredient of our analysis lies in the establishment of novel error decompositions and recursions, which might shed light on how to analyze finite-sample performance of other Q-learning variants.", "revisions": [ { "version": "v1", "updated": "2021-02-12T14:22:05.000Z" } ], "analyses": { "keywords": [ "sample complexity", "q-learning", "optimal q-function", "dependence", "analyze finite-sample performance" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }