
arXiv:2102.06548 [stat.ML]

Tightening the Dependence on Horizon in the Sample Complexity of Q-Learning

Gen Li, Changxiao Cai, Yuxin Chen, Yuantao Gu, Yuting Wei, Yuejie Chi

Published 2021-02-12, Version 1

Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all state-action pairs are drawn from a generative model in each iteration), substantial progress has been made recently towards understanding the sample efficiency of Q-learning. To yield an entrywise $\varepsilon$-accurate estimate of the optimal Q-function, state-of-the-art theory requires at least an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^{2}}$ samples for a $\gamma$-discounted infinite-horizon MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$. In this work, we sharpen the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ (up to some logarithmic factor) for any $0<\varepsilon <1$, leading to an order-wise improvement in terms of the effective horizon $\frac{1}{1-\gamma}$. Analogous results are derived for finite-horizon MDPs as well. Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage. A key ingredient of our analysis lies in the establishment of novel error decompositions and recursions, which might shed light on how to analyze finite-sample performance of other Q-learning variants.
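To make the synchronous setting concrete, the following is a minimal sketch of vanilla Q-learning driven by a generative model: in every iteration, one independent next-state sample is drawn for each state-action pair, so $T$ iterations consume $|\mathcal{S}||\mathcal{A}|\,T$ samples in total. The function names, the array layout of `P` and `R`, and the rescaled linear step size $\eta_t = 1/(1 + (1-\gamma)t)$ are assumptions made for illustration; they are not taken from the paper, whose analysis may use a different schedule.

```python
import numpy as np


def synchronous_q_learning(P, R, gamma, num_iters, rng):
    """Vanilla synchronous Q-learning (illustrative sketch).

    P: array of shape (S, A, S), P[s, a, s'] = transition probability.
    R: array of shape (S, A), deterministic rewards in [0, 1].
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    cdf = P.cumsum(axis=2)  # per-(s, a) cumulative distribution over s'
    for t in range(1, num_iters + 1):
        # Assumed step-size schedule (not the paper's tuned choice).
        eta = 1.0 / (1.0 + (1.0 - gamma) * t)
        # Generative model: one independent next state for every (s, a),
        # drawn by inverse-CDF sampling.
        u = rng.random((S, A, 1))
        next_state = np.minimum((u > cdf).sum(axis=2), S - 1)
        # Empirical Bellman target built from the sampled next states.
        target = R + gamma * Q[next_state].max(axis=2)
        Q = (1.0 - eta) * Q + eta * target
    return Q


def value_iteration(P, R, gamma, tol=1e-10):
    """Exact optimal Q-function, used only as a reference for the error."""
    Q = np.zeros_like(R, dtype=float)
    while True:
        Q_new = R + gamma * (P @ Q.max(axis=1))
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new


def random_mdp(S, A, rng):
    """Hypothetical toy MDP with random transitions and rewards."""
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)
    R = rng.random((S, A))
    return P, R


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, R = random_mdp(S=5, A=3, rng=rng)
    gamma, T = 0.5, 5000
    Q_hat = synchronous_q_learning(P, R, gamma, T, rng)
    Q_star = value_iteration(P, R, gamma)
    print("total samples:", R.size * T)
    print("entrywise error:", np.abs(Q_hat - Q_star).max())
```

The quantity printed as the entrywise error is the $\ell_\infty$ distance that the abstract's $\varepsilon$ refers to. Note that the two bounds quoted above differ by exactly one factor of the effective horizon $\frac{1}{1-\gamma}$, so for $\gamma = 0.99$ the sharpened bound is smaller by a factor of about 100, up to logarithmic terms.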

Categories: stat.ML, cs.IT, cs.LG, math.IT, math.OC, math.ST, stat.TH

Keywords: sample complexity, q-learning, optimal q-function, dependence, analyze finite-sample performance

Related articles:

arXiv:2106.07148 [stat.ML] (Published 2021-06-14)

On the Sample Complexity of Learning with Geometric Stability

Alberto Bietti, Luca Venturi, Joan Bruna

arXiv:2106.07898 [stat.ML] (Published 2021-06-15)

Divergence Frontiers for Generative Models: Sample Complexity, Quantization Level, and Frontier Integral

Lang Liu, Krishna Pillutla, Sean Welleck, Sewoong Oh, Yejin Choi, Zaid Harchaoui

arXiv:1011.5395 [stat.ML] (Published 2010-11-24)

The Sample Complexity of Dictionary Learning

Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein