arXiv Analytics

arXiv:1206.6461 [cs.LG]

On the Sample Complexity of Reinforcement Learning with a Generative Model

Mohammad Gheshlaghi Azar, Remi Munos, Bert Kappen

Published 2012-06-27 (Version 1)

We consider the problem of learning the optimal action-value function in discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample complexity of a model-based value iteration algorithm with access to a generative model: for an MDP with N state-action pairs and discount factor \gamma\in[0,1), only O(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) samples are required to find an \epsilon-optimal estimate of the action-value function with probability 1-\delta. We also prove a matching lower bound of \Theta(N\log(N/\delta)/((1-\gamma)^3\epsilon^2)) on the sample complexity of estimating the optimal action-value function by any RL algorithm. To the best of our knowledge, this is the first result on the sample complexity of estimating the optimal (action-) value function in which the upper bound matches the lower bound in terms of N, \epsilon, \delta and 1/(1-\gamma). Moreover, both our lower and upper bounds significantly improve on the state of the art in their dependence on 1/(1-\gamma).
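The model-based approach analyzed in the abstract can be sketched in a few lines: call the generative model m times per state-action pair to build an empirical transition model, then run value iteration on that empirical MDP. This is a minimal illustration, not the paper's exact algorithm or constants; the function names, the small deterministic test MDP, and the choice of m are assumptions for the sake of the example.

```python
import numpy as np

def estimate_q_with_generative_model(sample_next_state, reward, S, A,
                                     gamma, m, iters=500, seed=0):
    """Model-based Q estimation sketch: draw m next-state samples per (s, a)
    from the generative model, form the empirical transition kernel, then
    run value iteration on the empirical MDP. (Illustrative only; the
    paper's analysis concerns the sample size m needed per pair.)"""
    rng = np.random.default_rng(seed)
    # Empirical transition model P_hat[s, a, s'] from m generative-model calls.
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            for _ in range(m):
                P_hat[s, a, sample_next_state(s, a, rng)] += 1.0 / m
    # Value iteration on the empirical MDP: Q <- r + gamma * P_hat V.
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy value per state
        Q = reward + gamma * (P_hat @ V)
    return Q

# Tiny sanity check on a 2-state, 1-action MDP where each state loops to
# itself (deterministic, so the empirical model is exact): the fixed point
# is Q[s, 0] = r[s] / (1 - gamma).
r = np.array([[1.0], [0.0]])                      # reward for (s, a)
Q = estimate_q_with_generative_model(
    lambda s, a, rng: s,                          # self-loop dynamics
    r, S=2, A=1, gamma=0.9, m=10)
print(Q[0, 0])                                    # ≈ 1 / (1 - 0.9) = 10
```

The total number of generative-model calls here is N*m with N = S*A state-action pairs; the paper's result says m of order \log(N/\delta)/((1-\gamma)^3\epsilon^2) suffices for an \epsilon-accurate Q, and that this rate is unimprovable.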

Comments: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
Categories: cs.LG, stat.ML
Related articles:
arXiv:2204.08573 [cs.LG] (Published 2022-04-18)
Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models
arXiv:2112.01506 [cs.LG] (Published 2021-12-02, updated 2022-05-14)
Sample Complexity of Robust Reinforcement Learning with a Generative Model
arXiv:2007.00722 [cs.LG] (Published 2020-07-01)
Sequential Transfer in Reinforcement Learning with a Generative Model