{ "id": "1206.6461", "version": "v1", "published": "2012-06-27T19:59:59.000Z", "updated": "2012-06-27T19:59:59.000Z", "title": "On the Sample Complexity of Reinforcement Learning with a Generative Model", "authors": [ "Mohammad Gheshlaghi Azar", "Remi Munos", "Bert Kappen" ], "comment": "Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)", "categories": [ "cs.LG", "stat.ML" ], "abstract": "We consider the problem of learning the optimal action-value function in the discounted-reward Markov decision processes (MDPs). We prove a new PAC bound on the sample-complexity of model-based value iteration algorithm in the presence of the generative model, which indicates that for an MDP with N state-action pairs and the discount factor \\gamma\\in[0,1) only O(N\\log(N/\\delta)/((1-\\gamma)^3\\epsilon^2)) samples are required to find an \\epsilon-optimal estimation of the action-value function with the probability 1-\\delta. We also prove a matching lower bound of \\Theta (N\\log(N/\\delta)/((1-\\gamma)^3\\epsilon^2)) on the sample complexity of estimating the optimal action-value function by every RL algorithm. To the best of our knowledge, this is the first matching result on the sample complexity of estimating the optimal (action-) value function in which the upper bound matches the lower bound of RL in terms of N, \\epsilon, \\delta and 1/(1-\\gamma). Also, both our lower bound and our upper bound significantly improve on the state-of-the-art in terms of 1/(1-\\gamma).", "revisions": [ { "version": "v1", "updated": "2012-06-27T19:59:59.000Z" } ], "analyses": { "keywords": [ "sample complexity", "generative model", "reinforcement learning", "lower bound", "optimal action-value function" ], "tags": [ "conference paper" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable", "adsabs": "2012arXiv1206.6461G" } } }