{ "id": "2112.01506", "version": "v3", "published": "2021-12-02T18:55:51.000Z", "updated": "2022-05-14T04:29:06.000Z", "title": "Sample Complexity of Robust Reinforcement Learning with a Generative Model", "authors": [ "Kishan Panaganti", "Dileep Kalathil" ], "comment": "Published in the International Conference on Artificial Intelligence and Statistics (AISTATS) 2022", "categories": [ "cs.LG", "stat.ML" ], "abstract": "The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against the parameter uncertainties due to the mismatches between the simulator model and real-world settings. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires the knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\\epsilon$-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.", "revisions": [ { "version": "v3", "updated": "2022-05-14T04:29:06.000Z" } ], "analyses": { "keywords": [ "sample complexity", "robust reinforcement learning", "generative model", "nominal model", "optimal robust policy" ], "tags": [ "conference paper" ], "note": { "typesetting": "TeX", "pages": 0, "language": "en", "license": "arXiv", "status": "editable" } } }