arXiv:2112.01506 [cs.LG]

Sample Complexity of Robust Reinforcement Learning with a Generative Model

Kishan Panaganti, Dileep Kalathil

Published 2021-12-02, updated 2022-05-14 (version 3)

The Robust Markov Decision Process (RMDP) framework focuses on designing control policies that are robust against parameter uncertainties arising from mismatches between the simulator model and the real-world setting. An RMDP problem is typically formulated as a max-min problem, where the objective is to find the policy that maximizes the value function for the worst possible model that lies in an uncertainty set around a nominal model. The standard robust dynamic programming approach requires knowledge of the nominal model for computing the optimal robust policy. In this work, we propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy when the nominal model is unknown. We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence. For each of these uncertainty sets, we give a precise characterization of the sample complexity of our proposed algorithm. In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies. Finally, we demonstrate the performance of our algorithm on two benchmark problems.
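For reference, the max-min formulation described in the abstract can be written out as follows; the notation (nominal kernel $P^{o}$, radius $\rho$, value function $V^{\pi}_{P}$) is illustrative shorthand rather than the paper's own, and the total variation ball shown is only one of the three uncertainty sets considered.

$$\pi^{*} \in \arg\max_{\pi} \; \min_{P \in \mathcal{P}} V^{\pi}_{P}, \qquad \mathcal{P} = \bigl\{ P : \tfrac{1}{2}\,\lVert P(\cdot \mid s,a) - P^{o}(\cdot \mid s,a) \rVert_{1} \le \rho \;\; \forall (s,a) \bigr\},$$

where $V^{\pi}_{P}$ denotes the discounted value of policy $\pi$ under transition model $P$, and $P^{o}$ is the nominal model around which the uncertainty set $\mathcal{P}$ is centered.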

Comments: Published in the International Conference on Artificial Intelligence and Statistics (AISTATS) 2022
Categories: cs.LG, stat.ML
Related articles:
arXiv:1206.6461 [cs.LG] (Published 2012-06-27)
On the Sample Complexity of Reinforcement Learning with a Generative Model
arXiv:2007.00722 [cs.LG] (Published 2020-07-01)
Sequential Transfer in Reinforcement Learning with a Generative Model
arXiv:2003.11399 [cs.LG] (Published 2020-03-25)
Discriminative Viewer Identification using Generative Models of Eye Gaze