arXiv:2211.16715 Abstract | arXiv Analytics

arXiv:2211.16715 [cs.LG]

Policy Optimization over General State and Action Spaces

Published 2022-11-30, Version 1

Reinforcement learning (RL) problems over general state and action spaces are notoriously challenging. In contrast to the tabular setting, one cannot enumerate all the states and then iteratively update the policies for each state. This prevents the application of many well-studied RL methods, especially those with provable convergence guarantees. In this paper, we first present a substantial generalization of the recently developed policy mirror descent method to deal with general state and action spaces. We introduce new approaches to incorporate function approximation into this method, so that we do not need to use explicit policy parameterization at all. Moreover, we present a novel policy dual averaging method for which possibly simpler function approximation techniques can be applied. We establish a linear convergence rate to global optimality or sublinear convergence to stationarity for these methods applied to solve different classes of RL problems under exact policy evaluation. We then define proper notions of the approximation errors for policy evaluation and investigate their impact on the convergence of these methods applied to general-state RL problems with either finite-action or continuous-action spaces. To the best of our knowledge, the development of these algorithmic frameworks, as well as their convergence analysis, appears to be new in the literature.

Categories: cs.LG, cs.AI, math.OC

Keywords: general state, action spaces, simpler function approximation techniques, policy mirror descent method, policy optimization

Related articles:

arXiv:2212.07946 [cs.LG] (Published 2022-12-15)

Combining information-seeking exploration and reward maximization: Unified inference on continuous state and action spaces under partial observability

Parvin Malekzadeh, Konstantinos N. Plataniotis

arXiv:1905.00741 [cs.LG] (Published 2019-05-02)

From Video Game to Real Robot: The Transfer between Action Spaces

Janne Karttunen, Anssi Kanervisto, Ville Hautamäki, Ville Kyrki

arXiv:2312.10584 [cs.LG] (Published 2023-12-17)

Policy Optimization in RLHF: The Impact of Out-of-preference Data

Ziniu Li, Tian Xu, Yang Yu