
arXiv:2106.03498 [cs.LG]

Identifiability in inverse reinforcement learning

Haoyang Cao, Samuel N. Cohen, Lukasz Szpruch

Published 2021-06-07, Version 1

Inverse reinforcement learning attempts to reconstruct the reward function in a Markov decision problem, using observations of agent actions. As already observed by Russell, the problem is ill-posed, and the reward function is not identifiable, even given perfect information about optimal behavior. We provide a resolution to this non-identifiability for problems with entropy regularization. For a given environment, we fully characterize the reward functions leading to a given policy and demonstrate that, given demonstrations of actions for the same reward under two distinct discount factors, or under sufficiently different environments, the unobserved reward can be recovered up to a constant. A simple numerical experiment demonstrates accurate reconstruction of the reward function using the proposed resolution.
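The non-identifiability described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code or experiment. It assumes the standard entropy-regularized (soft) Bellman equations on a finite MDP, with temperature `lam` and discount `gamma`, and the exact formulation in the paper may differ. Under that assumption, for any function `v` on states, the reward `lam * log(pi) + v(s) - gamma * E[v(s')]` produces the same optimal policy `pi`. All names here (`soft_value_iteration`, `f_alt`, the random MDP) are hypothetical and chosen for illustration.

```python
import numpy as np

def soft_value_iteration(f, T, gamma, lam, iters=2000):
    """Soft (entropy-regularized) value iteration on a finite MDP.

    f: (S, A) reward table; T: (S, A, S) transition probabilities.
    Returns the soft value function V and the optimal stochastic policy pi.
    """
    V = np.zeros(f.shape[0])
    for _ in range(iters):
        Q = f + gamma * (T @ V)                     # shape (S, A)
        m = Q.max(axis=1)                           # stabilize log-sum-exp
        V = m + lam * np.log(np.exp((Q - m[:, None]) / lam).sum(axis=1))
    Q = f + gamma * (T @ V)
    pi = np.exp((Q - V[:, None]) / lam)
    return V, pi / pi.sum(axis=1, keepdims=True)

# Hypothetical random MDP: 5 states, 3 actions.
rng = np.random.default_rng(0)
S, A, gamma, lam = 5, 3, 0.9, 1.0
T = rng.random((S, A, S))
T /= T.sum(axis=2, keepdims=True)                   # rows sum to 1
f = rng.normal(size=(S, A))                         # "true" reward
_, pi = soft_value_iteration(f, T, gamma, lam)

# Pick an arbitrary v and build a different reward with the same policy.
v = rng.normal(size=S)
f_alt = lam * np.log(pi) + v[:, None] - gamma * (T @ v)
V_alt, pi_alt = soft_value_iteration(f_alt, T, gamma, lam)

same_policy = np.allclose(pi, pi_alt, atol=1e-8)    # policies coincide
value_is_v = np.allclose(V_alt, v, atol=1e-8)       # chosen v is the value
reward_gap = np.ptp(f_alt - f)                      # > 0: not a constant shift
print(same_policy, value_is_v, reward_gap)
```

In this sketch the two rewards differ by more than an additive constant yet induce the same policy, so observing the policy in a single environment with a single discount factor cannot pin down the reward. This is the gap that the abstract's second discount factor or second environment is meant to close.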

Categories: cs.LG, math.OC

Subjects: 49N45, 93B30, 93E12, 93B15, 49N10, 90C40, 60J10, 62M05

Keywords: reward function, identifiability, inverse reinforcement learning, distinct discount factors, Markov decision problem

Related articles:

arXiv:2002.02794 [cs.LG] (Published 2020-02-07)

Reward-Free Exploration for Reinforcement Learning

Chi Jin, Akshay Krishnamurthy, Max Simchowitz, Tiancheng Yu

arXiv:2107.09370 [cs.LG] (Published 2021-07-20)

An Embedding of ReLU Networks and an Analysis of their Identifiability

Pierre Stock, Rémi Gribonval

arXiv:2206.00238 [cs.LG] (Published 2022-06-01)

Transferable Reward Learning by Dynamics-Agnostic Discriminator Ensemble