arXiv:1301.0601 [cs.LG]

Reinforcement Learning with Partially Known World Dynamics

Christian R. Shelton

Published 2012-12-12 (Version 1)

Reinforcement learning would enjoy better success on real-world problems if domain knowledge could be imparted to the algorithm by the modelers. Most problems have both hidden state and unknown dynamics. Partially observable Markov decision processes (POMDPs) allow for the modeling of both. Unfortunately, they do not provide a natural framework in which to specify knowledge about the domain dynamics. The designer must either admit to knowing nothing about the dynamics or completely specify the dynamics (thereby turning the task into a planning problem). We propose a new framework, called a partially known Markov decision process (PKMDP), which allows the designer to specify known dynamics while still leaving portions of the environment's dynamics unknown. The model represents not only the environment dynamics but also the agent's knowledge of the dynamics. We present a reinforcement learning algorithm for this model based on importance sampling. The algorithm incorporates planning based on the known dynamics and learning about the unknown dynamics. Our results clearly demonstrate the ability to add domain knowledge and the resulting benefits for learning.
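To make the importance-sampling idea concrete, below is a minimal sketch of off-policy return estimation via trajectory importance weights, which is the general technique the abstract names; it is not the paper's actual PKMDP algorithm, and all names here (behavior_probs, target_probs, rewards, gamma) are hypothetical illustrations.

```python
import numpy as np

def importance_sampled_return(behavior_probs, target_probs, rewards, gamma=0.95):
    """Estimate a target policy's expected discounted return from
    trajectories generated by a different (behavior) policy.

    behavior_probs: per-trajectory lists of the probabilities the behavior
        policy assigned to the actions actually taken.
    target_probs: same shape, probabilities under the policy being evaluated.
    rewards: per-trajectory lists of observed rewards.
    gamma: discount factor.
    """
    estimates = []
    for b, t, r in zip(behavior_probs, target_probs, rewards):
        # Trajectory importance weight: product of per-step likelihood
        # ratios between the target and behavior policies.
        weight = np.prod(np.asarray(t, dtype=float) / np.asarray(b, dtype=float))
        # Discounted return of this trajectory.
        ret = sum(gamma**k * rk for k, rk in enumerate(r))
        estimates.append(weight * ret)
    # Average the reweighted returns over all trajectories.
    return float(np.mean(estimates))

# Hypothetical usage: two short trajectories of three steps each.
b = [[0.5, 0.5, 0.5], [0.4, 0.6, 0.5]]
t = [[0.7, 0.6, 0.5], [0.3, 0.8, 0.5]]
r = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
print(importance_sampled_return(b, t, r))
```

Because the known parts of the dynamics cancel in the likelihood ratio, weights of this form let an agent reuse experience gathered under one policy to evaluate another, which is what makes the approach suited to learning only the unknown portions of a model.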

Comments: Appears in Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI 2002)
Categories: cs.LG, stat.ML
Related articles:
arXiv:1609.07672 [cs.LG] (Published 2016-09-24)
Information-Theoretic Methods for Planning and Learning in Partially Observable Markov Decision Processes
arXiv:1706.04711 [cs.LG] (Published 2017-06-15)
Reinforcement Learning under Model Mismatch
arXiv:2111.06784 [cs.LG] (Published 2021-11-12, updated 2022-03-02)
A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes