arXiv Analytics

arXiv:1506.03379 [cs.LG]

The Online Discovery Problem and Its Application to Lifelong Reinforcement Learning

Emma Brunskill, Lihong Li

Published 2015-06-10 (Version 1)

Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning. Despite much encouraging empirical evidence showing the benefits of transfer, there has been very little theoretical analysis. In this paper, we study a class of lifelong reinforcement-learning problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each drawn from a finite set of MDPs that share state/action spaces but differ in their transition and reward functions. Inspired by the need for cross-task exploration in lifelong learning, we formulate a novel online discovery problem and give an optimal learning algorithm to solve it. These results allow us to develop a new lifelong reinforcement-learning algorithm whose overall sample complexity across a sequence of tasks is, with high probability, much smaller than that of single-task learning, even if the sequence of tasks is generated by an adversary. The benefits of the algorithm are demonstrated in a simulated problem.
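To make the problem setting concrete, here is a minimal Python sketch of the lifelong setup described above: a finite pool of tabular MDPs that share state/action spaces but have task-specific transitions and rewards, with an adversarially chosen task sequence. This illustrates only the setting, not the paper's algorithm; the names (TabularMDP, lifelong_sequence), the random model construction, and the placeholder policy are all hypothetical.

```python
import numpy as np

class TabularMDP:
    """One MDP from the finite pool: shared state/action spaces,
    task-specific transition kernel P and reward table R (illustrative)."""

    def __init__(self, n_states, n_actions, rng):
        # P[s, a] is a probability distribution over next states.
        self.P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
        # Deterministic rewards in [0, 1], for simplicity.
        self.R = rng.random((n_states, n_actions))

    def step(self, s, a, rng):
        s_next = rng.choice(len(self.P[s, a]), p=self.P[s, a])
        return s_next, self.R[s, a]


def lifelong_sequence(pool, task_indices):
    """An adversary fixes which pool member the agent faces in each
    task; the agent observes only transitions, never the index."""
    for i in task_indices:
        yield pool[i]


rng = np.random.default_rng(0)
pool = [TabularMDP(n_states=5, n_actions=2, rng=rng) for _ in range(3)]

# Adversarial task sequence over the pool, unknown to the agent.
for mdp in lifelong_sequence(pool, task_indices=[2, 0, 2, 1]):
    s = 0
    for _ in range(10):
        a = int(rng.integers(2))   # placeholder exploration policy
        s, r = mdp.step(s, a, rng)
```

In this framing, cross-task exploration amounts to deciding from observed transitions which pool member generated the current task, which is roughly what the abstract's online discovery problem formalizes.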

Related articles:
arXiv:1502.04469 [cs.LG] (Published 2015-02-16)
Classification and its application to drug-target interaction prediction
arXiv:1109.5078 [cs.LG] (Published 2011-09-23)
Application of distances between terms for flat and hierarchical data
arXiv:1302.6937 [cs.LG] (Published 2013-02-27, updated 2014-06-10)
Online Convex Optimization Against Adversaries with Memory and Application to Statistical Arbitrage