arXiv Analytics

arXiv:1806.02501 [cs.RO]

Simplifying Reward Design through Divide-and-Conquer

Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan

Published 2018-06-07, Version 1

Designing a good reward function is essential to robot planning and reinforcement learning, but it can also be challenging and frustrating. The reward needs to work across multiple different environments, and that often requires many iterations of tuning. We introduce a novel divide-and-conquer approach that enables the designer to specify a reward separately for each environment. By treating these separate reward functions as observations about the underlying true reward, we derive an approach to infer a common reward across all environments. We conduct user studies in an abstract grid world domain and in a motion planning domain for a 7-DOF manipulator, measuring user effort and solution quality. We show that our method is faster, easier to use, and produces a higher-quality solution than the typical method of designing a reward jointly across all environments. We additionally conduct a series of experiments that measure the sensitivity of these results to different properties of the reward design task, such as the number of environments, the number of feasible solutions per environment, and the fraction of the total features that vary within each environment. We find that independent reward design outperforms the standard joint reward design process, and that it works best when the design problem can be divided into simpler subproblems.
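To make the inference step concrete: assuming a linear reward r(s) = w · φ(s), each per-environment designed reward can be treated as a noisy observation of the true weight vector. The sketch below uses a simplified Gaussian observation model (w_i ~ N(w_true, σ²I)), under which the maximum-likelihood combined estimate is just the mean of the per-environment weights; the paper itself uses a richer, trajectory-based observation model, so the function name and model here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def infer_common_reward(per_env_weights):
    """Combine per-environment designed reward weights into one estimate
    of the underlying true reward.

    Illustrative sketch only: assumes a linear reward r(s) = w . phi(s)
    and models each designed weight vector as w_i ~ N(w_true, sigma^2 I).
    Under this Gaussian model, the MLE of w_true is the observation mean.
    """
    W = np.asarray(per_env_weights, dtype=float)  # (n_envs, n_features)
    return W.mean(axis=0)

# Example: three environments, two reward features.
designed = [
    [1.0, 0.0],   # weights tuned for environment 1
    [0.8, 0.2],   # weights tuned for environment 2
    [1.2, -0.2],  # weights tuned for environment 3
]
w_hat = infer_common_reward(designed)  # -> array([1., 0.])
```

The appeal of the divide-and-conquer framing is visible even in this toy model: each designer-facing subproblem involves only one environment's features, while the cross-environment combination is handled automatically by the inference step.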

Comments: Robotics: Science and Systems (RSS) 2018
Categories: cs.RO, cs.AI, cs.LG