
arXiv:2010.07916 [cs.AI]

Multi-Agent Trust Region Policy Optimization

Published 2020-10-15, Version 1

We extend trust region policy optimization (TRPO) to multi-agent reinforcement learning (MARL) problems. We show that the policy update of TRPO can be transformed into a distributed consensus optimization problem for multi-agent cases. By making a series of approximations to the consensus optimization model, we propose a decentralized MARL algorithm, which we call multi-agent TRPO (MATRPO). This algorithm can optimize distributed policies based on local observations and private rewards. The agents do not need to know observations, rewards, policies or value/action-value functions of other agents. The agents only share a likelihood ratio with their neighbors during the training process. The algorithm is fully decentralized and privacy-preserving. Our experiments on two cooperative games demonstrate its robust performance on complicated MARL tasks.
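The abstract names two ingredients: a per-agent likelihood ratio and an exchange of that ratio between neighbors. The sketch below is only an illustration of those two ideas and not the MATRPO update itself. The softmax policies, the three-agent line graph, the logits, and the plain neighbor-averaging rule are all assumptions made for the example; none of them come from the paper.

```python
import math


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def likelihood_ratio(new_logits, old_logits, action):
    """pi_new(a | o) / pi_old(a | o) for one sampled action."""
    return softmax(new_logits)[action] / softmax(old_logits)[action]


def consensus_step(values, neighbors):
    """One round of neighbor averaging (an assumed, generic consensus rule).

    Each agent replaces its value with the mean of its own value and the
    values received from its neighbors.
    """
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]


# Hypothetical setup: three agents on a line graph 0 - 1 - 2,
# each with a two-action softmax policy.
neighbors = {0: [1], 1: [0, 2], 2: [1]}
old_logits = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
new_logits = [[0.5, 0.0], [0.0, 0.0], [0.0, 0.3]]
actions = [0, 0, 1]  # one sampled action per agent

# Each agent computes its ratio from its own policy only.
local_ratios = [
    likelihood_ratio(n, o, a) for n, o, a in zip(new_logits, old_logits, actions)
]

# Only the scalar ratios are exchanged; observations, rewards, and
# policy parameters stay private to each agent.
shared = list(local_ratios)
for _ in range(50):
    shared = consensus_step(shared, neighbors)
```

In this toy setup the agents end up agreeing on a common value after repeated averaging, even though each one only ever sees its neighbors' ratios. How MATRPO actually uses the shared ratio inside its trust-region update is specified in the paper, not here.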

Categories: cs.AI, cs.MA

Keywords: multi-agent trust region policy optimization, extend trust region policy optimization, consensus optimization model, consensus optimization problem
