JMLR: Workshop and Conference Proceedings

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Abstract

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent’s value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
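
The abstract only sketches the two fixes, so here is a minimal, hypothetical Python illustration of how they might be wired into a replay buffer. None of the names (FingerprintReplayBuffer, Transition, importance_weight) come from the paper; this is a sketch of the general idea, not the authors' implementation. Each stored transition carries a fingerprint of the data's age (e.g. training iteration and exploration rate at collection time) and the probability the other agents assigned to their chosen actions, from which an importance weight for the TD loss can later be computed.

# Sketch only: illustrative names, not the paper's reference code.
import random
from collections import deque, namedtuple

# Each transition stores (a) a fingerprint of how old the data is and
# (b) the other agents' probability of their chosen actions at collection
# time, needed for the multi-agent importance-sampling correction.
Transition = namedtuple(
    "Transition",
    ["obs", "action", "reward", "next_obs", "fingerprint", "others_pi_old"],
)

class FingerprintReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, train_iter, epsilon, others_pi_old):
        # Fingerprint: a low-dimensional summary of the data's age that the
        # value network is conditioned on (here: iteration count and epsilon).
        fingerprint = (train_iter, epsilon)
        self.buffer.append(
            Transition(obs, action, reward, next_obs, fingerprint, others_pi_old)
        )

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

def importance_weight(others_pi_now, others_pi_old):
    # Ratio of the other agents' current policy probability to the probability
    # at collection time; scaling the TD loss by this ratio naturally decays
    # the influence of obsolete experience.
    return others_pi_now / max(others_pi_old, 1e-8)

At training time, the sampled fingerprint would be concatenated to the agent's observation before it is fed to the Q-network, and each sample's TD loss would be multiplied by importance_weight; the paper derives the full correction from the joint policy of all other agents rather than this simplified per-sample ratio.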