International Conference on Machine Learning

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning



Abstract

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
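
To make the two proposed ideas concrete, the sketch below shows, under stated assumptions, how a fingerprint of the data's age could be fed to each agent's Q-network and how a multi-agent importance weight could decay obsolete replay data. This is an illustrative PyTorch sketch, not the paper's implementation; the names QNet, Transition, action_prob, and the fingerprint layout (normalised training iteration plus exploration rate epsilon) are hypothetical.

    import torch
    import torch.nn as nn
    from collections import namedtuple

    # Each replayed transition records, besides the usual fields, (a) a
    # fingerprint of the data's age (training iteration and exploration rate
    # epsilon) and (b) the probabilities the other agents assigned to their
    # recorded actions at collection time.
    Transition = namedtuple(
        "Transition",
        ["obs", "action", "reward", "next_obs", "fingerprint", "others"],
    )


    class QNet(nn.Module):
        """Per-agent Q-network conditioned on the fingerprint, so the value
        function can disambiguate how old a sampled experience is."""

        def __init__(self, obs_dim, fingerprint_dim, n_actions, hidden=64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + fingerprint_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, obs, fingerprint):
            return self.net(torch.cat([obs, fingerprint], dim=-1))


    def importance_weight(t, other_agents, truncate=10.0):
        """Multi-agent importance sampling: re-weight an old transition by the
        ratio of the other agents' current probability of their recorded
        actions to the probability at collection time, so obsolete data decays
        naturally. Truncating the ratio to bound variance is an assumption."""
        ratio = 1.0
        for agent, (obs, act, old_p) in zip(other_agents, t.others):
            new_p = agent.action_prob(obs, act)  # hypothetical per-agent helper
            ratio *= new_p / max(old_p, 1e-8)
        return min(ratio, truncate)


    # Usage sketch: build the same fingerprint when acting and when replaying,
    # and scale each sampled transition's TD loss by its importance weight, e.g.
    #   fingerprint = torch.tensor([iteration / max_iterations, epsilon])
    #   q_values = qnet(obs, fingerprint)
    #   loss = importance_weight(t, other_agents) * td_error(t) ** 2

The design point carried by both pieces is the same: the replay memory holds data generated under other agents' old policies, so the learner must either down-weight that data (the importance ratio) or be told how old it is (the fingerprint input).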
