JMLR: Workshop and Conference Proceedings

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

Abstract

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent’s value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
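
The abstract only sketches the two fixes, so here is a minimal, hypothetical Python illustration of how they might be wired into a replay buffer. None of the names (FingerprintReplayBuffer, Transition, importance_weight) come from the paper; this is a sketch of the general idea, not the authors' implementation. Each stored transition carries a fingerprint of the data's age (e.g. training iteration and exploration rate at collection time) and the probability the other agents assigned to their chosen actions, from which an importance weight for the TD loss can later be computed.

# Sketch only: illustrative names, not the paper's reference code.
import random
from collections import deque, namedtuple

# Each transition stores (a) a fingerprint of how old the data is and
# (b) the other agents' probability of their chosen actions at collection
# time, needed for the multi-agent importance-sampling correction.
Transition = namedtuple(
    "Transition",
    ["obs", "action", "reward", "next_obs", "fingerprint", "others_pi_old"],
)

class FingerprintReplayBuffer:
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, train_iter, epsilon, others_pi_old):
        # Fingerprint: a low-dimensional summary of the data's age that the
        # value network is conditioned on (here: iteration count and epsilon).
        fingerprint = (train_iter, epsilon)
        self.buffer.append(
            Transition(obs, action, reward, next_obs, fingerprint, others_pi_old)
        )

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

def importance_weight(others_pi_now, others_pi_old):
    # Ratio of the other agents' current policy probability to the probability
    # at collection time; scaling the TD loss by this ratio naturally decays
    # the influence of obsolete experience.
    return others_pi_now / max(others_pi_old, 1e-8)

At training time, the sampled fingerprint would be concatenated to the agent's observation before it is fed to the Q-network, and each sample's TD loss would be multiplied by importance_weight; the paper derives the full correction from the joint policy of all other agents rather than this simplified per-sample ratio.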