IEEE Transactions on Intelligent Transportation Systems

Deep Reinforcement Learning for Event-Driven Multi-Agent Decision Processes



Abstract

The incorporation of macro-actions (temporally extended actions) into multi-agent decision problems has the potential to address the curse of dimensionality associated with such decision problems. Since macro-actions last for stochastic durations, multiple agents executing decentralized policies in cooperative environments must act asynchronously. We present an algorithm that modifies generalized advantage estimation for temporally extended actions, allowing a state-of-the-art policy optimization algorithm to optimize policies in Dec-POMDPs in which agents act asynchronously. We show that our algorithm is capable of learning optimal policies in two cooperative domains, one involving real-time bus holding control and one involving wildfire fighting with unmanned aircraft. Our algorithm works by framing problems as "event-driven decision processes," which are scenarios in which the sequence and timing of actions and events are random and governed by an underlying stochastic process. In addition to optimizing policies with continuous state and action spaces, our algorithm also facilitates the use of event-driven simulators, which do not require time to be discretized into time-steps. We demonstrate the benefit of using event-driven simulation in the context of multiple agents taking asynchronous actions. We show that fixed time-step simulation risks obfuscating the sequence in which closely separated events occur, adversely affecting the policies learned. In addition, we show that arbitrarily shrinking the time-step scales poorly with the number of agents.
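The modification to generalized advantage estimation for temporally extended actions can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes each decision step k discounts by gamma**tau_k, where tau_k is the macro-action's stochastic duration, and that rewards have already been accumulated over each macro-action. The function name smdp_gae and the choice to keep lambda per decision step (rather than raising it to the duration as well) are assumptions.

```python
import numpy as np

def smdp_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """Advantage estimation with duration-dependent discounting (sketch).

    rewards:   rewards[k] accumulated over macro-action k        (length K)
    values:    critic estimates V(s_0), ..., V(s_K)              (length K + 1)
    durations: stochastic duration tau_k of each macro-action    (length K)
    """
    K = len(rewards)
    advantages = np.zeros(K)
    gae = 0.0
    for k in reversed(range(K)):
        # Fixed per-step discount gamma is replaced by gamma ** tau_k,
        # so longer macro-actions discount the bootstrap value more.
        step_discount = gamma ** durations[k]
        delta = rewards[k] + step_discount * values[k + 1] - values[k]
        gae = delta + step_discount * lam * gae
        advantages[k] = gae
    return advantages
```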
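The event-driven simulation the abstract contrasts with fixed time-steps can likewise be sketched with a priority queue keyed on exact event timestamps, so closely separated events are never collapsed into one step. All names here (run_event_driven, bus_arrival) are hypothetical; the paper's simulators for the bus-holding and wildfire domains are domain-specific.

```python
import heapq

def run_event_driven(initial_events, horizon):
    """Minimal event-driven simulation loop (illustrative only).

    Each event is a (timestamp, handler) pair; a handler may schedule
    follow-up events, e.g. a macro-action's stochastic completion.
    Events are processed in exact timestamp order, with no fixed
    time-step to discretize (and possibly reorder) them.
    """
    queue, seq = [], 0  # seq breaks timestamp ties so handlers never compare
    for t, handler in initial_events:
        heapq.heappush(queue, (t, seq, handler))
        seq += 1
    while queue:
        now, _, handler = heapq.heappop(queue)
        if now >= horizon:
            break
        for t_next, h_next in handler(now):
            heapq.heappush(queue, (t_next, seq, h_next))
            seq += 1

# Example: a bus arrival at t=1.0 that schedules its own departure.
def bus_arrival(now):
    print(f"arrival at t={now:.2f}")
    return [(now + 0.5, lambda t: print(f"departure at t={t:.2f}") or [])]

run_event_driven([(1.0, bus_arrival)], horizon=10.0)
```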


