Autonomous Agents and Multi-Agent Systems

Coordinated learning in multiagent MDPs with infinite state-space

Abstract

In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this problem into two distinct subproblems: learning and coordination. To tackle the problem of learning, we resort to Q-learning with soft-state aggregation (Q-SSA), a well-known method from the reinforcement learning literature (Singh et al. in Advances in neural information processing systems. MIT Press, Cambridge, vol 7, pp 361-368, 1994). Q-SSA allows the agents in the game to approximate the optimal Q-function, from which the optimal policies can be computed. We establish the convergence of Q-SSA and introduce a new result describing the rate of convergence of this method. In tackling the problem of coordination, we start by pointing out that knowledge of the optimal Q-function is not enough to ensure that all agents adopt a jointly optimal policy. We propose a novel coordination mechanism that, given knowledge of the optimal Q-function for an MMDP, ensures that all agents converge to a jointly optimal policy in every relevant state of the game. This coordination mechanism, approximate biased adaptive play (ABAP), extends biased adaptive play (Wang and Sandholm in Advances in neural information processing systems. MIT Press, Cambridge, vol 15, pp 1571-1578, 2003) to MMDPs with infinite state-spaces. Finally, we combine Q-SSA with ABAP, leading to a novel algorithm in which learning of the game and coordination take place simultaneously. We discuss several important properties of this new algorithm and establish its convergence with probability 1. We also provide simple illustrative examples of application.
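For concreteness, the learning step described in the abstract can be pictured as follows. This is a minimal single-agent sketch of Q-learning with soft-state aggregation in the spirit of Singh et al. (1994), not the paper's exact algorithm: the environment interface (`env.reset`, `env.step`), the aggregation map `membership`, and all hyperparameters are illustrative assumptions. Each state is softly assigned to a small set of clusters, the Q-function is stored per (cluster, action) pair, and the standard Q-learning update is distributed over clusters in proportion to the membership probabilities.

```python
import numpy as np

def q_ssa(env, membership, n_clusters, n_actions,
          episodes=500, gamma=0.95, alpha=0.1, epsilon=0.1, seed=0):
    """Illustrative sketch of Q-learning with soft-state aggregation.

    membership(s) -> length-n_clusters probability vector P(x | s),
    the soft assignment of state s to the aggregate clusters.
    The Q-value of a concrete state is the membership-weighted
    average of the per-cluster values: Q_hat(s, a) = sum_x P(x|s) q(x, a).
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_clusters, n_actions))      # q[x, a]: per-cluster values

    def q_hat(s):
        # aggregated Q-estimate for state s, one value per action
        return membership(s) @ q

    for _ in range(episodes):
        s, done = env.reset(), False            # assumed environment API
        while not done:
            # epsilon-greedy action w.r.t. the aggregated Q-estimate
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(q_hat(s)))
            s_next, r, done = env.step(a)       # assumed environment API
            # TD target uses the aggregated estimate at the next state
            target = r + (0.0 if done else gamma * np.max(q_hat(s_next)))
            # distribute the update over clusters, weighted by P(x | s)
            p = membership(s)
            q[:, a] += alpha * p * (target - q_hat(s)[a])
            s = s_next
    return q
```

In the multiagent setting of the paper, each agent would maintain such an estimate over joint actions; as the abstract stresses, knowledge of the resulting optimal Q-function is not by itself sufficient for coordination, which is why the ABAP mechanism is then needed to select among the jointly optimal actions.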
