Autonomous Agents and Multi-Agent Systems

Coordinated learning in multiagent MDPs with infinite state-space

Abstract

In this paper we address the problem of simultaneous learning and coordination in multiagent Markov decision problems (MMDPs) with infinite state-spaces. We separate this problem into two distinct subproblems: learning and coordination. To tackle the problem of learning, we resort to Q-learning with soft-state aggregation (Q-SSA), a well-known method from the reinforcement learning literature (Singh et al. in Advances in neural information processing systems. MIT Press, Cambridge, vol 7, pp 361-368, 1994). Q-SSA allows the agents in the game to approximate the optimal Q-function, from which the optimal policies can be computed. We establish the convergence of Q-SSA and introduce a new result describing the rate of convergence of this method. In tackling the problem of coordination, we start by pointing out that knowledge of the optimal Q-function is not enough to ensure that all agents adopt a jointly optimal policy. We propose a novel coordination mechanism that, given knowledge of the optimal Q-function for an MMDP, ensures that all agents converge to a jointly optimal policy in every relevant state of the game. This coordination mechanism, approximate biased adaptive play (ABAP), extends biased adaptive play (Wang and Sandholm in Advances in neural information processing systems. MIT Press, Cambridge, vol 15, pp 1571-1578, 2003) to MMDPs with infinite state-spaces. Finally, we combine Q-SSA with ABAP, leading to a novel algorithm in which learning of the game and coordination take place simultaneously. We discuss several important properties of this new algorithm and establish its convergence with probability 1. We also provide simple illustrative examples of application.
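For concreteness, the learning step described in the abstract can be pictured as follows. This is a minimal single-agent sketch of Q-learning with soft-state aggregation in the spirit of Singh et al. (1994), not the paper's exact algorithm: the environment interface (`env.reset`, `env.step`), the aggregation map `membership`, and all hyperparameters are illustrative assumptions. Each state is softly assigned to a small set of clusters, the Q-function is stored per (cluster, action) pair, and the standard Q-learning update is distributed over clusters in proportion to the membership probabilities.

```python
import numpy as np

def q_ssa(env, membership, n_clusters, n_actions,
          episodes=500, gamma=0.95, alpha=0.1, epsilon=0.1, seed=0):
    """Illustrative sketch of Q-learning with soft-state aggregation.

    membership(s) -> length-n_clusters probability vector P(x | s),
    the soft assignment of state s to the aggregate clusters.
    The Q-value of a concrete state is the membership-weighted
    average of the per-cluster values: Q_hat(s, a) = sum_x P(x|s) q(x, a).
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((n_clusters, n_actions))      # q[x, a]: per-cluster values

    def q_hat(s):
        # aggregated Q-estimate for state s, one value per action
        return membership(s) @ q

    for _ in range(episodes):
        s, done = env.reset(), False            # assumed environment API
        while not done:
            # epsilon-greedy action w.r.t. the aggregated Q-estimate
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(q_hat(s)))
            s_next, r, done = env.step(a)       # assumed environment API
            # TD target uses the aggregated estimate at the next state
            target = r + (0.0 if done else gamma * np.max(q_hat(s_next)))
            # distribute the update over clusters, weighted by P(x | s)
            p = membership(s)
            q[:, a] += alpha * p * (target - q_hat(s)[a])
            s = s_next
    return q
```

In the multiagent setting of the paper, each agent would maintain such an estimate over joint actions; as the abstract stresses, knowledge of the resulting optimal Q-function is not by itself sufficient for coordination, which is why the ABAP mechanism is then needed to select among the jointly optimal actions.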
