Journal: IEEE Transactions on Cybernetics

Multiagent Reinforcement Learning With Unshared Value Functions

Abstract

One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning and game theory. Most existing algorithms involve the computationally expensive calculation of mixed strategy equilibria and require agents to replicate the other agents' value functions for equilibrium computation in each state. This is unrealistic, since agents may not be willing to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, given that a mixed strategy equilibrium is often computationally expensive. In this paper, three types of pure strategy profiles are utilized as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two solution concepts are strategy profiles from which agents can gain higher payoffs than from one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, we propose a multistep negotiation process for finding pure strategy equilibria, since value functions are not shared among agents. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
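
The abstract does not give the algorithmic details, but the core idea of finding a pure strategy equilibrium without sharing value functions can be illustrated with a minimal sketch. In the sketch below (an illustration under assumptions, not the authors' NegoQ implementation), each of two agents uses only its own Q-table for the current state to compute the joint actions at which its own action is a best response; a joint action flagged by both agents is a pure strategy Nash equilibrium. In NegoQ this agreement would be reached through a multistep negotiation over candidate joint actions rather than by intersecting sets directly; all function names and Q-values here are hypothetical.

```python
import numpy as np

def best_responses_agent1(q1):
    """Joint actions (a1, a2) at which agent 1's action is a best response,
    computed only from agent 1's own Q-values q1[a1, a2] for the current state."""
    br = set()
    for a2 in range(q1.shape[1]):
        col = q1[:, a2]                       # agent 1's values against a fixed a2
        for a1 in np.flatnonzero(np.isclose(col, col.max())):
            br.add((int(a1), a2))
    return br

def best_responses_agent2(q2):
    """The same check from agent 2's side, using only q2[a1, a2]."""
    br = set()
    for a1 in range(q2.shape[0]):
        row = q2[a1, :]                       # agent 2's values against a fixed a1
        for a2 in np.flatnonzero(np.isclose(row, row.max())):
            br.add((a1, int(a2)))
    return br

# Hypothetical 2x2 stage game for one state (the numbers are made up).
q1 = np.array([[4.0, 0.0],
               [3.0, 1.0]])                   # agent 1's Q-values, q1[a1, a2]
q2 = np.array([[4.0, 2.0],
               [2.0, 1.0]])                   # agent 2's Q-values, q2[a1, a2]

# A pure strategy Nash equilibrium is a joint action that both agents flag
# as a best response. In NegoQ this agreement would be reached by exchanging
# negotiation messages about candidate joint actions, not by sharing q1/q2.
pure_nash = best_responses_agent1(q1) & best_responses_agent2(q2)
print(pure_nash)                              # {(0, 0)} for these numbers
```

The equilibrium-dominating and nonstrict equilibrium-dominating strategy profiles mentioned in the abstract would reuse the same per-agent machinery, looking for joint actions that give one or more agents higher payoffs than some pure strategy Nash equilibrium; since the abstract only names these solution concepts, that step is not sketched here.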
