Journal: IEEE Transactions on Cybernetics

Multiagent Reinforcement Learning With Unshared Value Functions

Abstract

One important approach to multiagent reinforcement learning (MARL) is equilibrium-based MARL, which combines reinforcement learning and game theory. Most existing algorithms involve the computationally expensive calculation of mixed strategy equilibria and require agents to replicate the other agents' value functions for equilibrium computation in each state. This is unrealistic, since agents may not be willing to share such information due to privacy or safety concerns. This paper aims to develop novel and efficient MARL algorithms that do not require agents to share value functions. First, we adopt pure strategy equilibrium solution concepts instead of mixed strategy equilibria, given that a mixed strategy equilibrium is often computationally expensive. In this paper, three types of pure strategy profiles are utilized as equilibrium solution concepts: pure strategy Nash equilibrium, equilibrium-dominating strategy profile, and nonstrict equilibrium-dominating strategy profile. The latter two solution concepts are strategy profiles from which agents can gain higher payoffs than from one or more pure strategy Nash equilibria. Theoretical analysis shows that these strategy profiles are symmetric meta equilibria. Second, we propose a multistep negotiation process for finding pure strategy equilibria, since value functions are not shared among agents. Putting these together, we propose a novel MARL algorithm called negotiation-based Q-learning (NegoQ). Experiments are first conducted in grid-world games, which are widely used to evaluate MARL algorithms. In these games, NegoQ learns equilibrium policies and runs significantly faster than existing MARL algorithms (correlated Q-learning and Nash Q-learning). Surprisingly, we find that NegoQ also performs well in team Markov games such as pursuit games, compared with team-task-oriented MARL algorithms (such as friend Q-learning and distributed Q-learning).
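
The abstract does not give the algorithmic details, but the core idea of finding a pure strategy equilibrium without sharing value functions can be illustrated with a minimal sketch. In the sketch below (an illustration under assumptions, not the authors' NegoQ implementation), each of two agents uses only its own Q-table for the current state to compute the joint actions at which its own action is a best response; a joint action flagged by both agents is a pure strategy Nash equilibrium. In NegoQ this agreement would be reached through a multistep negotiation over candidate joint actions rather than by intersecting sets directly; all function names and Q-values here are hypothetical.

```python
import numpy as np

def best_responses_agent1(q1):
    """Joint actions (a1, a2) at which agent 1's action is a best response,
    computed only from agent 1's own Q-values q1[a1, a2] for the current state."""
    br = set()
    for a2 in range(q1.shape[1]):
        col = q1[:, a2]                       # agent 1's values against a fixed a2
        for a1 in np.flatnonzero(np.isclose(col, col.max())):
            br.add((int(a1), a2))
    return br

def best_responses_agent2(q2):
    """The same check from agent 2's side, using only q2[a1, a2]."""
    br = set()
    for a1 in range(q2.shape[0]):
        row = q2[a1, :]                       # agent 2's values against a fixed a1
        for a2 in np.flatnonzero(np.isclose(row, row.max())):
            br.add((a1, int(a2)))
    return br

# Hypothetical 2x2 stage game for one state (the numbers are made up).
q1 = np.array([[4.0, 0.0],
               [3.0, 1.0]])                   # agent 1's Q-values, q1[a1, a2]
q2 = np.array([[4.0, 2.0],
               [2.0, 1.0]])                   # agent 2's Q-values, q2[a1, a2]

# A pure strategy Nash equilibrium is a joint action that both agents flag
# as a best response. In NegoQ this agreement would be reached by exchanging
# negotiation messages about candidate joint actions, not by sharing q1/q2.
pure_nash = best_responses_agent1(q1) & best_responses_agent2(q2)
print(pure_nash)                              # {(0, 0)} for these numbers
```

The equilibrium-dominating and nonstrict equilibrium-dominating strategy profiles mentioned in the abstract would reuse the same per-agent machinery, looking for joint actions that give one or more agents higher payoffs than some pure strategy Nash equilibrium; since the abstract only names these solution concepts, that step is not sketched here.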
