
A Multiagent Reinforcement Learning Algorithm using Extended Optimal Response

Abstract

Stochastic games provide a theoretical framework for multiagent reinforcement learning. Based on this framework, Littman proposed a multiagent reinforcement learning algorithm for zero-sum stochastic games, and Hu and Wellman extended it to general-sum games. Given a stochastic game, if all agents learn with their algorithm, the agents' policies can be expected to converge to a Nash equilibrium. However, agents running their algorithm always try to converge to a Nash equilibrium, regardless of the policies actually used by the other agents. In addition, when there are multiple Nash equilibria, the agents must agree in advance on which equilibrium to reach. Their algorithm therefore lacks adaptability in this sense. In this paper, we propose a multiagent reinforcement learning algorithm based on the extended optimal response, which we introduce in this paper. The algorithm converges to a Nash equilibrium when the other agents are adaptable, and otherwise plays an optimal response to their policies. We also provide empirical results in three simple stochastic games, which show that the algorithm behaves as intended.
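The abstract does not spell out the extended-optimal-response update itself, so the sketch below only illustrates the behaviour it describes: an agent that responds optimally to the policy its opponent is actually playing, and thereby settles into one of several Nash equilibria when the opponent is adaptable. The coordination game, the fictitious-play-style opponent model, and all names in the code are illustrative assumptions, not the authors' algorithm.

import numpy as np

# Illustrative sketch only (not the paper's algorithm, whose update rule is
# not given in the abstract): a best-response learner in a repeated 2x2
# coordination game with two pure-strategy Nash equilibria. It shows the
# behaviour the abstract describes -- respond optimally to the policy the
# other agent actually uses, and coordinate on an equilibrium when that
# agent is adaptable -- rather than insisting on one fixed equilibrium.

PAYOFF = np.array([[1.0, 0.0],     # row player's payoffs; (0,0) and (1,1)
                   [0.0, 1.0]])    # are both Nash equilibria

rng = np.random.default_rng(0)

def best_response(opponent_counts):
    # Best response to the opponent's empirical action frequencies.
    freq = opponent_counts / opponent_counts.sum()
    expected = PAYOFF @ freq                  # expected payoff per action
    return int(np.argmax(expected))

# Hypothetical "adaptable" opponent: it usually copies whichever action it
# has observed us play most often, with a little exploration noise.
my_counts = np.ones(2)                        # Laplace-smoothed counts
opp_counts = np.ones(2)
for t in range(200):
    my_action = best_response(opp_counts)
    opp_action = int(np.argmax(my_counts)) if rng.random() < 0.9 else int(rng.integers(2))
    my_counts[my_action] += 1
    opp_counts[opp_action] += 1

print("my action frequencies:      ", my_counts / my_counts.sum())
print("opponent action frequencies:", opp_counts / opp_counts.sum())
# Both frequency vectors concentrate on the same action, i.e. play settles
# into one of the two coordination equilibria without a prior agreement.

Against an opponent that instead plays a fixed mixed strategy, the same best_response call simply exploits that strategy, which corresponds to the fallback behaviour the abstract attributes to the proposed algorithm.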
