Journal of Machine Learning Research

Nash Q-Learning for General-Sum Stochastic Games

Abstract

We extend Q-learning to a noncooperative multiagent context, using the framework of general-sum stochastic games. A learning agent maintains Q-functions over joint actions, and performs updates based on assuming Nash equilibrium behavior over the current Q-values. This learning protocol provably converges given certain restrictions on the stage games (defined by Q-values) that arise during learning. Experiments with a pair of two-player grid games suggest that such restrictions on the game structure are not necessarily required. Stage games encountered during learning in both grid environments violate the conditions. However, learning consistently converges in the first grid game, which has a unique equilibrium Q-function, but sometimes fails to converge in the second, which has three different equilibrium Q-functions. In a comparison of offline learning performance in both games, we find agents are more likely to reach a joint optimal path with Nash Q-learning than with a single-agent Q-learning method. When at least one agent adopts Nash Q-learning, the performance of both agents is better than using single-agent Q-learning. We have also implemented an online version of Nash Q-learning that balances exploration with exploitation, yielding improved performance.
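The update the abstract describes works as follows: after observing the joint action, the rewards, and the next state, each agent computes a Nash equilibrium of the stage game formed by all agents' Q-values at the next state, and backs up its own payoff under that equilibrium. The sketch below is a minimal two-agent illustration of this idea, not the authors' implementation; the helper names (pure_nash, nash_q_update) are hypothetical, and the stage game is resolved only by enumerating pure-strategy equilibria, whereas the paper's algorithm selects a (possibly mixed) Nash equilibrium of each stage game.

```python
# Minimal sketch of a two-agent Nash-Q update (illustrative, not the paper's code).
import numpy as np

def pure_nash(Q1, Q2):
    """Return one pure-strategy Nash equilibrium (a1, a2) of the stage game
    given by payoff matrices Q1, Q2 (shape: n_actions1 x n_actions2), or None."""
    n1, n2 = Q1.shape
    for a1 in range(n1):
        for a2 in range(n2):
            # a1 must be a best response to a2, and a2 a best response to a1
            if Q1[a1, a2] >= Q1[:, a2].max() and Q2[a1, a2] >= Q2[a1, :].max():
                return a1, a2
    return None  # no pure equilibrium; the full algorithm would use a mixed one

def nash_q_update(Q1, Q2, s, a1, a2, r1, r2, s_next, alpha=0.1, gamma=0.99):
    """One Nash-Q step for both agents. Q1, Q2 map each state to an
    n_actions1 x n_actions2 array of joint-action values."""
    eq = pure_nash(Q1[s_next], Q2[s_next])
    if eq is None:
        # fallback for this sketch only: pick the joint action maximizing total value
        eq = np.unravel_index(np.argmax(Q1[s_next] + Q2[s_next]), Q1[s_next].shape)
    nash_v1 = Q1[s_next][eq]  # agent 1's payoff at the selected equilibrium
    nash_v2 = Q2[s_next][eq]  # agent 2's payoff at the selected equilibrium
    Q1[s][a1, a2] += alpha * (r1 + gamma * nash_v1 - Q1[s][a1, a2])
    Q2[s][a1, a2] += alpha * (r2 + gamma * nash_v2 - Q2[s][a1, a2])

# Example usage: 2 states, 2 actions per agent, Q-values initialized to zero.
Q1 = np.zeros((2, 2, 2))
Q2 = np.zeros((2, 2, 2))
nash_q_update(Q1, Q2, s=0, a1=1, a2=0, r1=1.0, r2=0.5, s_next=1)
```

Each agent's update is the usual Q-learning step except that the bootstrap target uses the agent's equilibrium payoff in the next-state stage game rather than its own greedy maximum, which is what distinguishes Nash Q-learning from independent single-agent Q-learning.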
