JMLR: Workshop and Conference Proceedings

Learning Nash Equilibrium for General-Sum Markov Games from Batch Data


Abstract

This paper addresses the problem of learning a Nash equilibrium in $γ$-discounted multiplayer general-sum Markov Games (MGs) in a batch setting. As the number of players in an MG increases, the agents may either collaborate or team apart to increase their final rewards. One way to address this problem is to look for a Nash equilibrium. Although several techniques have been developed for the subcase of two-player zero-sum MGs, those techniques fail to find a Nash equilibrium in general-sum Markov Games. In this paper, we introduce a new definition of $ε$-Nash equilibrium in MGs which captures the quality of a strategy in multiplayer games. We prove that minimizing the norm of two Bellman-like residuals implies learning such an $ε$-Nash equilibrium. Then, we show that minimizing an empirical estimate of the $L_p$ norm of these Bellman-like residuals allows learning a Nash equilibrium for general-sum games in the batch setting. Finally, we introduce a neural network architecture that successfully learns a Nash equilibrium in generic multiplayer general-sum turn-based MGs.
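To make the quantity in the abstract concrete, the following is a minimal toy sketch of an empirical $L_p$ norm of a Bellman-like residual evaluated on a batch of off-policy transitions. The tabular setup, the fixed joint strategy `pi`, and the estimator shape are all illustrative assumptions for a two-player turn-based game; this is not the paper's exact estimator or its neural architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setting (assumption, not the paper's setup): a small
# turn-based general-sum MG with tabular Q-values, two players, discount gamma.
n_states, n_actions, n_players, gamma, p = 5, 3, 2, 0.9, 2

# Hypothetical batch of transitions (s, a, r, s') collected off-policy,
# with one reward component per player (general-sum setting).
batch_size = 64
s = rng.integers(n_states, size=batch_size)
a = rng.integers(n_actions, size=batch_size)
r = rng.normal(size=(batch_size, n_players))
s2 = rng.integers(n_states, size=batch_size)

# Candidate state-action values, one table per player, and a fixed joint
# strategy pi(a|s) whose quality we want to assess.
Q = rng.normal(size=(n_players, n_states, n_actions))
pi = rng.dirichlet(np.ones(n_actions), size=n_states)

def empirical_bellman_residual(Q, pi, p):
    """Empirical L_p norm of a Bellman-like residual over the batch,
    averaged over players:
        || Q_i(s, a) - (r_i + gamma * E_{a' ~ pi(.|s')}[Q_i(s', a')]) ||_p
    This is a sketch of the kind of loss the paper proposes to minimize,
    not its exact two-residual estimator."""
    total = 0.0
    for i in range(n_players):
        # Expected next-step value of player i under the joint strategy pi.
        target = r[:, i] + gamma * (pi[s2] * Q[i, s2]).sum(axis=1)
        residual = Q[i, s, a] - target
        total += np.mean(np.abs(residual) ** p) ** (1.0 / p)
    return total / n_players

loss = empirical_bellman_residual(Q, pi, p)
print(loss)
```

In the paper's batch setting, a loss of this form would be driven toward zero by adjusting the value estimates (and strategy) rather than by collecting new data, which is what makes the approach applicable when only a fixed dataset of transitions is available.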
