
A Multiagent Reinforcement Learning Algorithm using Extended Optimal Response

Abstract

Stochastic games provide a theoretical framework for multiagent reinforcement learning. Based on this framework, Littman proposed a multiagent reinforcement learning algorithm for zero-sum stochastic games, and Hu and Wellman extended it to general-sum games. Given a stochastic game, if all agents learn with their algorithm, the agents' policies can be expected to converge to a Nash equilibrium. However, agents running their algorithm always try to converge to a Nash equilibrium, regardless of the policies actually used by the other agents. In addition, when there are multiple Nash equilibria, the agents must agree in advance on which equilibrium to reach. Their algorithm therefore lacks adaptability in this sense. In this paper, we propose a multiagent reinforcement learning algorithm based on the extended optimal response, which we introduce in this paper. The algorithm converges to a Nash equilibrium when the other agents are adaptable, and otherwise plays an optimal response to their policies. We also provide empirical results in three simple stochastic games, which show that the algorithm behaves as intended.
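The abstract does not spell out the extended-optimal-response update itself, so the sketch below only illustrates the behaviour it describes: an agent that responds optimally to the policy its opponent is actually playing, and thereby settles into one of several Nash equilibria when the opponent is adaptable. The coordination game, the fictitious-play-style opponent model, and all names in the code are illustrative assumptions, not the authors' algorithm.

import numpy as np

# Illustrative sketch only (not the paper's algorithm, whose update rule is
# not given in the abstract): a best-response learner in a repeated 2x2
# coordination game with two pure-strategy Nash equilibria. It shows the
# behaviour the abstract describes -- respond optimally to the policy the
# other agent actually uses, and coordinate on an equilibrium when that
# agent is adaptable -- rather than insisting on one fixed equilibrium.

PAYOFF = np.array([[1.0, 0.0],     # row player's payoffs; (0,0) and (1,1)
                   [0.0, 1.0]])    # are both Nash equilibria

rng = np.random.default_rng(0)

def best_response(opponent_counts):
    # Best response to the opponent's empirical action frequencies.
    freq = opponent_counts / opponent_counts.sum()
    expected = PAYOFF @ freq                  # expected payoff per action
    return int(np.argmax(expected))

# Hypothetical "adaptable" opponent: it usually copies whichever action it
# has observed us play most often, with a little exploration noise.
my_counts = np.ones(2)                        # Laplace-smoothed counts
opp_counts = np.ones(2)
for t in range(200):
    my_action = best_response(opp_counts)
    opp_action = int(np.argmax(my_counts)) if rng.random() < 0.9 else int(rng.integers(2))
    my_counts[my_action] += 1
    opp_counts[opp_action] += 1

print("my action frequencies:      ", my_counts / my_counts.sum())
print("opponent action frequencies:", opp_counts / opp_counts.sum())
# Both frequency vectors concentrate on the same action, i.e. play settles
# into one of the two coordination equilibria without a prior agreement.

Against an opponent that instead plays a fixed mixed strategy, the same best_response call simply exploits that strategy, which corresponds to the fallback behaviour the abstract attributes to the proposed algorithm.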
