Conference: Agent and Multi-Agent Systems: Technologies and Applications — Learning-Rate Adjusting Q-Learning for Two-Person Two-Action Symmetric Games

Learning-Rate Adjusting Q-Learning for Two-Person Two-Action Symmetric Games



Abstract

There are many multiagent Q-learning methods, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). The author previously proposed utility-based Q-learning (UB-Q) for PD, which uses utilities instead of rewards so as to maintain mutual cooperation once it has occurred. However, UB-Q has to know the payoffs of the game in order to calculate the utilities, and it works only in PD. Since a Q-learning agent's action depends on the relation between its Q-values, mutual cooperation can also be maintained by adjusting the learning rate. This paper therefore deals with the learning rate directly and introduces another Q-learning method, called learning-rate adjusting Q-learning (LRA-Q). It calculates the learning rate from the received payoffs and works in other kinds of two-person two-action symmetric games as well as in PD. Numerical verification showed the success of LRA-Q but also revealed a side effect.
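The abstract describes adjusting a Q-learning agent's learning rate as a function of the payoff it receives. The following is a minimal, self-contained sketch of that general idea in an iterated Prisoner's Dilemma: the scaling rule used here (step size proportional to the received payoff relative to the maximum payoff) is an illustrative assumption, not the actual LRA-Q rule derived in the paper.

```python
import random

# Standard PD payoff matrix for the row player: 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}

class QAgent:
    """Stateless-game Q-learner with a payoff-dependent learning rate."""

    def __init__(self, base_lr=0.1, gamma=0.9, epsilon=0.1):
        self.q = [0.0, 0.0]      # Q-values for cooperate / defect
        self.base_lr = base_lr
        self.gamma = gamma
        self.epsilon = epsilon

    def act(self, rng):
        # Epsilon-greedy: the chosen action depends only on the
        # relation between the two Q-values, as the abstract notes.
        if rng.random() < self.epsilon:
            return rng.randrange(2)
        return 0 if self.q[0] >= self.q[1] else 1

    def update(self, action, payoff):
        # Hypothetical payoff-dependent learning rate (illustrative only;
        # the paper derives its own rule for computing lr from payoffs).
        lr = self.base_lr * (payoff / max(PAYOFF.values()))
        target = payoff + self.gamma * max(self.q)
        self.q[action] += lr * (target - self.q[action])

def play(episodes=5000, seed=0):
    """Run two agents against each other; return the mutual-cooperation rate."""
    rng = random.Random(seed)
    a, b = QAgent(), QAgent()
    coop = 0
    for _ in range(episodes):
        ai, bi = a.act(rng), b.act(rng)
        a.update(ai, PAYOFF[(ai, bi)])
        b.update(bi, PAYOFF[(bi, ai)])
        coop += (ai == 0 and bi == 0)
    return coop / episodes
```

Because the learning rate multiplies the temporal-difference step, scaling it per payoff changes how quickly a large one-shot temptation payoff can overturn the Q-value ordering that sustains cooperation; the paper's contribution is choosing that scaling so cooperation, once reached, persists.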
