Learning-Rate Adjusting Q-Learning for Prisoner's Dilemma Games

Abstract

Many multiagent Q-learning algorithms have been proposed to date, and most of them aim to converge to a Nash equilibrium, which is undesirable in games such as the Prisoner's Dilemma (PD). In a previous paper, the author proposed utility-based Q-learning for the PD, which uses utilities as rewards in order to maintain mutual cooperation once it has occurred. However, since an agent's action depends on the relative magnitudes of its Q-values, mutual cooperation can also be maintained by adjusting the learning rate of Q-learning. Thus, in this paper, we deal with the learning rate directly and introduce a new Q-learning method called learning-rate adjusting Q-learning, or LRA-Q.
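The abstract names the mechanism but not the update rule, so the following is a minimal, hypothetical sketch in Python of the idea it describes: stateless tabular Q-learning on the iterated PD in which each agent's learning rate is damped once mutual cooperation occurs, so the Q-values that favor cooperation stop being overwritten. The payoff values, the 0.9 damping factor, and all class and function names are assumptions for illustration, not the paper's actual LRA-Q rule.

```python
import random

# Payoff matrix for the Prisoner's Dilemma (row player's reward).
# Actions: 0 = cooperate (C), 1 = defect (D).
# Hypothetical values satisfying T > R > P > S (here T=5, R=3, P=1, S=0).
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}


class LRAQAgent:
    """Tabular Q-learner whose learning rate shrinks after mutual
    cooperation, so the Q-values supporting cooperation stop moving.
    An illustrative guess at the mechanism, not the paper's method."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = {0: 0.0, 1: 0.0}  # stateless Q-values, one per action
        self.alpha = alpha         # learning rate (adjusted online)
        self.gamma = gamma         # discount factor
        self.epsilon = epsilon     # exploration probability

    def act(self):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.randrange(2)
        return max(self.q, key=self.q.get)

    def update(self, action, reward, mutual_cooperation):
        # Standard single-state Q-learning update.
        target = reward + self.gamma * max(self.q.values())
        self.q[action] += self.alpha * (target - self.q[action])
        # Learning-rate adjustment: damp alpha once mutual cooperation
        # occurs, so the cooperative Q-values are effectively locked in.
        if mutual_cooperation:
            self.alpha *= 0.9


def run(episodes=5000):
    a, b = LRAQAgent(), LRAQAgent()
    coop = 0
    for _ in range(episodes):
        act_a, act_b = a.act(), b.act()
        mutual = act_a == 0 and act_b == 0
        coop += mutual
        a.update(act_a, PAYOFF[(act_a, act_b)], mutual)
        b.update(act_b, PAYOFF[(act_b, act_a)], mutual)
    print(f"mutual cooperation rate: {coop / episodes:.2%}")


if __name__ == "__main__":
    run()
```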
