
Learning-Rate Adjusting Q-Learning for Prisoner's Dilemma Games



Abstract

Many multiagent Q-learning algorithms have been proposed to date, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). In a previous paper, the author proposed utility-based Q-learning for the PD, which uses utilities as rewards in order to maintain mutual cooperation once it has occurred. However, since an agent's action depends on the relative ordering of its Q-values, mutual cooperation can also be maintained by adjusting the learning rate of Q-learning. Thus, in this paper, we deal with the learning rate directly and introduce a new Q-learning method called learning-rate adjusting Q-learning, or LRA-Q.
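The abstract does not specify LRA-Q's actual adjustment schedule, so the following Python sketch only illustrates the general idea it describes: a tabular Q-learner for the iterated PD whose learning rate is damped once mutual cooperation is observed. The payoff values, the halving rule, the alpha floor, and the "opponent's last action" state encoding are all illustrative assumptions, not the paper's specification.

import random

# Prisoner's Dilemma payoffs for the row player (T > R > P > S):
# actions: 0 = cooperate, 1 = defect
PAYOFF = {
    (0, 0): 3,  # R: mutual cooperation
    (0, 1): 0,  # S: sucker's payoff
    (1, 0): 5,  # T: temptation to defect
    (1, 1): 1,  # P: mutual defection
}

class LRAQAgent:
    # Q-learning agent whose learning rate is damped once mutual
    # cooperation occurs (hypothetical rule; the paper's actual
    # LRA-Q schedule is not given in the abstract).
    def __init__(self, alpha=0.5, alpha_min=0.01, gamma=0.9, epsilon=0.1):
        self.alpha = alpha          # current learning rate
        self.alpha_min = alpha_min  # lower bound after repeated damping
        self.gamma = gamma          # discount factor
        self.epsilon = epsilon      # exploration rate
        # state: opponent's previous action (2 = no history yet)
        self.q = {(s, a): 0.0 for s in (0, 1, 2) for a in (0, 1)}

    def act(self, state):
        if random.random() < self.epsilon:
            return random.choice((0, 1))
        return max((0, 1), key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, mutual_coop):
        # Hypothetical adjustment: halve the learning rate after mutual
        # cooperation so the Q-value ordering that produced it is not
        # unlearned by the one-shot temptation to defect.
        if mutual_coop:
            self.alpha = max(self.alpha_min, self.alpha * 0.5)
        best_next = max(self.q[(next_state, a)] for a in (0, 1))
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error

def self_play(rounds=2000):
    a1, a2 = LRAQAgent(), LRAQAgent()
    s1 = s2 = 2  # no history at the start
    for _ in range(rounds):
        act1, act2 = a1.act(s1), a2.act(s2)
        coop = (act1 == 0 and act2 == 0)
        a1.update(s1, act1, PAYOFF[(act1, act2)], act2, coop)
        a2.update(s2, act2, PAYOFF[(act2, act1)], act1, coop)
        s1, s2 = act2, act1  # next state = opponent's last action
    return a1, a2

Under these assumptions, once cooperation sets in the small learning rate means an occasional exploratory defection (and its temptation payoff) shifts the Q-values too little to flip the greedy action away from cooperating, which is the effect the abstract attributes to adjusting the learning rate.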
