Web Intelligence and Agent Systems

Utility based Q-learning to facilitate cooperation in Prisoner's Dilemma games

Abstract

This work deals with Q-learning in a multiagent environment. Many multiagent Q-learning methods exist, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). However, ordinary Q-learning agents that choose actions stochastically to avoid local optima may reach mutual cooperation in a PD game. Although such mutual cooperation usually occurs only as isolated events, it can be facilitated if the Q-function of cooperation becomes larger than that of defection after the cooperation occurs. This work derives a theorem on how many consecutive repetitions of mutual cooperation are needed to make the Q-function of cooperation exceed that of defection. In addition, building on the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD games, this work also derives a corollary on how much utility is necessary for one-shot mutual cooperation to make the Q-function of cooperation the larger one.
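The flavor of the theorem can be sketched from the standard single-state Q-learning update. The following is a minimal reconstruction under stated assumptions, not the paper's statement: stateless Q-learning with learning rate \alpha and discount \gamma, mutual-cooperation reward R, and a Q-function of defection Q(D) that is not updated while both agents cooperate (while Q(C) \le Q(D), the bootstrap term \max_a Q(a) equals Q(D)). After n consecutive mutual cooperations,

\[
Q_n(C) = (1-\alpha)\,Q_{n-1}(C) + \alpha\bigl(R + \gamma\,Q(D)\bigr)
       = (1-\alpha)^n\,Q_0(C) + \bigl(1-(1-\alpha)^n\bigr)K,
\qquad K := R + \gamma\,Q(D),
\]

so Q_n(C) > Q(D) first holds once

\[
n > \frac{\ln\!\bigl((K - Q(D)) / (K - Q_0(C))\bigr)}{\ln(1-\alpha)},
\]

which is finite only if K > Q(D), i.e. R > (1-\gamma)\,Q(D). Setting n = 1 and solving for the reward instead gives the corollary's flavor: a utility u substituted for R suffices after one-shot mutual cooperation when u > \bigl(Q(D) - (1-\alpha)\,Q_0(C)\bigr)/\alpha - \gamma\,Q(D).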
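The exploration-driven cooperation described in the abstract can be reproduced with a small simulation. A minimal sketch, assuming a conventional PD payoff matrix with T > R > P > S and two epsilon-greedy stateless Q-learners; all constants and names below are illustrative, not taken from the paper:

import random

# Conventional PD payoffs (T > R > P > S); illustrative values only.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

class QAgent:
    """Stateless Q-learner with epsilon-greedy exploration."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {'C': 0.0, 'D': 0.0}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self):
        # Stochastic exploration: occasionally plays the non-greedy
        # action, which is what lets mutual cooperation appear at all.
        if random.random() < self.epsilon:
            return random.choice(['C', 'D'])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Standard Q-learning target; replacing `reward` with a shaped
        # utility u(reward) would give a utility-based variant.
        target = reward + self.gamma * max(self.q.values())
        self.q[action] += self.alpha * (target - self.q[action])

def play(rounds=50_000, seed=0):
    random.seed(seed)
    a, b = QAgent(), QAgent()
    mutual_c = 0
    for _ in range(rounds):
        xa, xb = a.act(), b.act()
        ra, rb = PAYOFF[(xa, xb)]
        a.update(xa, ra)
        b.update(xb, rb)
        mutual_c += (xa == 'C' and xb == 'C')
    return mutual_c / rounds, a.q, b.q

if __name__ == '__main__':
    rate, qa, qb = play()
    print('mutual cooperation rate:', rate)
    print('agent A Q-values:', qa)
    print('agent B Q-values:', qb)

Whether cooperation persists here depends on alpha, gamma, epsilon, and the payoffs; per the sketch above, a run of exploratory mutual cooperations long enough to push Q(C) past Q(D) is what would lock greedy play onto cooperation.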
