IEEE International Conference on Agents

The resilience of cooperation in a Dilemma game played by reinforcement learning agents

Abstract

This work discusses what an independent reinforcement learning agent can do in a multiagent environment. In particular, we consider a stateless Q-learning agent in a Prisoner's Dilemma (PD) game. Although the literature has shown that stateless, independent Q-learning agents have difficulty cooperating with each other in an iterated PD (IPD) game, we gave a condition on the PD payoffs and Q-learning parameters that helps the agents cooperate. Based on that condition, we also discussed the ratio of mutual cooperation occurring in IPD games. That analysis supposed that mutual cooperation was fragile, i.e., that a single unlucky defection would send the agents down a spiral of mutual defection. However, this is not always the case. Mutual cooperation reinforces itself and is therefore robust and resilient. Hence, this work analytically derives how long a run of mutual cooperation continues once it has begun, taking this resilience into account. The result furthers our understanding of the reinforcement learning process in IPD games.
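For readers unfamiliar with the setting, the following is a minimal sketch of two stateless, independent Q-learning agents playing an IPD and of measuring how long runs of mutual cooperation last. It is not the authors' model or analysis: the payoff values, the epsilon-greedy exploration, the form of the update rule, and all function names (`choose`, `update`, `play_ipd`, `cooperation_runs`) are assumptions chosen for illustration, and the parameters are not claimed to satisfy the condition mentioned in the abstract.

```python
import random

# Illustrative PD payoffs with T > R > P > S (assumed values, not taken from the paper).
T, R, P, S = 5.0, 3.0, 1.0, 0.0
C, D = 0, 1  # action indices: cooperate, defect
PAYOFF = {(C, C): (R, R), (C, D): (S, T), (D, C): (T, S), (D, D): (P, P)}


def choose(q, epsilon, rng):
    """Epsilon-greedy choice over the two actions of a stateless agent."""
    if rng.random() < epsilon or q[C] == q[D]:
        return rng.choice((C, D))
    return C if q[C] > q[D] else D


def update(q, action, reward, alpha, gamma):
    """Stateless Q-learning update (assumed form): with no state, the
    bootstrap term is simply the maximum of the agent's own Q-values."""
    q[action] += alpha * (reward + gamma * max(q) - q[action])


def play_ipd(rounds=10000, alpha=0.1, gamma=0.9, epsilon=0.05, seed=0):
    """Play an iterated PD between two independent learners; return the action history."""
    rng = random.Random(seed)
    q1, q2 = [0.0, 0.0], [0.0, 0.0]
    history = []
    for _ in range(rounds):
        a1 = choose(q1, epsilon, rng)
        a2 = choose(q2, epsilon, rng)
        r1, r2 = PAYOFF[(a1, a2)]
        update(q1, a1, r1, alpha, gamma)  # each agent learns only from its own reward
        update(q2, a2, r2, alpha, gamma)
        history.append((a1, a2))
    return history


def cooperation_runs(history):
    """Lengths of maximal consecutive runs of mutual cooperation (C, C)."""
    runs, current = [], 0
    for pair in history:
        if pair == (C, C):
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    runs = cooperation_runs(play_ipd())
    if runs:
        print("runs:", len(runs), "mean length:", sum(runs) / len(runs))
    else:
        print("no mutual cooperation observed")
```

Under these assumptions, the empirical distribution of run lengths returned by `cooperation_runs` is the kind of quantity the abstract says is derived analytically; varying `alpha`, `gamma`, and the payoffs changes how often cooperation emerges and how long it persists.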
