Web Intelligence and Agent Systems

Utility based Q-learning to facilitate cooperation in Prisoner's Dilemma games

Abstract

This work deals with Q-learning in a multiagent environment. Many multiagent Q-learning methods exist, and most of them aim to converge to a Nash equilibrium, which is not desirable in games like the Prisoner's Dilemma (PD). However, ordinary Q-learning agents that choose actions stochastically to avoid local optima may reach mutual cooperation in a PD game. Although such mutual cooperation usually occurs only as isolated events, it can be facilitated if the Q-function of cooperation becomes larger than that of defection after the cooperation occurs. This work derives a theorem on how many consecutive repetitions of mutual cooperation are needed to make the Q-function of cooperation exceed that of defection. In addition, building on the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD games, this work also derives a corollary on how much utility is necessary for one-shot mutual cooperation to make the Q-function of cooperation the larger one.
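The flavor of the theorem can be sketched from the standard single-state Q-learning update. The following is a minimal reconstruction under stated assumptions, not the paper's statement: stateless Q-learning with learning rate \alpha and discount \gamma, mutual-cooperation reward R, and a Q-function of defection Q(D) that is not updated while both agents cooperate (while Q(C) \le Q(D), the bootstrap term \max_a Q(a) equals Q(D)). After n consecutive mutual cooperations,

\[
Q_n(C) = (1-\alpha)\,Q_{n-1}(C) + \alpha\bigl(R + \gamma\,Q(D)\bigr)
       = (1-\alpha)^n\,Q_0(C) + \bigl(1-(1-\alpha)^n\bigr)K,
\qquad K := R + \gamma\,Q(D),
\]

so Q_n(C) > Q(D) first holds once

\[
n > \frac{\ln\!\bigl((K - Q(D)) / (K - Q_0(C))\bigr)}{\ln(1-\alpha)},
\]

which is finite only if K > Q(D), i.e. R > (1-\gamma)\,Q(D). Setting n = 1 and solving for the reward instead gives the corollary's flavor: a utility u substituted for R suffices after one-shot mutual cooperation when u > \bigl(Q(D) - (1-\alpha)\,Q_0(C)\bigr)/\alpha - \gamma\,Q(D).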
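The exploration-driven cooperation described in the abstract can be reproduced with a small simulation. A minimal sketch, assuming a conventional PD payoff matrix with T > R > P > S and two epsilon-greedy stateless Q-learners; all constants and names below are illustrative, not taken from the paper:

import random

# Conventional PD payoffs (T > R > P > S); illustrative values only.
T, R, P, S = 5.0, 3.0, 1.0, 0.0
PAYOFF = {('C', 'C'): (R, R), ('C', 'D'): (S, T),
          ('D', 'C'): (T, S), ('D', 'D'): (P, P)}

class QAgent:
    """Stateless Q-learner with epsilon-greedy exploration."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {'C': 0.0, 'D': 0.0}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self):
        # Stochastic exploration: occasionally plays the non-greedy
        # action, which is what lets mutual cooperation appear at all.
        if random.random() < self.epsilon:
            return random.choice(['C', 'D'])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Standard Q-learning target; replacing `reward` with a shaped
        # utility u(reward) would give a utility-based variant.
        target = reward + self.gamma * max(self.q.values())
        self.q[action] += self.alpha * (target - self.q[action])

def play(rounds=50_000, seed=0):
    random.seed(seed)
    a, b = QAgent(), QAgent()
    mutual_c = 0
    for _ in range(rounds):
        xa, xb = a.act(), b.act()
        ra, rb = PAYOFF[(xa, xb)]
        a.update(xa, ra)
        b.update(xb, rb)
        mutual_c += (xa == 'C' and xb == 'C')
    return mutual_c / rounds, a.q, b.q

if __name__ == '__main__':
    rate, qa, qb = play()
    print('mutual cooperation rate:', rate)
    print('agent A Q-values:', qa)
    print('agent B Q-values:', qb)

Whether cooperation persists here depends on alpha, gamma, epsilon, and the payoffs; per the sketch above, a run of exploratory mutual cooperations long enough to push Q(C) past Q(D) is what would lock greedy play onto cooperation.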
