Journal: Machine Learning

Learning to compete, coordinate, and cooperate in repeated games using reinforcement learning

Abstract

We consider the problem of learning in repeated general-sum matrix games when a learning algorithm can observe the actions but not the payoffs of its associates. Due to the non-stationarity of the environment caused by learning associates in these games, most state-of-the-art algorithms perform poorly in some important repeated games because they are unable to make profitable compromises. To make these compromises, an agent must effectively balance competing objectives, including bounding losses, playing optimally with respect to current beliefs, and taking calculated, but profitable, risks. In this paper, we present, discuss, and analyze M-Qubed, a reinforcement learning algorithm designed to overcome these deficiencies by encoding and balancing best-response, cautious, and optimistic learning biases. We show that M-Qubed learns to make profitable compromises across a wide range of repeated matrix games played with many kinds of learners. Specifically, we prove that M-Qubed's average payoffs meet or exceed its maximin value in the limit. Additionally, we show that, in two-player games, M-Qubed's average payoffs approach the value of the Nash bargaining solution in self-play. Furthermore, it performs very well when associating with other learners, as evidenced by its robust behavior in round-robin and evolutionary tournaments of two-player games. These results demonstrate that an agent can learn to make good compromises, and hence receive high payoffs, in repeated games by effectively encoding and balancing best-response, cautious, and optimistic learning biases.
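The abstract names three learning biases that M-Qubed balances: best-response (play optimally against current beliefs), cautious (bound losses relative to the maximin value), and optimistic (keep value estimates high enough to keep exploring cooperative outcomes). The paper's algorithm itself is not reproduced here; as a minimal sketch of two of these ingredients under standard definitions, the hypothetical Python below computes a matrix game's maximin (security) value by linear programming and implements an optimistically initialized Q-learner over joint-action histories. All names (maximin_value, OptimisticQLearner) are illustrative, not from the paper.

```python
# Illustrative sketch, not the authors' M-Qubed implementation.
import numpy as np
from scipy.optimize import linprog

def maximin_value(R):
    """Security (maximin) value of a matrix game for the row player.

    R[i, j] = row player's payoff when row plays i and column plays j.
    Solves: max_x min_j x . R[:, j]  subject to  x in the probability simplex.
    """
    n, m = R.shape
    # Variables: [x_0 .. x_{n-1}, v]; linprog minimizes, so use cost -v.
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # One constraint per column j:  v - x . R[:, j] <= 0.
    A_ub = np.hstack([-R.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    # Probabilities sum to 1.
    A_eq = np.zeros((1, n + 1))
    A_eq[0, :n] = 1.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * n + [(None, None)]  # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

class OptimisticQLearner:
    """Q-learner whose state is a short joint-action history, with optimistic
    initial Q-values: the optimism bias keeps the agent probing cooperative
    outcomes instead of settling immediately for the myopic best response."""

    def __init__(self, n_actions, q_init, alpha=0.1, gamma=0.95, epsilon=0.05):
        self.n_actions = n_actions
        self.q_init = q_init  # set at or above the highest possible payoff
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.Q = {}           # state (joint-action history) -> Q-value array

    def q(self, state):
        return self.Q.setdefault(state, np.full(self.n_actions, float(self.q_init)))

    def act(self, state, rng):
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_actions))
        return int(np.argmax(self.q(state)))

    def update(self, state, action, reward, next_state):
        q_sa = self.q(state)
        target = reward + self.gamma * np.max(self.q(next_state))
        q_sa[action] += self.alpha * (target - q_sa[action])
```

For the row payoffs of the prisoner's dilemma, maximin_value(np.array([[3.0, 0.0], [5.0, 1.0]])) returns the pure strategy "defect" with security value 1. A caution bias in the spirit of the abstract would retreat toward this maximin strategy whenever the agent's running average payoff falls too far below that value, which is the mechanism that bounds losses in the limit.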