International Conference on Pattern Recognition

Self-Play or Group Practice: Learning to Play Alternating Markov Game in Multi-Agent System


Abstract

Research in reinforcement learning has achieved great success in strategic game playing. These successes are thanks to the incorporation of deep reinforcement learning (DRL) and Monte Carlo Tree Search (MCTS) into agents trained in a self-play (SP) environment. Through self-play, agents are presented with an incrementally more difficult curriculum, which in turn facilitates learning. However, recent research suggests that agents trained via self-play may easily get stuck in local equilibria. In this paper, we consider a population of agents, each of which independently learns to play an alternating Markov game (AMG). We propose a new training framework, group practice (GP), for a population of decentralized RL agents. Under group practice, agents are assigned to multiple learning groups during training; in every episode, an agent is randomly paired with, and practices against, another agent from its learning group. Convergence to the optimal value function and to the Nash equilibrium is proved under the GP framework. An experimental study is conducted by applying GP to the Q-learning algorithm and to deep Q-learning with Monte Carlo tree search on the games of Connect Four and Hex. We verify that GP is a more efficient training scheme than SP given the same amount of training, and we show that learning effectiveness can be further improved by applying local grouping to agents.
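To make the pairing scheme concrete, the following is a minimal Python sketch of group practice as described in the abstract. All names (QLearningAgent, play_episode) and all parameters (population size, number of groups, episode count) are illustrative assumptions, not the authors' implementation; the random split stands in for the grouping step, which the paper also extends with local grouping.

```python
import random

class QLearningAgent:
    """Placeholder decentralized learner. A real agent would hold a
    Q-table or value network and update it from its own experience."""
    def __init__(self, agent_id):
        self.agent_id = agent_id

def play_episode(first_player, second_player):
    """Stand-in for one episode of an alternating Markov game
    (e.g. Connect Four or Hex): the two agents alternate moves and
    each performs its own learning update from the rollout."""
    pass  # environment rollout + per-agent updates would go here

def group_practice(agents, num_groups, num_episodes):
    # Assign the population into learning groups (random split here).
    random.shuffle(agents)
    groups = [agents[i::num_groups] for i in range(num_groups)]
    for _ in range(num_episodes):
        for group in groups:
            # Each episode, an agent practices against a randomly
            # chosen partner from its own group, instead of against
            # a copy of itself as in self-play (SP).
            a, b = random.sample(group, 2)
            play_episode(a, b)

if __name__ == "__main__":
    population = [QLearningAgent(i) for i in range(12)]
    group_practice(population, num_groups=3, num_episodes=1000)
```

Self-play is the degenerate case of this loop with every group containing a single agent paired with itself; group practice replaces that fixed opponent with a sampled partner, which diversifies the curriculum each agent sees.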
