International Conference on Pattern Recognition

Self-Play or Group Practice: Learning to Play Alternating Markov Game in Multi-Agent System


Abstract

Research in reinforcement learning has achieved great success in strategic game playing. These successes are thanks to the incorporation of deep reinforcement learning (DRL) and Monte Carlo Tree Search (MCTS) into agents trained in a self-play (SP) environment. Through self-play, agents are presented with an incrementally more difficult curriculum, which in turn facilitates learning. However, recent research suggests that agents trained via self-play may easily get stuck in local equilibria. In this paper, we consider a population of agents, each of which independently learns to play an alternating Markov game (AMG). We propose a new training framework, group practice (GP), for a population of decentralized RL agents. Under group practice, agents are assigned to multiple learning groups during training; in every episode, an agent is randomly paired with, and practices against, another agent from its learning group. Convergence to the optimal value function and to the Nash equilibrium is proved under the GP framework. An experimental study is conducted by applying GP to the Q-learning algorithm and to deep Q-learning with Monte Carlo tree search on the games of Connect Four and Hex. We verify that GP is a more efficient training scheme than SP given the same amount of training, and we show that learning effectiveness can be further improved by applying local grouping to agents.
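To make the pairing scheme concrete, the following is a minimal Python sketch of group practice as described in the abstract. All names (QLearningAgent, play_episode) and all parameters (population size, number of groups, episode count) are illustrative assumptions, not the authors' implementation; the random split stands in for the grouping step, which the paper also extends with local grouping.

```python
import random

class QLearningAgent:
    """Placeholder decentralized learner. A real agent would hold a
    Q-table or value network and update it from its own experience."""
    def __init__(self, agent_id):
        self.agent_id = agent_id

def play_episode(first_player, second_player):
    """Stand-in for one episode of an alternating Markov game
    (e.g. Connect Four or Hex): the two agents alternate moves and
    each performs its own learning update from the rollout."""
    pass  # environment rollout + per-agent updates would go here

def group_practice(agents, num_groups, num_episodes):
    # Assign the population into learning groups (random split here).
    random.shuffle(agents)
    groups = [agents[i::num_groups] for i in range(num_groups)]
    for _ in range(num_episodes):
        for group in groups:
            # Each episode, an agent practices against a randomly
            # chosen partner from its own group, instead of against
            # a copy of itself as in self-play (SP).
            a, b = random.sample(group, 2)
            play_episode(a, b)

if __name__ == "__main__":
    population = [QLearningAgent(i) for i in range(12)]
    group_practice(population, num_groups=3, num_episodes=1000)
```

Self-play is the degenerate case of this loop with every group containing a single agent paired with itself; group practice replaces that fixed opponent with a sampled partner, which diversifies the curriculum each agent sees.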
