IEEE Data Driven Control and Learning Systems Conference
A Fast-Convergence Method of Monte Carlo Counterfactual Regret Minimization for Imperfect Information Dynamic Games

Abstract

Among existing algorithms for solving imperfect-information extensive-form games, Monte Carlo Counterfactual Regret Minimization (MCCFR) and its variants are the most popular. However, MCCFR converges slowly because its value estimates have high variance. In this paper, we introduce Semi-OS, a fast-convergence method developed from Outcome-Sampling MCCFR (OS), the most popular variant of MCCFR. Semi-OS makes two novel modifications to OS. First, Semi-OS stores all histories and their values at each information set. Second, each time the strategy is updated, Semi-OS performs a full game-tree traversal to update these values. Together, these two modifications yield a better estimate of regrets. We show that, with an appropriately chosen discount rate, Semi-OS not only significantly speeds up convergence in Leduc Poker, a common testbed for imperfect-information games, but also statistically outperforms OS in head-to-head Leduc Poker matches involving 200,000 hands.
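
For readers unfamiliar with the regret-based strategy update that CFR-family algorithms share, the following is a minimal sketch of regret matching, the step that turns accumulated regrets into a strategy. It is not the Semi-OS update itself, whose details the abstract does not give. The function names and the `discount` parameter are hypothetical, and applying the discount multiplicatively to past regrets is an assumption made only for illustration.

```python
from typing import List


def regret_matching(cum_regrets: List[float]) -> List[float]:
    """Return a strategy proportional to the positive cumulative regrets.

    If no action has positive regret, fall back to the uniform strategy.
    """
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    if total <= 0.0:
        n = len(cum_regrets)
        return [1.0 / n] * n
    return [p / total for p in positive]


def accumulate_regrets(
    cum_regrets: List[float],
    instant_regrets: List[float],
    discount: float = 1.0,
) -> List[float]:
    """Add this iteration's regrets to the running totals.

    ASSUMPTION: `discount` scales the previously accumulated regrets;
    the abstract mentions a discount rate but does not say how it is applied.
    """
    return [discount * c + r for c, r in zip(cum_regrets, instant_regrets)]


if __name__ == "__main__":
    regrets = accumulate_regrets([2.0, 4.0], [1.0, -1.0], discount=0.5)
    print(regrets, regret_matching(regrets))
```

With `discount=1.0` the accumulation reduces to the plain running sum used in standard regret matching; smaller values weight recent iterations more heavily.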