
Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions



Abstract

Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per-iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The previously most effective instances of MCCFR can still be very slow in games with many player actions since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player's actions according to the player's average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff.
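The abstract only describes AS at a high level: each of the updating player's actions is sampled according to that player's cumulative average strategy. The sketch below is a minimal, hypothetical Python illustration of what such action sampling at a single information set could look like; the function name, the exact sampling probability, and the parameters `epsilon`, `beta`, and `tau` are illustrative assumptions rather than the authors' implementation.

```python
import random

def sample_action_subset(cumulative_avg_strategy, epsilon=0.05, beta=1.0, tau=1.0):
    """Hypothetical sketch of average-strategy action sampling at one information set.

    cumulative_avg_strategy: dict mapping each of the player's actions to the
    cumulative average-strategy weight accumulated for it so far.
    Each action is kept independently with a probability that grows with its
    weight under the average strategy, floored at epsilon so that rarely
    played actions are still explored occasionally.
    """
    total = sum(cumulative_avg_strategy.values())
    sampled = []
    for action, weight in cumulative_avg_strategy.items():
        rho = max(epsilon, (beta + tau * weight) / (beta + total))
        if random.random() < min(1.0, rho):
            sampled.append(action)
    return sampled

# Example: actions favoured by the average strategy are sampled more often,
# so only a small subset of a large action set is typically traversed.
weights = {"fold": 10.0, "call": 250.0, "raise_small": 900.0, "raise_big": 40.0}
print(sample_action_subset(weights))
```

Because only the sampled subset of actions is traversed on an iteration, the per-iteration cost stays small even when the player has many actions, which is the setting the paper targets.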

