A Fast-Convergence Method of Monte Carlo Counterfactual Regret Minimization for Imperfect Information Dynamic Games


Abstract

Among existing algorithms for solving imperfect-information extensive-form games, Monte Carlo Counterfactual Regret Minimization (MCCFR) and its variants are the most popular. However, MCCFR converges slowly because its value estimates have high variance. In this paper, we introduce Semi-OS, a fast-convergence method developed from Outcome-Sampling MCCFR (OS), the most popular variant of MCCFR. Semi-OS makes two novel modifications to OS. First, Semi-OS stores all histories and their values at each information set. Second, after each strategy update, Semi-OS performs a full game-tree traversal to update these values. These two modifications yield a better estimate of regrets. We show that, with an appropriately chosen discount rate, Semi-OS not only converges significantly faster in Leduc Poker, a common testbed for imperfect-information games, but also statistically outperforms OS in head-to-head Leduc Poker matches involving 200,000 hands.
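The ingredients named in the abstract (regret estimates, sampled values, and a discount rate) can be illustrated with a minimal Python sketch at a single decision point. This is not the paper's implementation: the abstract does not give the Semi-OS update rule or its data structures, so every function name below is hypothetical, and the discounting rule in particular is an assumption chosen only for illustration. The sketch contrasts the importance-weighted value estimate built from one sampled action, as in outcome sampling, with regrets computed from known action values.

```python
from typing import List


def regret_matching(cumulative_regrets: List[float]) -> List[float]:
    """Turn cumulative regrets into a strategy proportional to positive regret."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No action has positive regret: fall back to the uniform strategy.
    return [1.0 / len(cumulative_regrets)] * len(cumulative_regrets)


def sampled_action_values(
    action_values: List[float], sampling_probs: List[float], sampled_action: int
) -> List[float]:
    """Importance-weighted value estimate from a single sampled action.

    Only the sampled action receives a nonzero estimate, scaled by 1 / q(a)
    so that the estimate is unbiased over the sampling distribution q.
    """
    estimate = [0.0] * len(action_values)
    estimate[sampled_action] = (
        action_values[sampled_action] / sampling_probs[sampled_action]
    )
    return estimate


def instantaneous_regrets(
    action_values: List[float], strategy: List[float]
) -> List[float]:
    """Regret of each action relative to the expected value of the strategy."""
    expected = sum(p * v for p, v in zip(strategy, action_values))
    return [v - expected for v in action_values]


def discounted_update(
    cumulative_regrets: List[float], new_regrets: List[float], discount: float
) -> List[float]:
    """Assumed rule (not from the paper): shrink old regrets, then add new ones."""
    return [discount * old + new for old, new in zip(cumulative_regrets, new_regrets)]


if __name__ == "__main__":
    values = [1.0, -1.0]          # true action values (illustrative numbers)
    strategy = regret_matching([0.0, 0.0])
    exact = instantaneous_regrets(values, strategy)
    noisy = instantaneous_regrets(
        sampled_action_values(values, strategy, sampled_action=0), strategy
    )
    print("regrets from known values:  ", exact)
    print("regrets from one sample:    ", noisy)
    print("after a discounted update:  ", discounted_update([1.0, 2.0], exact, 0.5))
```

The single-sample estimate is unbiased but can be far from the true values whenever the sampling probability is small, which is the variance problem the abstract attributes to MCCFR. Computing regrets from stored values avoids that scaling, at the cost of the extra traversal the abstract describes.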

Bibliographic details

Source
IEEE Data Driven Control and Learning Systems Conference | 2020 | pp. 1048-1053 | 6 pages
Authors
Xiaoyan Hu; Li Xia; Jun Yang; Qianchuan Zhao
Original format
PDF
Keywords
History; Games; Minimization; Monte Carlo methods; Convergence; Nash equilibrium; Trajectory
