Journal: Machine Learning

Analysis of Hannan consistent selection for Monte Carlo tree search in simultaneous move games



Abstract

Hannan consistency, or no external regret, is a key concept for learning in games. An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the best fixed action in hindsight. If both players in a zero-sum normal form game use a Hannan consistent algorithm, their average behavior converges to a Nash equilibrium of the game. A similar result is known for extensive form games, but the played strategies need to be Hannan consistent with respect to the counterfactual values, which are often difficult to obtain. We study zero-sum extensive form games with simultaneous moves, but otherwise perfect information. These games generalize normal form games and are a special case of extensive form games. We study whether applying HC algorithms in each decision point of these games directly to the observed payoffs leads to convergence to a Nash equilibrium. This learning process corresponds to a class of Monte Carlo Tree Search algorithms, which are popular for playing simultaneous-move games but do not have any known performance guarantees. We show that using HC algorithms directly on the observed payoffs is not sufficient to guarantee convergence. With an additional averaging over joint actions, convergence is guaranteed, but empirically slower. We further define an additional property of HC algorithms that is sufficient to guarantee convergence without the averaging, and we empirically show that commonly used HC algorithms have this property.
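To illustrate the normal-form result the abstract starts from, the sketch below runs regret matching (one commonly used Hannan consistent algorithm) in self-play on rock-paper-scissors. This is a minimal illustration of HC learning in a single zero-sum normal form game, not the paper's simultaneous-move MCTS setting; the game, iteration count, and function names are choices made for the example.

```python
import random

# Row player's payoffs in rock-paper-scissors, a zero-sum normal form game
# whose unique Nash equilibrium is uniform play by both players.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def regret_matching_strategy(regrets):
    """Regret matching: play actions proportionally to positive cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total > 0.0:
        return [p / total for p in positives]
    return [1.0 / len(regrets)] * len(regrets)  # no positive regret: play uniformly


def self_play(iterations=50000, seed=0):
    """Both players use regret matching; return their average (time-averaged) strategies."""
    rng = random.Random(seed)
    n = len(PAYOFF)
    regrets = [[0.0] * n, [0.0] * n]
    strategy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        strategies = [regret_matching_strategy(regrets[p]) for p in (0, 1)]
        a0 = rng.choices(range(n), weights=strategies[0])[0]
        a1 = rng.choices(range(n), weights=strategies[1])[0]
        for p in (0, 1):
            for a in range(n):
                strategy_sums[p][a] += strategies[p][a]
        # External regret update against the opponent's realized action:
        # how much better each fixed action would have done than the action played.
        for a in range(n):
            regrets[0][a] += PAYOFF[a][a1] - PAYOFF[a0][a1]
            regrets[1][a] += -PAYOFF[a0][a] - (-PAYOFF[a0][a1])
    return ([s / iterations for s in strategy_sums[0]],
            [s / iterations for s in strategy_sums[1]])
```

Because both players' regrets grow sublinearly, the average strategies approach the uniform Nash equilibrium (each action probability near 1/3), even though the current strategies keep cycling. The paper's question is whether this same recipe, applied at every decision point of a simultaneous-move game tree using only observed payoffs, still converges.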
