IEEE Symposium on Computational Intelligence and Games

Learning non-random moves for playing Othello: Improving Monte Carlo Tree Search


Abstract

Monte Carlo Tree Search (MCTS) with an appropriate tree policy may be used to approximate a minimax tree for games such as Go, where a state value function cannot be formulated easily: recent MCTS algorithms successfully combine Upper Confidence Bounds for Trees with Monte Carlo (MC) simulations to incrementally refine estimates on the game-theoretic values of the game's states. Although a game-specific value function is not required for this approach, significant improvements in performance may be achieved by derandomising the MC simulations using domain-specific knowledge. However, recent results suggest that the choice of a non-uniformly random default policy is non-trivial and may often lead to unexpected outcomes. In this paper we employ Temporal Difference Learning (TDL) as a general approach to the integration of domain-specific knowledge in MCTS and subsequently study its impact on the algorithm's performance. In particular, TDL is used to learn a linear function approximator that is used as an a priori bias to the move selection in the algorithm's default policy; the function approximator is also used to bias the values of the nodes in the tree directly. The goal of this work is to determine whether such a simplistic approach can be used to improve the performance of MCTS for the well-known board game Othello. The analysis of the results highlights the broader conclusions that may be drawn with respect to non-random default policies in general.
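The derandomised default policy described above can be illustrated with a minimal sketch: a learned linear function approximator scores each candidate simulation move, and the default policy samples moves via a softmax over those scores rather than uniformly at random. All names, the feature representation, and the temperature parameter here are illustrative assumptions, not the paper's actual configuration.

```python
import math
import random

def linear_value(features, weights):
    """Linear function approximator: dot product of a move's
    board-feature vector with the TDL-learned weight vector."""
    return sum(f * w for f, w in zip(features, weights))

def biased_default_policy(moves, feature_fn, weights, temperature=1.0, rng=random):
    """Select a simulation move with probability proportional to
    exp(value / temperature), biasing the rollout toward moves the
    learned evaluator prefers instead of choosing uniformly at random."""
    values = [linear_value(feature_fn(m), weights) / temperature for m in moves]
    vmax = max(values)
    exps = [math.exp(v - vmax) for v in values]  # subtract max for numerical stability
    total = sum(exps)
    r = rng.random() * total
    acc = 0.0
    for move, e in zip(moves, exps):
        acc += e
        if acc >= r:
            return move
    return moves[-1]  # guard against floating-point round-off
```

With a high temperature the policy approaches uniform random play; as the temperature decreases it increasingly concentrates on the highest-valued move, which is one way the strength of the a priori bias can be controlled.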
