IEEE Symposium on Computational Intelligence and Games

Learning non-random moves for playing Othello: Improving Monte Carlo Tree Search


Abstract

Monte Carlo Tree Search (MCTS) with an appropriate tree policy may be used to approximate a minimax tree for games such as Go, where a state value function cannot be formulated easily: recent MCTS algorithms successfully combine Upper Confidence Bounds for Trees with Monte Carlo (MC) simulations to incrementally refine estimates on the game-theoretic values of the game's states. Although a game-specific value function is not required for this approach, significant improvements in performance may be achieved by derandomising the MC simulations using domain-specific knowledge. However, recent results suggest that the choice of a non-uniformly random default policy is non-trivial and may often lead to unexpected outcomes. In this paper we employ Temporal Difference Learning (TDL) as a general approach to the integration of domain-specific knowledge in MCTS and subsequently study its impact on the algorithm's performance. In particular, TDL is used to learn a linear function approximator that is used as an a priori bias to the move selection in the algorithm's default policy; the function approximator is also used to bias the values of the nodes in the tree directly. The goal of this work is to determine whether such a simplistic approach can be used to improve the performance of MCTS for the well-known board game Othello. The analysis of the results highlights the broader conclusions that may be drawn with respect to non-random default policies in general.
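The derandomised default policy described above can be illustrated with a minimal sketch: a learned linear function approximator scores each candidate simulation move, and the default policy samples moves via a softmax over those scores rather than uniformly at random. All names, the feature representation, and the temperature parameter here are illustrative assumptions, not the paper's actual configuration.

```python
import math
import random

def linear_value(features, weights):
    """Linear function approximator: dot product of a move's
    board-feature vector with the TDL-learned weight vector."""
    return sum(f * w for f, w in zip(features, weights))

def biased_default_policy(moves, feature_fn, weights, temperature=1.0, rng=random):
    """Select a simulation move with probability proportional to
    exp(value / temperature), biasing the rollout toward moves the
    learned evaluator prefers instead of choosing uniformly at random."""
    values = [linear_value(feature_fn(m), weights) / temperature for m in moves]
    vmax = max(values)
    exps = [math.exp(v - vmax) for v in values]  # subtract max for numerical stability
    total = sum(exps)
    r = rng.random() * total
    acc = 0.0
    for move, e in zip(moves, exps):
        acc += e
        if acc >= r:
            return move
    return moves[-1]  # guard against floating-point round-off
```

With a high temperature the policy approaches uniform random play; as the temperature decreases it increasingly concentrates on the highest-valued move, which is one way the strength of the a priori bias can be controlled.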
