International Joint Conference on Neural Networks

RevCuT Tree Search Method in Complex Single-player Game with Continuous Search Space

Abstract

Monte-Carlo Tree Search (MCTS) has achieved great success in combinatorial games, which are characterized by finite action and state spaces, deterministic state transitions, and sparse rewards. AlphaGo, which combined MCTS with deep neural networks, defeated world champion Lee Sedol at Go, demonstrating the strength of tree search in combinatorial games with enormous search spaces. However, when the search space is continuous, and especially when chance factors are involved, tree search methods such as UCT fail: each state is revisited with probability zero, so the statistics stored in the tree are never reused, and the UCT algorithm degrades to plain Monte Carlo rollouts. Moreover, previous exploration experience cannot be exploited to correct subsequent tree searches, which greatly increases the demand for computing resources. To address this problem, this paper proposes a step-by-step Reverse Curriculum Learning with Truncated Tree Search method (RevCuT Tree Search). To retain previous exploration experience, we use a deep neural network to learn the state-action values at explored states and then use it to guide the next tree search. In addition, to limit the computational cost, we build a truncated search tree over the continuous state space rather than over the whole trajectory. This method effectively reduces the number of explorations required and achieves performance beyond the human level in our purpose-built single-player game with a continuous state space and probabilistic state transitions.
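For context, the UCT selection rule mentioned above is the standard UCB1-in-trees formula (quoted here for the reader, not taken from this paper): at state s it selects the action

\[
a^{*} \;=\; \arg\max_{a}\left( Q(s,a) \;+\; c\,\sqrt{\frac{\ln N(s)}{N(s,a)}} \right),
\]

where Q(s,a) is the mean return observed for action a, N(s) and N(s,a) are visit counts, and c is an exploration constant. With continuous states and stochastic transitions, every simulated child state is new, so the counts N(s,a) below the root never accumulate and the bonus term carries no information; this is exactly the degradation to plain rollouts described in the abstract.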
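To make the idea concrete, here is a minimal Python sketch of a depth-truncated lookahead that evaluates leaves with a learned state-action value function instead of rolling out whole trajectories. This is an illustration under our own assumptions, not the authors' implementation: value_net, env_step, DEPTH_LIMIT, and ACTION_SAMPLES are all hypothetical names.

import math
import random

DEPTH_LIMIT = 4        # truncate the tree instead of searching to episode end
ACTION_SAMPLES = 6     # sample a finite action set from the continuous space

def sample_actions():
    """Sample a finite set of actions from the continuous action space."""
    return [random.uniform(-1.0, 1.0) for _ in range(ACTION_SAMPLES)]

def value_net(state, action):
    """Stand-in for the learned Q(s, a); here a dummy heuristic."""
    return -abs(state + action)

def env_step(state, action):
    """Stand-in stochastic transition: continuous state, chance factor."""
    next_state = state + action + random.gauss(0.0, 0.1)
    reward = -abs(next_state)
    return next_state, reward

def search(state, depth=0):
    """Estimate the value of `state` with truncated lookahead."""
    if depth == DEPTH_LIMIT:
        # At the truncation horizon, fall back on the learned value
        # function rather than simulating the rest of the trajectory.
        return max(value_net(state, a) for a in sample_actions())
    best = -math.inf
    for a in sample_actions():
        next_state, reward = env_step(state, a)
        best = max(best, reward + search(next_state, depth + 1))
    return best

if __name__ == "__main__":
    random.seed(0)
    print("estimated value of state 0.0:", round(search(0.0), 3))

Sampling a finite action set here merely stands in for whatever discretization or widening scheme the paper actually applies to its continuous search space; the point of the sketch is the truncation plus value-network bootstrap at the horizon.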
