IEEE Conference on Computational Intelligence and Games

Improved LinUCT and its evaluation on incremental random-feature tree

Abstract

UCT is a standard Monte Carlo tree search (MCTS) algorithm, and MCTS algorithms have been applied to various domains with remarkable success. This study proposes Leaf-LinUCT, a family of improved LinUCT algorithms that incorporate LinUCB into MCTS. LinUCB outperforms UCB1 in contextual multi-armed bandit problems because it learns online with ridge regression. However, because of the minimax structure of game trees, the ridge regression in LinUCB does not always work well in tree search. In this paper, we remedy this problem and extend our previous work on LinUCT in two ways: (1) by restricting the teacher data for regression to the frontier nodes of the current search tree, and (2) by adjusting the feature vector of each internal node to the weighted mean of the feature vectors of its descendant nodes. We also present a new synthetic model, the incremental random-feature tree, which extends the standard incremental random tree model. In our model, each node has a feature vector that represents the characteristics of the corresponding position. With each move, the elements of a node's feature vector are randomly changed from those of its parent, just as the heuristic score of a node is randomly changed by each move in the standard incremental random tree model. The experimental results show that Leaf-LinUCT outperformed UCT and existing LinUCT algorithms, both in the incremental random-feature tree and in a synthetic game studied in [1].
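The LinUCB selection rule that LinUCT builds on can be sketched as follows. This is a minimal sketch, not the authors' implementation: it assumes a single ridge-regression model (regularization lambda = 1) shared by all candidate moves, an exploration weight `alpha`, and the hypothetical class name `LinUCBModel`. Each candidate with feature vector x is scored as theta^T x + alpha * sqrt(x^T A^{-1} x), where A = I + sum(x x^T), b = sum(r x), and theta = A^{-1} b. A^{-1} is maintained with Sherman-Morrison rank-one updates rather than re-inverted after every sample.

```python
import math


class LinUCBModel:
    """Hypothetical ridge-regression model with a LinUCB-style confidence bound."""

    def __init__(self, dim, alpha=1.0):
        self.dim = dim
        self.alpha = alpha
        # A starts as the identity (ridge penalty 1), so A^{-1} is also the identity.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        # Matrix-vector product A^{-1} x.
        return [sum(row[j] * x[j] for j in range(self.dim)) for row in self.a_inv]

    def theta(self):
        # Ridge-regression estimate theta = A^{-1} b.
        return self._a_inv_times(self.b)

    def score(self, x):
        """Estimated reward theta^T x plus exploration bonus alpha * sqrt(x^T A^{-1} x)."""
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        ax = self._a_inv_times(x)
        width = math.sqrt(sum(xi * axi for xi, axi in zip(x, ax)))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Add one (feature vector, reward) sample to the regression."""
        ax = self._a_inv_times(x)  # A^{-1} x
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, ax))
        # Sherman-Morrison with u = v = x; A^{-1} is symmetric, so
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x).
        for i in range(self.dim):
            for j in range(self.dim):
                self.a_inv[i][j] -= ax[i] * ax[j] / denom
        for i in range(self.dim):
            self.b[i] += reward * x[i]
```

In a tree search, a node would call `score` on each child's feature vector, descend into the highest-scoring child, and later call `update` with the playout result.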
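A generator for an incremental random-feature tree might look like the sketch below. The abstract states only that each move randomly changes the elements of the parent's feature vector. The uniform noise in `[-delta, delta]`, the all-zero root vector, the RNG seeded from the move path (so any node can be regenerated lazily and always yields the same vector), and the hidden weight vector `weights` that the hypothetical `leaf_value` uses to decide game outcomes are all illustrative assumptions.

```python
import random


def child_features(parent, path, seed=0, delta=1.0):
    """Feature vector of the node reached by `path` (a tuple of move indices).

    Assumption: each element is the parent's element plus uniform noise in
    [-delta, delta]. Seeding with a string derived from the path makes the
    result reproducible across runs and revisits.
    """
    rng = random.Random(f"{seed}:{path}")
    return [v + rng.uniform(-delta, delta) for v in parent]


def features_at(path, dim, seed=0, delta=1.0):
    """Walk from an all-zero root along `path`, perturbing the vector at every move."""
    vec = [0.0] * dim
    for depth in range(1, len(path) + 1):
        vec = child_features(vec, path[:depth], seed, delta)
    return vec


def leaf_value(path, weights, seed=0, delta=1.0):
    """Hypothetical outcome: 1 if the hidden linear score w . x is positive, else 0."""
    x = features_at(path, len(weights), seed, delta)
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0
```

Because a child's vector stays within `delta` of its parent's in every element, positions that share a long move prefix have similar features, which is the property that makes a linear model over features informative.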
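The two Leaf-LinUCT modifications can be illustrated on a toy search-tree node. This hypothetical sketch assumes that the weights in the weighted mean are child visit counts, applied recursively so that each internal node ends up with a visit-weighted mean over its descendants, and that a frontier node's teacher signal is its mean playout reward. The abstract specifies neither detail, and the names `Node`, `adjusted_features`, and `frontier_samples` are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    features: list                 # raw feature vector of the position
    visits: int = 0
    reward_sum: float = 0.0
    children: list = field(default_factory=list)


def adjusted_features(node):
    """Frontier node: its own features. Internal node: visit-weighted mean of
    its children's adjusted features (visit weighting is an assumption)."""
    if not node.children:
        return list(node.features)
    total = sum(c.visits for c in node.children)
    if total == 0:
        return list(node.features)
    acc = [0.0] * len(node.features)
    for c in node.children:
        cf = adjusted_features(c)
        for i in range(len(acc)):
            acc[i] += c.visits * cf[i]
    return [v / total for v in acc]


def frontier_samples(node):
    """Teacher data restricted to the frontier: (features, mean reward) for
    every visited leaf of the current search tree."""
    if not node.children:
        if node.visits > 0:
            yield node.features, node.reward_sum / node.visits
        return
    for c in node.children:
        yield from frontier_samples(c)
```

For example, a root whose two leaf children have features `[1.0, 0.0]` (3 visits) and `[0.0, 1.0]` (1 visit) gets the adjusted vector `[0.75, 0.25]`, while only the two leaves contribute regression samples.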