
Bootstrapping from Game Tree Search



Abstract

In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuel's checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in the chess program Meep, using a linear heuristic function. After initialising its weight vector to small random values, Meep was able to learn high-quality weights from self-play alone. When tested online against human opponents, Meep played at a master level, the best performance of any chess program with a heuristic learned entirely from self-play.
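As a concrete sketch of the core idea — updating the evaluation function at every node of the search tree toward that node's own search value — the following minimal Python example may help. Everything here is a hypothetical stand-in for the paper's setting: a toy subtraction game replaces chess, a made-up three-feature linear evaluation replaces Meep's heuristic, and plain depth-limited negamax replaces alpha-beta. It illustrates the training signal, not the paper's implementation.

```python
import random

random.seed(0)

# Toy game: two players alternately remove 1 or 2 stones from a pile;
# whoever takes the last stone wins. Positions with n divisible by 3
# are losses for the player to move.

def features(n):
    # hypothetical feature vector for a pile of n stones
    return [1.0, float(n % 2), float(n % 3 == 0)]

def evaluate(w, n):
    # linear heuristic, from the perspective of the player to move
    return sum(wi * fi for wi, fi in zip(w, features(n)))

def negamax(n, depth, w, visited):
    """Depth-limited negamax that records (state, search value) for
    every interior node, so the heuristic can be updated at all of
    them rather than at the root alone."""
    if n == 0:
        return -1.0  # opponent took the last stone: mover has lost
    if depth == 0:
        return evaluate(w, n)  # bootstrap from the current heuristic
    v = max(-negamax(n - k, depth - 1, w, visited)
            for k in (1, 2) if n - k >= 0)
    visited.append((n, v))
    return v

def treestrap_step(w, root, depth=6, eta=0.01):
    # search, then nudge every visited node's evaluation toward the
    # value the deep search computed for it
    visited = []
    negamax(root, depth, w, visited)
    for state, target in visited:
        phi = features(state)
        err = target - evaluate(w, state)
        for i in range(len(w)):
            w[i] += eta * err * phi[i]

# train from small random weights by repeated self-play searches
w = [random.uniform(-0.1, 0.1) for _ in range(3)]
for _ in range(300):
    treestrap_step(w, random.randint(3, 12))
```

After training, the learned weights should assign negative values to the losing positions (n divisible by 3) and positive values elsewhere, which the linear feature set above can represent exactly.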
