Journal: International Journal of Applied Mathematics and Computer Science

EVOLVING SMALL-BOARD GO PLAYERS USING COEVOLUTIONARY TEMPORAL DIFFERENCE LEARNING WITH ARCHIVES

Abstract

We apply Coevolutionary Temporal Difference Learning (CTDL) to learn small-board Go strategies represented as weighted piece counters. CTDL is a randomized learning technique that interweaves two search processes, one operating in intra-game mode and the other in inter-game mode. Intra-game learning is driven by gradient-descent Temporal Difference Learning (TDL), a reinforcement learning method that updates the board evaluation function according to the differences observed between its values for consecutively visited game states. For the inter-game learning component, we provide a coevolutionary algorithm that maintains a sample of strategies and uses the outcomes of games played between them to iteratively modify the probability distribution according to which new strategies are generated and added to the sample. We analyze CTDL's sensitivity to all important parameters, including the trace decay constant that controls the lookahead horizon of TDL, and the relative intensity of intra-game and inter-game learning. We also investigate how the presence of memory (an archive) affects search performance, and find that the archived approach is superior to the other techniques considered here and produces strategies that outperform a handcrafted weighted piece counter strategy and simple liberty-based heuristics. This encouraging result can potentially be generalized not only to other strategy representations used for small-board Go, but also to various games and a broader class of problems, because CTDL is generic and does not rely on any problem-specific knowledge.
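
The intra-game component described in the abstract can be illustrated with a minimal sketch of a gradient-descent TD(lambda) update applied to a weighted piece counter. This is not the authors' implementation. The tanh squashing, the +1/-1/0 board encoding, the absence of discounting, and the names evaluate, td_lambda_episode, alpha, and lam are all assumptions made for illustration; only the general idea (a weighted piece counter updated from the differences between values of consecutive states, with a trace decay constant) comes from the abstract.

```python
import math


def evaluate(weights, board):
    """Weighted piece counter: squashed dot product of weights and board.

    board is a flat list with +1 (Black), -1 (White), 0 (empty); this
    encoding and the tanh squashing are assumptions, not from the source.
    """
    return math.tanh(sum(w * x for w, x in zip(weights, board)))


def td_lambda_episode(weights, states, outcome, alpha=0.01, lam=0.9):
    """Run TD(lambda) updates over one game and return the new weights.

    states  : boards visited in order during the game
    outcome : final result from Black's point of view, in [-1, 1]
    alpha   : learning rate (hypothetical default)
    lam     : trace decay constant (hypothetical default)
    """
    w = list(weights)            # copy, so the caller's weights stay intact
    traces = [0.0] * len(w)      # one eligibility trace per weight
    for t, board in enumerate(states):
        v = evaluate(w, board)
        # The target is the value of the next state, or the game outcome
        # once the final state has been reached.
        if t + 1 < len(states):
            target = evaluate(w, states[t + 1])
        else:
            target = outcome
        delta = target - v                 # temporal difference error
        grad_scale = 1.0 - v * v           # derivative of tanh at this state
        for i, x in enumerate(board):
            traces[i] = lam * traces[i] + grad_scale * x
            w[i] += alpha * delta * traces[i]
    return w
```

In this sketch a larger lam lets an error observed late in the game reach weights that were active many moves earlier, which is one way to read the abstract's remark that the trace decay constant controls the lookahead horizon of TDL.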