IEEE Transactions on Evolutionary Computation

Coevolution versus self-play temporal difference learning for acquiring position evaluation in small-board go



Abstract

Two learning methods for acquiring position evaluation on small Go boards are studied and compared. In each case the function to be learned is a position-weighted piece counter; only the learning method differs. The methods studied are temporal difference learning (TDL) using the self-play gradient-descent method, and coevolutionary learning using an evolution strategy. The two approaches are compared with the aim of gaining greater insight into the problem of searching for "optimal" zero-sum game strategies. Using tuned standard setups for each algorithm, the temporal-difference method learned faster and, in most cases, also achieved a higher level of play than coevolution, provided that the gradient-descent step size was chosen suitably. The performance of the coevolution method was found to be sensitive to the design of the evolutionary algorithm in several respects. Given the right configuration, however, coevolution achieved a higher level of play than TDL. Self-play results in play that is optimal against a copy of itself: a self-play player prefers moves from which it is unlikely to lose, even when it occasionally makes random exploratory moves. An evolutionary player forced to perform exploratory moves in the same way can acquire strategies superior to those learned through self-play alone, because playing against a diverse population of opponents exposes it to more varied game-play.
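To make the TDL setting concrete, the following is a minimal sketch (not the paper's implementation) of a TD(0) gradient-descent update for a linear, position-weighted piece counter: each board point gets one weight, the feature is +1 for an own stone, -1 for an opponent stone, 0 for empty, and the weighted sum is squashed through tanh. The board size, step size, and function names here are illustrative assumptions.

```python
import math

BOARD = 5            # assumed small-board size (5x5) for illustration
N = BOARD * BOARD    # one weight per board point

def evaluate(weights, features):
    """Position-weighted piece counter, squashed to (-1, 1) with tanh.

    features[i] is +1 (own stone), -1 (opponent stone), or 0 (empty).
    """
    s = sum(w * f for w, f in zip(weights, features))
    return math.tanh(s)

def td0_update(weights, features, features_next, reward,
               alpha=0.1, terminal=False):
    """One self-play TD(0) gradient-descent step on the linear evaluator.

    delta = target - V(s); since V(s) = tanh(w . x), the gradient of
    V(s) w.r.t. each weight is (1 - V(s)^2) * x_i.  Weights move along
    alpha * delta * gradient.  Returns delta for inspection.
    """
    v = evaluate(weights, features)
    target = reward if terminal else evaluate(weights, features_next)
    delta = target - v
    g = 1.0 - v * v                      # derivative of tanh at v
    for i in range(N):
        weights[i] += alpha * delta * g * features[i]
    return delta
```

In a self-play training loop, each move generates a (position, next-position) pair fed to `td0_update`, with the game result (e.g. +1 for a win) used as the terminal target; the coevolutionary alternative would instead perturb whole weight vectors with an evolution strategy and select on game outcomes, never computing a gradient.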


