IEEE Transactions on Evolutionary Computation

Coevolution versus self-play temporal difference learning for acquiring position evaluation in small-board go



Abstract

Two learning methods for acquiring position evaluation on small Go boards are studied and compared. In each case the function to be learned is a position-weighted piece counter; only the learning method differs. The methods studied are temporal difference learning (TDL) using the self-play gradient-descent method, and coevolutionary learning using an evolution strategy. The two approaches are compared with the aim of gaining greater insight into the problem of searching for "optimal" zero-sum game strategies. Using tuned standard setups for each algorithm, the temporal-difference method learned faster and, in most cases, also achieved a higher level of play than coevolution, provided that the gradient-descent step size was chosen suitably. The performance of the coevolution method was found to be sensitive to the design of the evolutionary algorithm in several respects. Given the right configuration, however, coevolution achieved a higher level of play than TDL. Self-play results in play that is optimal against a copy of itself: a self-play player prefers moves from which it is unlikely to lose, even when it occasionally makes random exploratory moves. An evolutionary player forced to perform exploratory moves in the same way can acquire strategies superior to those learned through self-play alone, because playing against a diverse population of opponents exposes it to more varied game-play.
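To make the TDL setting concrete, the following is a minimal sketch (not the paper's implementation) of a TD(0) gradient-descent update for a linear, position-weighted piece counter: each board point gets one weight, the feature is +1 for an own stone, -1 for an opponent stone, 0 for empty, and the weighted sum is squashed through tanh. The board size, step size, and function names here are illustrative assumptions.

```python
import math

BOARD = 5            # assumed small-board size (5x5) for illustration
N = BOARD * BOARD    # one weight per board point

def evaluate(weights, features):
    """Position-weighted piece counter, squashed to (-1, 1) with tanh.

    features[i] is +1 (own stone), -1 (opponent stone), or 0 (empty).
    """
    s = sum(w * f for w, f in zip(weights, features))
    return math.tanh(s)

def td0_update(weights, features, features_next, reward,
               alpha=0.1, terminal=False):
    """One self-play TD(0) gradient-descent step on the linear evaluator.

    delta = target - V(s); since V(s) = tanh(w . x), the gradient of
    V(s) w.r.t. each weight is (1 - V(s)^2) * x_i.  Weights move along
    alpha * delta * gradient.  Returns delta for inspection.
    """
    v = evaluate(weights, features)
    target = reward if terminal else evaluate(weights, features_next)
    delta = target - v
    g = 1.0 - v * v                      # derivative of tanh at v
    for i in range(N):
        weights[i] += alpha * delta * g * features[i]
    return delta
```

In a self-play training loop, each move generates a (position, next-position) pair fed to `td0_update`, with the game result (e.g. +1 for a win) used as the terminal target; the coevolutionary alternative would instead perturb whole weight vectors with an evolution strategy and select on game outcomes, never computing a gradient.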


