
RankNet for evaluation functions of the game of Go

Abstract

In this paper, we present a new algorithm for learning evaluation functions for the game of Go. Recently, AlphaGo Zero and AlphaZero have shown that accurate evaluation functions can be constructed using deep neural networks. Such training, however, requires an enormous amount of computational resources that are not available to most researchers. One of the next challenges in this domain is constructing accurate evaluation functions with fewer computational resources. To tackle this problem, we apply the RankNet algorithm to train an AlphaGo Zero-style unified policy and value network in a learning-to-rank fashion. Pairwise RankNet training increases the potential number of training examples and reduces the number of game records required. Our modified RankNet algorithm trains on both value and policy losses, and this joint training makes learning stable. Experimental results showed that neural networks trained by our algorithm achieved higher playing strength than those trained by other methods, especially when the dataset was relatively small.
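
To make the pairwise idea in the abstract concrete, the following is a minimal sketch of the standard RankNet pairwise loss, -log(sigmoid(s_i - s_j)), where item i is preferred over item j. It is not the authors' modified algorithm, which the abstract does not specify. The pairing rule used here (the move actually played is preferred over every other candidate move) and all function names are assumptions made for illustration only.

```python
import math


def ranknet_pair_loss(score_preferred, score_other):
    """Standard RankNet loss for one ordered pair: log(1 + exp(-(s_i - s_j)))."""
    diff = score_preferred - score_other
    # Numerically stable evaluation of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def pairs_from_position(scores, played_index):
    """Hypothetical pairing rule (an assumption, not taken from the paper):
    the played move is preferred over each other candidate move."""
    preferred = scores[played_index]
    return [(preferred, s) for i, s in enumerate(scores) if i != played_index]


def position_loss(scores, played_index):
    """Mean pairwise loss over all pairs built from one position."""
    pairs = pairs_from_position(scores, played_index)
    return sum(ranknet_pair_loss(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up network scores for four candidate moves; move 2 was played.
    candidate_scores = [0.1, -0.4, 1.3, 0.7]
    print(len(pairs_from_position(candidate_scores, 2)), "pairs")
    print("loss:", position_loss(candidate_scores, 2))
```

Under this assumed pairing rule, one position with k candidate moves yields k - 1 training pairs, which illustrates how a pairwise formulation can extract more training examples from the same set of game records.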

Bibliographic record

  • Source
    ICGA journal | 2019, No. 2 | pp. 78-91 | 14 pages
  • Authors

    Mandai Yusaku; Kaneko Tomoyuki;

  • Affiliations

    Univ Tokyo, Tokyo, Japan;

    Univ Tokyo, Tokyo, Japan;

  • Original format: PDF
  • Language: English
