RankNet for evaluation functions of the game of Go

Abstract

In this paper, we present a new algorithm for learning evaluation functions for the game of Go. Recently, AlphaGo Zero and AlphaZero have shown that accurate evaluation functions can be constructed using deep neural networks. Such training, however, requires an enormous amount of computational resources that are not available to most researchers. One of the next challenges in this domain is constructing accurate evaluation functions with fewer computational resources. To tackle this problem, we apply the RankNet algorithm to train an AlphaGo Zero-style unified policy and value network in a learning-to-rank fashion. Pairwise RankNet training increases the potential number of training examples and reduces the number of game records required. Our modified RankNet algorithm trains on both value and policy losses, and this joint training makes learning stable. Experimental results show that neural networks trained by our algorithm achieve higher playing strength than those trained by other methods, especially when the dataset is relatively small.
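
For readers unfamiliar with RankNet, the following is a minimal sketch of the standard pairwise RankNet loss, not the authors' modified algorithm, which the abstract does not specify. It assumes that a model assigns a scalar score to each candidate and that one candidate (for example, the move actually played in a game record) is preferred over the others. The function names `ranknet_pair_loss` and `ranknet_loss` are hypothetical and chosen only for illustration.

```python
import math


def ranknet_pair_loss(score_preferred: float, score_other: float) -> float:
    """Standard RankNet loss for one ordered pair.

    The model probability that the preferred item outranks the other is
    sigmoid(score_preferred - score_other); the loss is its negative log,
    i.e. log(1 + exp(-(score_preferred - score_other))).
    """
    diff = score_preferred - score_other
    # Numerically stable evaluation of log(1 + exp(-diff)).
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ranknet_loss(scores: list[float], preferred_index: int) -> float:
    """Average pairwise loss with one preferred item against all others.

    Assumption for illustration: exactly one candidate is preferred and
    every other candidate forms one training pair with it.
    """
    preferred = scores[preferred_index]
    others = [s for i, s in enumerate(scores) if i != preferred_index]
    if not others:
        return 0.0
    return sum(ranknet_pair_loss(preferred, s) for s in others) / len(others)
```

In this sketch a position with n scored candidates yields n - 1 training pairs from a single preferred choice, which is one way a pairwise formulation can produce more training examples from the same game records.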
Bibliographic record

  • Source
    ICGA journal, 2019, No. 2, pp. 78-91 (14 pages)
  • Authors

    Mandai Yusaku; Kaneko Tomoyuki

  • Affiliations

    Univ Tokyo, Tokyo, Japan (both authors)

  • Original format: PDF
  • Language: English
