Conference paper: IEEE Symposium on Computational Intelligence and Games
An Othello Evaluation Function Based on Temporal Difference Learning using Probability of Winning



Abstract

This paper presents a new reinforcement learning method, called Temporal Difference Learning with Monte Carlo simulation (TDMC), which combines Temporal Difference learning (TD) with the winning probability of each non-terminal position. Self-teaching evaluation functions for logic games have been studied for many years; however, few successful applications of TD have been reported. This is perhaps because the only reward observable in logic games is the final outcome, with no obvious rewards available at non-terminal positions. TDMC(λ) attempts to compensate for this problem by introducing winning probabilities, obtained through Monte Carlo simulation, as substitute rewards. Using Othello as a testing environment, TDMC(λ) has been observed to yield better learning results than TD(λ).
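The substitute-reward idea in the abstract can be sketched as follows. This is a minimal, hedged illustration: it uses a toy single-pile Nim game as a stand-in for Othello, estimates winning probabilities by uniformly random playouts, and blends them into a one-step TD(0)-style backup. The toy game, the blend weight, and the plain one-step update are illustrative assumptions; the paper's actual TDMC(λ) uses eligibility traces and its own reward formulation.

```python
import random

# Toy stand-in for Othello: single-pile Nim.  Players alternately remove
# 1-3 stones; whoever takes the last stone wins.  A state is
# (stones_left, player_to_move), with players 0 and 1.

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def mc_win_prob(stones, to_move, player, n_sims=30):
    """Monte Carlo estimate of `player`'s winning probability from a
    position, via random playouts (the 'substitute reward' of TDMC)."""
    wins = 0
    for _ in range(n_sims):
        s, mover = stones, to_move
        while s > 0:
            s -= random.choice(legal_moves(s))
            if s == 0 and mover == player:
                wins += 1  # the mover who took the last stone wins
            mover = 1 - mover
    return wins / n_sims

def tdmc_train(episodes=100, alpha=0.1, blend=0.5, n_start=8):
    """Self-play TD(0)-style learning in which the Monte Carlo winning
    probability is blended into each non-terminal update target.  The
    blend weight and one-step backup are simplifications of TDMC(lambda)."""
    V = {}  # state -> estimated value from player 0's perspective
    for _ in range(episodes):
        s, mover = n_start, 0
        visited = []
        while s > 0:
            visited.append((s, mover))
            s -= random.choice(legal_moves(s))
            last_mover = mover
            mover = 1 - mover
        outcome = 1.0 if last_mover == 0 else 0.0  # player 0's terminal reward
        # Backward pass: bootstrap from the successor value, mixing in the
        # Monte Carlo substitute reward at each non-terminal state.
        next_v = outcome
        for key in reversed(visited):
            stones, to_move = key
            r = mc_win_prob(stones, to_move, player=0)
            target = blend * r + (1 - blend) * next_v
            v = V.get(key, 0.5)
            V[key] = v + alpha * (target - v)
            next_v = V[key]
    return V
```

Because every update is a convex combination of a value in [0, 1] and a target in [0, 1], the learned values stay interpretable as winning-probability estimates, which mirrors the paper's motivation for using winning probability as the evaluation signal.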
