Conference paper: IEEE Symposium on Computational Intelligence and Games
An Othello Evaluation Function Based on Temporal Difference Learning using Probability of Winning



Abstract

This paper presents a new reinforcement learning method, called Temporal Difference Learning with Monte Carlo simulation (TDMC), which combines Temporal Difference learning (TD) with the winning probability of each non-terminal position. Self-teaching evaluation functions for logic games have been studied for many years; however, few successful applications of TD have been reported. This is perhaps because the only reward observable in logic games is the final outcome, with no obvious rewards available at non-terminal positions. TDMC(λ) attempts to compensate for this problem by introducing winning probabilities, obtained through Monte Carlo simulation, as substitute rewards. Using Othello as a testing environment, TDMC(λ) has been observed to yield better learning results than TD(λ).
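The substitute-reward idea in the abstract can be sketched as follows. This is a minimal, hedged illustration: it uses a toy single-pile Nim game as a stand-in for Othello, estimates winning probabilities by uniformly random playouts, and blends them into a one-step TD(0)-style backup. The toy game, the blend weight, and the plain one-step update are illustrative assumptions; the paper's actual TDMC(λ) uses eligibility traces and its own reward formulation.

```python
import random

# Toy stand-in for Othello: single-pile Nim.  Players alternately remove
# 1-3 stones; whoever takes the last stone wins.  A state is
# (stones_left, player_to_move), with players 0 and 1.

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def mc_win_prob(stones, to_move, player, n_sims=30):
    """Monte Carlo estimate of `player`'s winning probability from a
    position, via random playouts (the 'substitute reward' of TDMC)."""
    wins = 0
    for _ in range(n_sims):
        s, mover = stones, to_move
        while s > 0:
            s -= random.choice(legal_moves(s))
            if s == 0 and mover == player:
                wins += 1  # the mover who took the last stone wins
            mover = 1 - mover
    return wins / n_sims

def tdmc_train(episodes=100, alpha=0.1, blend=0.5, n_start=8):
    """Self-play TD(0)-style learning in which the Monte Carlo winning
    probability is blended into each non-terminal update target.  The
    blend weight and one-step backup are simplifications of TDMC(lambda)."""
    V = {}  # state -> estimated value from player 0's perspective
    for _ in range(episodes):
        s, mover = n_start, 0
        visited = []
        while s > 0:
            visited.append((s, mover))
            s -= random.choice(legal_moves(s))
            last_mover = mover
            mover = 1 - mover
        outcome = 1.0 if last_mover == 0 else 0.0  # player 0's terminal reward
        # Backward pass: bootstrap from the successor value, mixing in the
        # Monte Carlo substitute reward at each non-terminal state.
        next_v = outcome
        for key in reversed(visited):
            stones, to_move = key
            r = mc_win_prob(stones, to_move, player=0)
            target = blend * r + (1 - blend) * next_v
            v = V.get(key, 0.5)
            V[key] = v + alpha * (target - v)
            next_v = V[key]
    return V
```

Because every update is a convex combination of a value in [0, 1] and a target in [0, 1], the learned values stay interpretable as winning-probability estimates, which mirrors the paper's motivation for using winning probability as the evaluation signal.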
