計測自動制御学会論文集 (Transactions of the Society of Instrument and Control Engineers)

Reinforcement learning based on statistical value function and its application to a board game

Abstract

A statistical method is proposed to cope with the large number of discrete states in a given state space in reinforcement learning. As a coarse-graining of the state space, a smaller number of state sets are defined, each a group of neighboring states. The state sets partly overlap, so a single state may belong to multiple sets. Learning is based on an action-value function for each state set, and the action-value function for an individual state is derived at action-selection time as a statistical average of the value functions of the sets containing that state. The proposed method is applied to the board game Dots-and-Boxes. The state sets are defined as subspace templates of the whole board state of dots and lines, taking geometric symmetry into consideration. The reward is the number of boxes acquired minus the number of boxes lost. Computer simulations show successful learning through training games against a minimax opponent with search depth 2 to 5, and the winning rate against a depth-3 minimax reaches about 80%. An action-value function derived by a weighted average, with weights determined by the variance of rewards, outperforms one derived by a simple average.
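The core idea of the abstract, overlapping coarse state sets whose per-set value estimates are combined by a variance-weighted average, can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation: the class and method names are invented, and the weighting here assumes inverse-variance weights (sets with lower reward variance count more), which is one plausible reading of "weight given by the variance of rewards".

```python
from collections import defaultdict

class StatisticalQ:
    """Sketch of action-value estimation over overlapping state sets.

    Each state belongs to several coarse state sets; each set keeps a
    running mean and variance of observed returns per action (Welford's
    online algorithm). The value of (state, action) is an
    inverse-variance-weighted average over the sets containing the state.
    All names here are illustrative, not taken from the paper.
    """

    def __init__(self, membership):
        # membership: state -> list of ids of the state sets containing it
        self.membership = membership
        self.n = defaultdict(int)       # (set_id, action) -> sample count
        self.mean = defaultdict(float)  # (set_id, action) -> running mean
        self.m2 = defaultdict(float)    # (set_id, action) -> sum of squared deviations

    def update(self, state, action, reward):
        # Credit the observed return to every set containing the state.
        for g in self.membership[state]:
            k = (g, action)
            self.n[k] += 1
            d = reward - self.mean[k]
            self.mean[k] += d / self.n[k]
            self.m2[k] += d * (reward - self.mean[k])

    def value(self, state, action):
        # Weighted average across sets; low-variance sets get more weight.
        num = den = 0.0
        for g in self.membership[state]:
            k = (g, action)
            if self.n[k] < 2:
                continue  # not enough samples for a variance estimate
            var = self.m2[k] / (self.n[k] - 1)
            w = 1.0 / (var + 1e-6)  # small epsilon avoids division by zero
            num += w * self.mean[k]
            den += w
        return num / den if den > 0.0 else 0.0
```

In a Dots-and-Boxes setting, `membership` would map each board position to the subspace templates (up to geometric symmetry) that match it, and `reward` would be the box differential described above.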
