Journal: 計測自動制御学会論文集 (Transactions of the Society of Instrument and Control Engineers)

Reinforcement learning based on statistical value function and its application to a board game



Abstract

A statistical method is proposed to cope with the large number of discrete states in a given state space in reinforcement learning. As a coarse-graining of the state space, a smaller number of state sets are defined, each a group of neighboring states. The state sets partly overlap one another, so a single state is included in multiple sets. Learning is based on an action-value function for each state set, and the action-value function for an individual state is derived, at the time of action selection, as a statistical average of the value functions of the state sets containing that state. The proposed method is applied to the board game Dots-and-Boxes. The state sets are defined as subspace templates of the whole board state of dots and lines, taking geometric symmetry into consideration. The reward is the number of boxes acquired minus the number of boxes lost. Computer simulations show successful learning through training games against a mini-max opponent with search depth 2 to 5, and the winning rate against a depth-3 mini-max reaches about 80%. An action-value function derived by a weighted average, with weights given by the variance of rewards, shows an advantage over one derived by a simple average.
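The averaging step described in the abstract, combining the action-value estimates of the overlapping state sets that contain a given state, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract only says the weights are "given by the variance of rewards", so an inverse-variance weighting is assumed here, and all function names are hypothetical.

```python
def averaged_q(estimates):
    """Simple average of the Q estimates from the state sets
    containing the current state. Each estimate is (q, variance)."""
    return sum(q for q, _ in estimates) / len(estimates)

def weighted_q(estimates, eps=1e-6):
    """Variance-weighted average: state sets whose rewards have
    lower variance (more reliable estimates) receive more weight.
    Inverse-variance weighting is an assumption; eps avoids
    division by zero for a zero-variance set."""
    weights = [1.0 / (var + eps) for _, var in estimates]
    total = sum(weights)
    return sum(w * q for (q, _), w in zip(estimates, weights)) / total

# Example: three overlapping state sets give (Q, reward variance)
# estimates for the same (state, action) pair.
ests = [(0.5, 0.1), (0.8, 0.4), (0.2, 0.05)]
print(averaged_q(ests))  # simple mean of the three Q values
print(weighted_q(ests))  # pulled toward the low-variance estimate 0.2
```

At action-selection time, the agent would compute such an averaged Q for every legal move and pick the maximum; the abstract's result is that the variance-weighted variant outperforms the simple average.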


