Journal of Advanced Computational Intelligence and Intelligent Informatics

Strategy Acquisition for Games Based on Simplified Reinforcement Learning Using a Strategy Network

Abstract

We propose a simplified form of reinforcement learning (RL) for game strategy acquisition using a strategy network. RL has been applied to a number of games, such as backgammon and checkers. However, applying RL to games with very large state spaces, such as Othello or Shogi, is more difficult because learning takes a very long time. The proposed strategy network connects N nodes on the game board to a single evaluation node through N weighted links, forming a 2-layer perceptron. These nodes denote all possible states of every square on the game board and can easily represent the evaluation function. Moreover, they can also denote imaginary states, such as pieces that may exist on the next move, the positional relation of any two pieces, or various other board phases. After several thousand games had been played, the strategy network acquired a better evaluation function more quickly than a normalized Gaussian network. A computer player employing the strategy network beat a heuristic-based player that evaluates the values of pieces and positions on the game board. The proposed strategy network was able to acquire good weightings for various features of game states. In addition, a player employing the strategy network acquired a winning strategy for a 4×4 Othello task after co-evolutionary training.
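The abstract describes the strategy network as a 2-layer perceptron whose input nodes encode the state of every square and whose single output node gives the position evaluation. The following is a minimal sketch of such an evaluator, assuming a 4×4 Othello board with one-hot square features (empty/black/white) and a TD-style update toward game outcomes; the names (StrategyNetwork, td_update), the sigmoid output, and the learning-rate value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

N_SQUARES = 16        # 4x4 Othello board
N_STATES = 3          # empty=0, black=1, white=2
N_FEATURES = N_SQUARES * N_STATES

def features(board):
    """One-hot encode the board (length-16 array of 0/1/2) into 48 inputs,
    one node per possible state of each square."""
    x = np.zeros(N_FEATURES)
    for sq, state in enumerate(board):
        x[sq * N_STATES + state] = 1.0
    return x

class StrategyNetwork:
    """2-layer perceptron: feature nodes wired by weighted links to a
    single evaluation node (sigmoid output in [0, 1])."""

    def __init__(self, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(scale=0.1, size=N_FEATURES)
        self.lr = lr

    def evaluate(self, board):
        return 1.0 / (1.0 + np.exp(-self.w @ features(board)))

    def td_update(self, board, target):
        """Simplified TD-style update pulling the evaluation of `board`
        toward `target` (the next position's value, or the game outcome
        1.0/0.0 at the terminal position)."""
        x = features(board)
        v = self.evaluate(board)
        # Gradient of the squared error through the sigmoid output node.
        self.w += self.lr * (target - v) * v * (1.0 - v) * x

# Usage sketch: after each self-play game, sweep the recorded positions
# backward and pull each evaluation toward the final result.
if __name__ == "__main__":
    net = StrategyNetwork()
    game_positions = [np.random.randint(0, 3, N_SQUARES) for _ in range(10)]
    result = 1.0  # black won (illustrative)
    for board in reversed(game_positions):
        net.td_update(board, result)
        result = net.evaluate(board)  # bootstrap toward earlier positions
```

Representing each square-state pair as its own input node is what lets a single linear layer act as a weighted sum of board features, which is consistent with the abstract's claim that the network "can easily represent the evaluation function" and learn weightings of various features.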
