
A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation


Abstract

In any classical value-based reinforcement learning method, an agent, despite its continuous interactions with the environment, is unable to quickly generate a complete and independent description of the entire environment, leaving the learning method to struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. This problem becomes more pronounced when the agent has to deal with a dynamic environment, whose configuration and/or parameters are constantly changing. In this paper, this problem is approached by first mapping a reinforcement learning scheme to a directed graph, in which the set containing all the states already explored continues to be exploited. We have proved that the two tasks of exploration and exploitation eventually converge in the decision-making process, and thus there is no need to face the exploration vs. exploitation tradeoff as all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially the same as searching for the shortest path in a dynamic environment, which is readily tackled by a modified Floyd-Warshall algorithm as proposed in the paper. The experimental results confirm that the proposed graph-based reinforcement learning algorithm achieves significantly higher performance than both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice in applications involving dynamic environments.
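The abstract frames learning in a maze as an all-pairs shortest-path search on a directed graph handled by a modified Floyd-Warshall algorithm. The paper's modification for dynamic environments is not reproduced here; as a point of reference only, the sketch below shows the classical Floyd-Warshall update on a small hypothetical weight matrix (all state names, edge costs, and the floyd_warshall helper are illustrative assumptions, not taken from the paper).

INF = float("inf")

def floyd_warshall(weights):
    """Classical all-pairs shortest paths: weights[i][j] is the cost of edge i -> j, or INF if absent."""
    n = len(weights)
    dist = [row[:] for row in weights]          # copy so the input matrix is untouched
    for k in range(n):                          # allow state k as an intermediate stop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Hypothetical 4-state maze fragment encoded as a weight matrix.
w = [
    [0,   1,   INF, INF],
    [INF, 0,   1,   4  ],
    [INF, INF, 0,   1  ],
    [INF, INF, INF, 0  ],
]
print(floyd_warshall(w)[0][3])  # shortest cost from state 0 to state 3 -> 3 (via states 1 and 2)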
