Computer Modeling in Engineering & Sciences

A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation



Abstract

In any classical value-based reinforcement learning method, an agent, despite its continuous interaction with the environment, cannot quickly build a complete and independent description of the entire environment, so the learning method must struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. The problem becomes more pronounced when the agent faces a dynamic environment whose configuration and/or parameters are constantly changing. In this paper, the problem is approached by first mapping the reinforcement learning scheme onto a directed graph, in which the set of all states already explored continues to be exploited. We prove that the two tasks of exploration and exploitation eventually converge in the decision-making process, so there is no need to face the exploration-versus-exploitation tradeoff as all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially equivalent to searching for the shortest path in a dynamic environment, which is readily tackled by the modified Floyd-Warshall algorithm proposed in the paper. Experimental results confirm that the proposed graph-based reinforcement learning algorithm performs significantly better than both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice for applications involving dynamic environments.
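
The abstract presents the core idea only at a high level: represent the explored state set as a directed graph and replan with a shortest-path computation as the environment changes. As a rough illustration of the baseline that the paper's modified algorithm builds on, the sketch below runs the standard Floyd-Warshall all-pairs shortest-path algorithm on a small maze graph; the state numbering, edge costs, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: the standard Floyd-Warshall all-pairs shortest-path
# algorithm on a maze modeled as a directed graph. The paper's modified,
# dynamics-aware variant is not specified in the abstract, so only the
# baseline computation is shown; the graph below is hypothetical.

INF = float("inf")

def floyd_warshall(n, edges):
    """Return (dist, nxt): shortest-path costs and a successor table
    for a graph with states 0..n-1 and edges given as {(u, v): cost}."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt = [[i if i == j else None for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = w
        nxt[u][v] = v
    for k in range(n):                      # allow state k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def shortest_path(nxt, start, goal):
    """Follow the successor table to recover one shortest path."""
    if nxt[start][goal] is None:
        return []                           # goal unreachable from start
    path, node = [start], start
    while node != goal:
        node = nxt[node][goal]
        path.append(node)
    return path

# Hypothetical 4-state maze: two routes from state 0 to state 3.
edges = {(0, 1): 1, (1, 3): 4, (0, 2): 2, (2, 3): 1}
dist, nxt = floyd_warshall(4, edges)
print(dist[0][3])                 # 3  (0 -> 2 -> 3 is cheaper than 0 -> 1 -> 3)
print(shortest_path(nxt, 0, 3))   # [0, 2, 3]
```

In the dynamic setting the abstract describes, edge costs would be updated as the environment changes and the shortest-path table recomputed or repaired; how that is done efficiently is part of the paper's proposed modification and is not reproduced here.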
