Computer Modeling in Engineering & Sciences

A Graph-Based Reinforcement Learning Method with Converged State Exploration and Exploitation



Abstract

In any classical value-based reinforcement learning method, an agent, despite its continuous interaction with the environment, cannot quickly build a complete and independent description of the entire environment, so the learning method must struggle with the difficult dilemma of choosing between two tasks, namely exploration and exploitation. The problem becomes more pronounced when the agent faces a dynamic environment whose configuration and/or parameters are constantly changing. In this paper, the problem is approached by first mapping the reinforcement learning scheme onto a directed graph, in which the set of all states already explored continues to be exploited. We prove that the two tasks of exploration and exploitation eventually converge in the decision-making process, so there is no need to face the exploration-versus-exploitation tradeoff as all existing reinforcement learning methods do. Rather, this observation indicates that a reinforcement learning scheme is essentially equivalent to searching for the shortest path in a dynamic environment, which is readily tackled by the modified Floyd-Warshall algorithm proposed in the paper. Experimental results confirm that the proposed graph-based reinforcement learning algorithm performs significantly better than both the standard Q-learning algorithm and an improved Q-learning algorithm in solving mazes, rendering it an algorithm of choice for applications involving dynamic environments.
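
The abstract presents the core idea only at a high level: represent the explored state set as a directed graph and replan with a shortest-path computation as the environment changes. As a rough illustration of the baseline that the paper's modified algorithm builds on, the sketch below runs the standard Floyd-Warshall all-pairs shortest-path algorithm on a small maze graph; the state numbering, edge costs, and helper names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: the standard Floyd-Warshall all-pairs shortest-path
# algorithm on a maze modeled as a directed graph. The paper's modified,
# dynamics-aware variant is not specified in the abstract, so only the
# baseline computation is shown; the graph below is hypothetical.

INF = float("inf")

def floyd_warshall(n, edges):
    """Return (dist, nxt): shortest-path costs and a successor table
    for a graph with states 0..n-1 and edges given as {(u, v): cost}."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt = [[i if i == j else None for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = w
        nxt[u][v] = v
    for k in range(n):                      # allow state k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def shortest_path(nxt, start, goal):
    """Follow the successor table to recover one shortest path."""
    if nxt[start][goal] is None:
        return []                           # goal unreachable from start
    path, node = [start], start
    while node != goal:
        node = nxt[node][goal]
        path.append(node)
    return path

# Hypothetical 4-state maze: two routes from state 0 to state 3.
edges = {(0, 1): 1, (1, 3): 4, (0, 2): 2, (2, 3): 1}
dist, nxt = floyd_warshall(4, edges)
print(dist[0][3])                 # 3  (0 -> 2 -> 3 is cheaper than 0 -> 1 -> 3)
print(shortest_path(nxt, 0, 3))   # [0, 2, 3]
```

In the dynamic setting the abstract describes, edge costs would be updated as the environment changes and the shortest-path table recomputed or repaired; how that is done efficiently is part of the paper's proposed modification and is not reproduced here.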
