Journal: Quality Control, Transactions

Sequential Association Rule Mining for Autonomously Extracting Hierarchical Task Structures in Reinforcement Learning



Abstract

Reinforcement learning (RL) techniques, while often powerful, can suffer from slow learning, particularly in high-dimensional spaces or in environments with sparse rewards. Decomposing tasks into a hierarchical structure has the potential to significantly speed up learning, generalization, and transfer learning. However, current task decomposition techniques often cannot extract hierarchical task structures without relying on high-level knowledge provided by an expert (e.g., dynamic Bayesian networks (DBNs) in factored Markov decision processes), which is not necessarily available in autonomous systems. In this paper, we propose a novel method based on Sequential Association Rule Mining that can extract the Hierarchical Structure of Tasks in Reinforcement Learning (SARM-HSTRL) autonomously for both Markov decision processes (MDPs) and factored MDPs. The proposed method leverages association rule mining to discover the causal and temporal relationships among states in different trajectories, and extracts a task hierarchy that captures these relationships among sub-goals as termination conditions of different sub-tasks. We prove that the extracted hierarchical policy is hierarchically optimal in MDPs and factored MDPs. Notably, SARM-HSTRL extracts this hierarchically optimal policy without requiring dynamic Bayesian networks, in scenarios with a single task trajectory as well as with multiple tasks' trajectories. Furthermore, we show theoretically and empirically that the extracted hierarchical task structure is consistent with the trajectories and provides the most efficient, reliable, and compact structure under appropriate assumptions. The numerical results compare the proposed SARM-HSTRL method with conventional HRL algorithms in terms of the accuracy of sub-goal detection, the validity of the extracted hierarchies, and the speed of learning in several testbeds.
The key capabilities of SARM-HSTRL, including handling multiple tasks and autonomous hierarchical task extraction, can lead to the application of this HRL method to the reuse, transfer, and generalization of knowledge in different domains.
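
To make the idea of mining temporal relationships among states from trajectories more concrete, here is a minimal Python sketch. It is not the SARM-HSTRL algorithm, whose details the abstract does not give. It is only a toy illustration of sequential association rules of the form "state a is visited before state b", scored with the usual support and confidence measures. The function names, the thresholds, and the key/door toy trajectories are all assumptions made for this illustration.

```python
from collections import Counter
from itertools import permutations


def first_occurrences(trajectory):
    """Map each state to the step at which it is first visited."""
    first = {}
    for step, state in enumerate(trajectory):
        first.setdefault(state, step)
    return first


def mine_sequential_rules(trajectories, min_support=0.9, min_confidence=0.9):
    """Return rules (a, b) -> (support, confidence), where a is first
    visited before b is first visited.

    support    = fraction of all trajectories in which a precedes b
    confidence = fraction of trajectories containing a in which a precedes b
    (Hypothetical helper, not taken from the paper.)
    """
    firsts = [first_occurrences(t) for t in trajectories]
    n = len(firsts)
    state_count = Counter(s for f in firsts for s in f)
    pair_count = Counter()
    for f in firsts:
        for a, b in permutations(f, 2):
            if f[a] < f[b]:
                pair_count[(a, b)] += 1
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n
        confidence = count / state_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules[(a, b)] = (support, confidence)
    return rules


def candidate_subgoals(trajectories, min_support=0.9):
    """States visited in at least min_support of the trajectories,
    ordered by their mean first-visit step (an assumed heuristic)."""
    firsts = [first_occurrences(t) for t in trajectories]
    n = len(firsts)
    steps = {}
    for f in firsts:
        for state, step in f.items():
            steps.setdefault(state, []).append(step)
    frequent = [s for s, v in steps.items() if len(v) / n >= min_support]
    return sorted(frequent, key=lambda s: sum(steps[s]) / len(steps[s]))


# Toy successful trajectories: every one picks up a key, then passes a door.
toy_trajectories = [
    ["start", "a", "key", "b", "door", "goal"],
    ["start", "c", "key", "door", "d", "goal"],
    ["start", "key", "e", "door", "goal"],
]

rules = mine_sequential_rules(toy_trajectories, 1.0, 1.0)
subgoals = candidate_subgoals(toy_trajectories, 1.0)
print(subgoals)
print(sorted(rules))
```

In this toy setting, the states shared by all trajectories and the ordering rules among them form a chain that could be read as a sequence of sub-task termination conditions. A real hierarchy extraction would also need to handle factored state variables and multiple tasks, which this sketch does not attempt.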
