Subgoal discovery for hierarchical reinforcement learning using learned policies.

Abstract

Reinforcement learning has proven to be an effective method for creating intelligent agents in a wide range of applications. However, it requires a large number of training episodes, a problem that is especially noticeable in large domains. Although the utility of hierarchy is commonly accepted, there has been relatively little research on autonomously discovering or creating useful hierarchies. What is needed is a system that can scale reinforcement learning to complex real-world tasks and autonomously discover hierarchical structure within its own learning and control system.

This thesis introduces a method that allows a reinforcement learning agent to autonomously discover and create hierarchy from a learned policy model. A hierarchy of actions provides abstraction: a set of actions is encapsulated into a single higher-level action, which lets the agent learn while ignoring details that appear at finer levels. The main idea is to find subgoals in a learned policy model by searching for states that exhibit certain structural properties. These subgoals are then used to create hierarchies of actions, which help the agent explore more effectively and accelerate learning in other tasks in the same or similar environments where the same subgoals are useful. The thesis demonstrates that hierarchical action sequences built from autonomously discovered subgoals can facilitate learning and enable effective knowledge transfer to related tasks.
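
The abstract does not say which structural properties mark a state as a subgoal, so the following Python sketch is only an illustration under stated assumptions, not the method developed in the thesis. It assumes a small deterministic two-room gridworld, substitutes a greedy successor map (built from breadth-first-search distances to the goal) for a learned policy, and applies a hypothetical "funnel" heuristic: count how many start states' greedy trajectories pass through each state, then flag states where that count stops growing because no side branches join there. The detected subgoal is then wrapped into a macro action with an initiation set and an internal policy that runs until the subgoal is reached. The grid layout and all function names (`learned_policy`, `funnel_counts`, `find_subgoals`, `make_option`) are hypothetical.

```python
from collections import deque

# Hypothetical two-room gridworld (an assumption, not from the thesis):
# 3 rows x 7 columns, a wall in column 3 with one doorway at (1, 3),
# and the goal at (1, 6).
ROWS, COLS = 3, 7
WALLS = {(0, 3), (2, 3)}
GOAL = (1, 6)


def neighbors(s):
    """Yield free cells one step away, in the order up, down, left, right."""
    r, c = s
    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
        if 0 <= nr < ROWS and 0 <= nc < COLS and (nr, nc) not in WALLS:
            yield (nr, nc)


def learned_policy():
    """Stand-in for a learned policy: greedy successor map toward GOAL.

    Uses BFS distances to the goal; ties go to the first neighbor in
    neighbors() order. The goal is absorbing and has no entry.
    """
    dist = {GOAL: 0}
    queue = deque([GOAL])
    while queue:
        s = queue.popleft()
        for n in neighbors(s):
            if n not in dist:
                dist[n] = dist[s] + 1
                queue.append(n)
    return {s: min(neighbors(s), key=dist.__getitem__)
            for s in dist if s != GOAL}


def funnel_counts(policy):
    """count[s] = number of start states whose greedy trajectory visits s."""
    count = {}
    for start in set(policy) | set(policy.values()):
        s = start
        while True:
            count[s] = count.get(s, 0) + 1
            if s not in policy:  # reached the absorbing goal
                break
            s = policy[s]
    return count


def find_subgoals(policy, count):
    """Hypothetical heuristic: flag states where the funnel stops widening.

    growth(s) = count(s) minus the largest count among states stepping into s.
    A non-goal state with predecessors is flagged when its growth is strictly
    lower than both its largest predecessor's and its successor's.
    """
    preds = {}
    for s, nxt in policy.items():
        preds.setdefault(nxt, []).append(s)
    growth = {s: count[s] - max((count[p] for p in preds.get(s, [])), default=0)
              for s in count}
    subgoals = []
    for s in policy:  # the goal is excluded: it has no successor
        if s not in preds:
            continue  # leaf states: nothing funnels into them
        big_pred = max(preds[s], key=count.__getitem__)
        if growth[s] < growth[big_pred] and growth[s] < growth[policy[s]]:
            subgoals.append(s)
    return subgoals


def make_option(policy, subgoal):
    """Wrap the policy into a macro action that terminates at `subgoal`.

    Returns (init, run): the states whose greedy trajectory reaches the
    subgoal, and a function that executes primitive steps until it does.
    """
    init = set()
    for s in policy:
        t = s
        while t in policy and t != subgoal:
            t = policy[t]
        if t == subgoal and s != subgoal:
            init.add(s)

    def run(state):
        assert state in init, "macro action not applicable in this state"
        path = [state]
        while path[-1] != subgoal:
            path.append(policy[path[-1]])
        return path

    return init, run


if __name__ == "__main__":
    pi = learned_policy()
    subgoals = find_subgoals(pi, funnel_counts(pi))
    print(subgoals)  # [(1, 3)]: the doorway
    init, run = make_option(pi, subgoals[0])
    print(len(init), run((0, 0)))  # 9 [(0, 0), (1, 0), (1, 1), (1, 2), (1, 3)]
```

In this layout the heuristic flags only the doorway, and the resulting macro action applies to the nine left-room states. On larger or noisier learned policies, a strict local-minimum test like this would likely need smoothing or thresholds, which is one reason to read it as a sketch rather than a finished subgoal detector.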

Bibliographic details

  • Author: Goel, Sandeep Kumar
  • Author affiliation: The University of Texas at Arlington
  • Degree-granting institution: The University of Texas at Arlington
  • Discipline: Computer Science
  • Degree: M.S.
  • Year: 2003
  • Pages: 51
  • Original format: PDF
  • Language: English