2008 IEEE International Conference on Systems, Man and Cybernetics (SMC)

Learning task decomposition and exploration shaping for reinforcement learning agents



Abstract

For situated reinforcement learning agents to succeed in complex real-world environments, they have to be able to efficiently acquire and reuse control knowledge in order to accomplish new tasks faster and to accelerate the learning of new policies. While hierarchical learning approaches which transfer previously acquired skills and representations to model and control new tasks have the potential to significantly improve learning times, they also pose the risk of "behavior proliferation", where the growing set of available actions makes it increasingly difficult to determine a strategy for a new task. To overcome this problem and to further improve knowledge reuse, the learning agent should thus also have the ability to predict the utility of an action or reusable skill in a new context and to analyze new tasks in order to decompose them into known subtasks. This paper presents a novel approach for learning task decomposition by learning to predict the utility of subgoals and subgoal types in the context of a new task, as well as for exploration shaping by predicting the likelihood with which each available action is useful in the given task context. This information, encoded as a set of utility functions, is then used to focus the exploration and learning process of the agent, increasing performance both in the time spent to first reach the new task's goal and in the time required to learn an optimal policy. This ability is demonstrated here in the context of navigation and manipulation tasks in a feature-enhanced grid world domain.
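To make the exploration-shaping idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: Q-learning on a small grid world where exploratory actions are sampled in proportion to per-action utility weights, which stand in for the paper's learned predictions of how likely each action is to be useful in the task context. All names, parameters, and the fixed `UTILITY` vector are assumptions introduced for illustration.

```python
import random

SIZE = 5
GOAL = (4, 4)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

# Hypothetical utility predictions: suppose a learned predictor estimates
# that "right" and "down" are more likely to be useful in this task type.
UTILITY = [0.4, 0.1, 0.4, 0.1]

def step(state, a):
    """Deterministic grid transition with walls at the borders."""
    x, y = state
    dx, dy = ACTIONS[a]
    s2 = (min(max(x + dx, 0), SIZE - 1), min(max(y + dy, 0), SIZE - 1))
    return s2, (1.0 if s2 == GOAL else -0.01), s2 == GOAL

def shaped_explore():
    # Sample an exploratory action in proportion to its predicted utility
    # rather than uniformly at random -- this is the "shaping" step.
    return random.choices(range(len(ACTIONS)), weights=UTILITY)[0]

def train(episodes=300, alpha=0.5, gamma=0.95, eps=0.2):
    Q = {}
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(100):
            q = Q.setdefault(s, [0.0] * len(ACTIONS))
            a = shaped_explore() if random.random() < eps else q.index(max(q))
            s2, r, done = step(s, a)
            q2 = Q.setdefault(s2, [0.0] * len(ACTIONS))
            q[a] += alpha * (r + gamma * max(q2) * (not done) - q[a])
            s = s2
            if done:
                break
    return Q

if __name__ == "__main__":
    random.seed(0)
    Q = train()
    # Greedy rollout from the start state should reach the goal.
    s, steps = (0, 0), 0
    while s != GOAL and steps < 50:
        q = Q.get(s, [0.0] * len(ACTIONS))
        s, _, _ = step(s, q.index(max(q)))
        steps += 1
    print("reached goal in", steps, "steps")
```

Compared with uniform epsilon-greedy exploration, biasing exploratory samples toward actions predicted to be useful tends to shorten the first successful episode; the paper's contribution is learning such predictions from subgoal and subgoal-type utilities rather than fixing them by hand as done here.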
