AAAI Symposium on Lifelong Machine Learning

Lifelong Learning of Structure in the Space of Policies

Abstract

We address the problem faced by an autonomous agent that must respond quickly to a family of qualitatively related tasks, such as a robot interacting with different types of human participants. We work in the setting where the tasks share a state-action space and have the same qualitative objective but differ in their dynamics and reward process. We adopt a transfer approach in which the agent attempts to exploit common structure in previously learnt policies to accelerate learning of a new task. Our technique consists of a few key steps. First, we use a probabilistic model to describe the regions of state space that successful trajectories seem to prefer. Next, we extract policy fragments for these regions from previously learnt policies as candidates for reuse. These fragments may be treated as options, with corresponding domains and termination conditions extracted by unsupervised learning. Finally, the set of reusable policies is used when learning novel tasks, and the process repeats. The utility of this method is demonstrated through experiments in the simulated soccer domain, where the variability comes from the different possible behaviours of opponent teams and the agent needs to perform well against novel opponents.
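
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the probabilistic model, so an empirical visitation frequency over discrete states stands in for it here, and the names `visitation_model`, `preferred_region`, `extract_option`, the `Option` class, the 0.6 threshold, and the toy trajectories are all hypothetical. The termination rule (stop when the agent leaves the region) is likewise an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Sequence, Set

State = Hashable
Action = Hashable


@dataclass
class Option:
    """A policy fragment with the states where it applies (hypothetical structure)."""
    initiation_set: FrozenSet[State]
    policy: Dict[State, Action]

    def act(self, state: State) -> Action:
        return self.policy[state]

    def terminates(self, state: State) -> bool:
        # Assumed termination rule: the fragment ends once the agent leaves its region.
        return state not in self.initiation_set


def visitation_model(trajectories: Sequence[Sequence[State]]) -> Dict[State, float]:
    """Fraction of successful trajectories that visit each state.

    A simple stand-in (assumption) for the probabilistic model in the abstract.
    """
    counts: Counter = Counter()
    for trajectory in trajectories:
        counts.update(set(trajectory))  # count each state once per trajectory
    total = len(trajectories)
    return {state: count / total for state, count in counts.items()}


def preferred_region(model: Dict[State, float], threshold: float) -> Set[State]:
    """States visited by at least `threshold` of the successful trajectories."""
    return {state for state, prob in model.items() if prob >= threshold}


def extract_option(policy: Dict[State, Action], region: Set[State]) -> Option:
    """Restrict a previously learnt policy to the region to obtain a reusable fragment."""
    fragment = {s: a for s, a in policy.items() if s in region}
    return Option(frozenset(fragment), fragment)


# Toy data (hypothetical): three successful trajectories over integer states.
successful: List[List[int]] = [[0, 1, 2, 3], [0, 1, 2, 4], [5, 1, 2, 3]]
learnt_policy = {0: "right", 1: "right", 2: "up", 4: "left", 5: "down"}

model = visitation_model(successful)
region = preferred_region(model, threshold=0.6)
option = extract_option(learnt_policy, region)
```

In this sketch the resulting `option` could be added to the action set of a learner facing a new task, and fragments extracted from the new policy would then extend the library, mirroring the repeat step described in the abstract.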