Canadian Conference on Artificial Intelligence

Options in Multi-task Reinforcement Learning - Transfer via Reflection


Abstract

Temporally extended actions such as options are known to lead to improvements in reinforcement learning (RL). At the same time, transfer learning across different RL tasks is an increasingly active area of research. Following Baxter's formalism for transfer, the corresponding RL question considers the benefit that an RL agent can achieve on new tasks, based on experience from previous tasks in a common 'learning environment'. We address this question in the specific context of goal-based multi-task RL, where the different tasks correspond to different goal states within a common state space, and introduce Landmark Options Via Reflection (LOVR), a flexible framework that uses options to transfer domain knowledge. As an explicit analog of principles in transfer learning, we provide theoretical and empirical results demonstrating that when a set of landmark states covers the state space suitably, a LOVR agent that learns optimal value functions for these landmarks in an initial phase and deploys the associated optimal policies as options in the main phase achieves a drastic reduction in cumulative regret compared to baseline approaches.
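The abstract outlines a two-phase scheme: learn optimal policies for a set of landmark states in an initial phase, then reuse them as temporally extended options when learning new goals in the same state space. The sketch below is a minimal toy illustration of that idea, not the paper's algorithm or experiments; the GridWorld environment, the tabular Q-learning routine, the landmark placement, and every hyperparameter here are assumptions made purely for this example.

```python
import random
from collections import defaultdict

class GridWorld:
    """Deterministic 2D grid task; an episode ends at `goal`.
    Reward is -1 per step, 0 on reaching the goal (shortest-path objective)."""
    ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up

    def __init__(self, size, goal):
        self.size, self.goal = size, goal

    def step(self, state, action):
        dx, dy = self.ACTIONS[action]
        nxt = (min(max(state[0] + dx, 0), self.size - 1),
               min(max(state[1] + dy, 0), self.size - 1))
        done = nxt == self.goal
        return nxt, (0.0 if done else -1.0), done

def q_learning(env, episodes=500, alpha=0.5, gamma=0.99, eps=0.1):
    """Tabular Q-learning toward env.goal; returns a greedy policy."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = (random.randrange(env.size), random.randrange(env.size))
        done, steps = s == env.goal, 0
        while not done and steps < 500:
            a = (random.randrange(4) if random.random() < eps
                 else max(range(4), key=lambda b: Q[(s, b)]))
            s2, r, done = env.step(s, a)
            target = r + (0.0 if done else gamma * max(Q[(s2, b)] for b in range(4)))
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, steps = s2, steps + 1
    return lambda s: max(range(4), key=lambda b: Q[(s, b)])

SIZE = 9

# Initial phase: learn an optimal policy for each landmark state.
LANDMARKS = [(2, 2), (2, 6), (6, 2), (6, 6)]
landmark_policies = {g: q_learning(GridWorld(SIZE, g)) for g in LANDMARKS}

def run_option(env, state, landmark, max_steps=50):
    """Main phase: execute the landmark's policy as an option, terminating
    at the landmark, at the new task's goal, or at a step cap."""
    policy, total_r, done = landmark_policies[landmark], 0.0, False
    for _ in range(max_steps):
        state, r, done = env.step(state, policy(state))
        total_r += r
        if done or state == landmark:
            break
    return state, total_r, done

# On a new task (fresh goal, same state space), the agent can invoke a
# landmark option alongside primitive actions instead of learning from scratch.
new_task = GridWorld(SIZE, goal=(7, 1))
s, ret, done = run_option(new_task, state=(0, 0), landmark=(6, 2))
print("option ended at", s, "return", ret, "reached new goal:", done)
```

Terminating an option at its landmark (or at the new task's goal, whichever comes first) is what lets experience from earlier tasks pay off as cheap long-range travel on later ones.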
