Annual Conference on Neural Information Processing Systems

Reward Mapping for Transfer in Long-Lived Agents



Abstract

We consider how to transfer knowledge from previous tasks (MDPs) to a current task in long-lived and bounded agents that must solve a sequence of tasks over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent's behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent's performance relative to other approaches, including an approach that transfers policies.
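The abstract's core idea, training a reward mapping function on (task-specifying reward, good guidance reward) pairs gathered from previously solved tasks, and using it to predict an initial guidance reward for each new task, can be illustrated with a minimal sketch. This is a hypothetical illustration only: the class name, the linear least-squares mapping, and the vector parameterization of rewards are assumptions, not the paper's actual method, which the abstract does not specify in detail.

```python
import numpy as np

class RewardMapping:
    """Hypothetical sketch: incrementally learn a linear map (with bias)
    from task-specifying reward parameters to good initial guidance
    reward parameters, using pairs collected from previous tasks."""

    def __init__(self):
        self.task_rewards = []      # inputs: task-specifying reward vectors
        self.guidance_rewards = []  # targets: guidance rewards found to work well

    def update(self, task_reward, good_guidance_reward):
        # After solving a task, record the guidance reward that performed well on it.
        self.task_rewards.append(np.asarray(task_reward, dtype=float))
        self.guidance_rewards.append(np.asarray(good_guidance_reward, dtype=float))

    def predict(self, task_reward):
        # Fit a least-squares linear map on all recorded pairs, then apply it
        # to the new task's reward to get an initial guidance reward.
        X = np.array(self.task_rewards)
        Y = np.array(self.guidance_rewards)
        X1 = np.hstack([X, np.ones((len(X), 1))])  # append bias column
        W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
        x1 = np.append(np.asarray(task_reward, dtype=float), 1.0)
        return x1 @ W

# Usage: four previous tasks whose good guidance rewards happened to follow
# guidance = 2 * task + 1 (elementwise); the map recovers that relation.
rm = RewardMapping()
for t in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    rm.update(t, 2.0 * np.array(t) + 1.0)
initial_guidance = rm.predict([0.5, 0.5])  # -> approximately [2.0, 2.0]
```

The agent would then start learning the new task under `initial_guidance` rather than the raw task-specifying reward, and could continue refining the guidance reward online, as in the optimal rewards literature the abstract builds on.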

