Learning to Achieve Goals

Abstract

Temporal difference methods solve the temporal credit assignment problem for reinforcement learning. An important subproblem of general reinforcement learning is learning to achieve dynamic goals. Although existing temporal difference methods, such as Q-learning, can be applied to this problem, they do not take advantage of its special structure. This paper presents the DG-learning algorithm, which learns efficiently to achieve dynamically changing goals and exhibits good knowledge transfer between goals. In addition, this paper shows how traditional relaxation techniques can be applied to the problem. Finally, experimental results are given that demonstrate the superiority of DG-learning over Q-learning in a moderately large, synthetic, non-deterministic domain.
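
The abstract names the DG-learning algorithm without giving its update rule. Below is a minimal sketch of the kind of goal-conditioned, Q-learning-style update the paper describes, assuming a tabular setting with unit step costs; the names (dg, dg_update, choose_action), the learning rate, and the epsilon-greedy policy are illustrative assumptions, not details taken from the paper.

```python
import random
from collections import defaultdict

# Sketch of a goal-conditioned, Q-learning-style update in the spirit of
# DG-learning: a separate action-distance estimate is kept for every goal,
# and a single observed transition updates the estimates for ALL goals at
# once, which is where the knowledge transfer between goals comes from.
# All names and the unit step cost are illustrative assumptions.

ALPHA = 0.1      # learning rate (assumed)
STEP_COST = 1.0  # assumed unit cost per action

# dg[(state, action, goal)] ~ estimated cost-to-go from `state` to `goal`
# when the first action taken is `action`.
dg = defaultdict(float)

def dg_update(state, action, next_state, actions, goals):
    """Update distance-to-goal estimates for every goal from one transition."""
    for goal in goals:
        if next_state == goal:
            target = STEP_COST  # this action reached the goal directly
        else:
            target = STEP_COST + min(dg[(next_state, a, goal)] for a in actions)
        key = (state, action, goal)
        dg[key] += ALPHA * (target - dg[key])

def choose_action(state, goal, actions, epsilon=0.1):
    """Epsilon-greedy action selection w.r.t. the currently active goal."""
    if random.random() < epsilon:
        return random.choice(actions)
    return min(actions, key=lambda a: dg[(state, a, goal)])
```

Because every transition updates the estimates for all goals rather than only the currently active one, experience gathered while pursuing one goal immediately improves the policy for the others.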
