Conference: Machine Learning (ML95)

Fast and Efficient Reinforcement Learning with Truncated Temporal Differences

Abstract

The problem of temporal credit assignment in reinforcement learning is typically solved using algorithms based on the method of temporal differences, TD(λ). Of these, Q-learning is currently the best understood and most widely used. Using TD-based algorithms with λ > 0 often speeds up the propagation of credit significantly, but it raises certain implementation problems. The traditional implementation of TD(λ > 0), based on eligibility traces, suffers from a lack of generality and from computational inefficiency. The TTD (Truncated Temporal Differences) procedure is a simple approximation to TD(λ) that appears to overcome these drawbacks of eligibility traces. The paper outlines this technique, discusses its computational efficiency advantages, and presents experimental studies of TTD combined with Q-learning in deterministic and stochastic environments. These experiments show that TTD yields a significant learning speedup without reducing reliability, at essentially the same computational cost as ordinary TD(0) learning. We conclude that the TTD procedure is probably the most promising way of using TD methods for reinforcement learning, especially for tasks with large state spaces and a hard temporal credit assignment problem.
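The abstract describes TTD only at a high level. The following is a minimal Python sketch of the general idea, assuming a tabular Q-function and the standard m-step truncated λ-return, computed backward over a window of the last m transitions with the recursion G ← r + γ[(1 - λ)·U(s′) + λ·G], bootstrapped from U(s_{t+m}), where U(s) = max_a Q(s, a). The names TTDQLearner, truncated_lambda_return and max_q, the parameter defaults, and the choice of using the greedy value at every step (with no trace cutting after exploratory actions) are illustrative assumptions, not the paper's exact procedure.

```python
from collections import deque
from typing import Deque, Dict, Hashable, List, Tuple

# Hypothetical tabular Q-function: maps (state, action) -> value.
QTable = Dict[Tuple[Hashable, Hashable], float]
# One stored transition: (state, action, reward, next_state, done).
Transition = Tuple[Hashable, Hashable, float, Hashable, bool]


def max_q(q: QTable, state: Hashable, actions: List[Hashable]) -> float:
    """Greedy state value max_a Q(state, a); unseen pairs default to 0."""
    return max(q.get((state, a), 0.0) for a in actions)


def truncated_lambda_return(rewards: List[float], next_values: List[float],
                            gamma: float, lam: float) -> float:
    """m-step truncated lambda-return for the oldest step in the window.

    rewards[k] is r_{t+k+1}; next_values[k] is U(s_{t+k+1}).
    Starts from G = U(s_{t+m}) and applies, from the newest step back to
    the oldest, G <- r + gamma * ((1 - lam) * U + lam * G).
    """
    g = next_values[-1]  # bootstrap from the last state in the window
    for r, v in zip(reversed(rewards), reversed(next_values)):
        g = r + gamma * ((1.0 - lam) * v + lam * g)
    return g


class TTDQLearner:
    """Sketch of truncated TD combined with tabular Q-learning (hypothetical API).

    Stores only the last m transitions. When the window is full, Q for the
    oldest (state, action) pair is moved toward the truncated lambda-return,
    so each update costs O(m) no matter how many states exist.
    """

    def __init__(self, actions, m=5, gamma=0.95, lam=0.8, alpha=0.1):
        self.actions = list(actions)
        self.m, self.gamma, self.lam, self.alpha = m, gamma, lam, alpha
        self.q: QTable = {}
        self.window: Deque[Transition] = deque()

    def _update_oldest(self) -> None:
        s, a, _, _, _ = self.window[0]
        rewards = [r for (_, _, r, _, _) in self.window]
        # Terminal next states contribute a value of 0.
        next_values = [0.0 if done else max_q(self.q, s2, self.actions)
                       for (_, _, _, s2, done) in self.window]
        target = truncated_lambda_return(rewards, next_values,
                                         self.gamma, self.lam)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (target - old)
        self.window.popleft()

    def observe(self, s, a, r, s2, done) -> None:
        """Record one transition and perform any updates that are now due."""
        self.window.append((s, a, r, s2, done))
        if len(self.window) == self.m:
            self._update_oldest()
        if done:  # episode over: flush the remaining (shorter) windows
            while self.window:
                self._update_oldest()


if __name__ == "__main__":
    agent = TTDQLearner(actions=[0, 1], m=3, gamma=0.9, lam=0.5, alpha=1.0)
    agent.observe("A", 0, 0.0, "B", False)
    agent.observe("B", 0, 0.0, "C", False)
    agent.observe("C", 0, 1.0, "T", True)
    print(agent.q)
```

With λ = 0 (or m = 1) the target reduces to the one-step Q-learning target r + γ max_a Q(s′, a); with λ = 1 it becomes the discounted m-step return bootstrapped from the last state in the window. Because only the last m transitions are stored, each update touches a fixed number of experiences regardless of the size of the state space, which illustrates the kind of efficiency advantage the abstract attributes to TTD over eligibility traces.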