
Reconciling λ-Returns with Experience Replay



Abstract

Modern deep reinforcement learning methods have departed from the incremental learning required for eligibility traces, rendering the implementation of the λ-return difficult in this context. In particular, off-policy methods that utilize experience replay remain problematic because their random sampling of minibatches is not conducive to the efficient calculation of λ-returns. Yet replay-based methods are often the most sample efficient, and incorporating λ-returns into them is a viable way to achieve new state-of-the-art performance. Towards this, we propose the first method to enable practical use of λ-returns in arbitrary replay-based methods without relying on other forms of decorrelation such as asynchronous gradient updates. By promoting short sequences of past transitions into a small cache within the replay memory, adjacent λ-returns can be efficiently precomputed by sharing Q-values. Computation is not wasted on experiences that are never sampled, and stored λ-returns behave as stable temporal-difference (TD) targets that replace the target network. Additionally, our method grants the unique ability to observe TD errors prior to sampling; for the first time, transitions can be prioritized by their true significance rather than by a proxy to it. Furthermore, we propose the novel use of the TD error to dynamically select λ-values that facilitate faster learning. We show that these innovations can enhance the performance of DQN when playing Atari 2600 games, even under partial observability. While our work specifically focuses on λ-returns, these ideas are applicable to any multi-step return estimator.
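The caching idea described in the abstract can be made concrete with a small sketch. Assuming the standard recursive form of the λ-return with Q-learning bootstrapping, R_t^λ = r_t + γ[(1 − λ)·max_a Q(s_{t+1}, a) + λ·R_{t+1}^λ], all λ-returns in a cached block of consecutive transitions can be filled in by a single backward pass that reuses each Q-value for its neighbours. The Python sketch below illustrates this under that assumption; the function name precompute_lambda_returns, the array layout, and the block-boundary handling are illustrative placeholders, not the paper's implementation.

import numpy as np

def precompute_lambda_returns(rewards, dones, q_next, lam, gamma):
    # rewards[t]: reward r_t for transition t in the cached block.
    # dones[t]:   True if transition t ended the episode.
    # q_next[t]:  max_a Q(s_{t+1}, a), obtained from a single batched
    #             forward pass of the online network over the block.
    # Returns one lambda-return per transition, computed backwards with
    #   R_t = r_t + gamma * ((1 - lam) * q_next[t] + lam * R_{t+1}),
    # so each Q-value is shared by every return that precedes it.
    T = len(rewards)
    returns = np.zeros(T)
    next_return = q_next[-1]  # fall back to the one-step target at the block edge
    for t in reversed(range(T)):
        if dones[t]:
            returns[t] = rewards[t]  # no bootstrapping past episode end
        else:
            returns[t] = rewards[t] + gamma * (
                (1.0 - lam) * q_next[t] + lam * next_return
            )
        next_return = returns[t]
    return returns

# Hypothetical usage when refreshing the cache:
rewards = np.array([0.0, 1.0, 0.0, 0.0])
dones   = np.array([False, False, False, True])
q_next  = np.array([2.0, 1.5, 1.8, 0.0])  # from one Q-network forward pass
targets = precompute_lambda_returns(rewards, dones, q_next, lam=0.9, gamma=0.99)

Such cached targets can then act as the stable TD targets the abstract describes, and the corresponding TD errors (target minus current Q-value) are available before sampling, which is what enables the exact prioritization and TD-error-based λ-selection mentioned above.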
