In the framework of Markov Decision Processes, we consider linear off-policy learning, that is, the problem of learning a linear approximation of the value function of some fixed policy from one trajectory possibly generated by some other policy. We briefly review on-policy learning algorithms of the literature (gradient-based and least-squares-based), adopting a unified algorithmic view. Then, we highlight a systematic approach for adapting them to off-policy learning with eligibility traces. This leads to some known algorithms---off-policy LSTD($\lambda$), LSPE($\lambda$), TD($\lambda$), TDC/GQ($\lambda$)---and suggests new extensions---off-policy FPKF($\lambda$), BRM($\lambda$), gBRM($\lambda$), GTD2($\lambda$). We describe a comprehensive algorithmic derivation of all algorithms in a recursive and memory-efficient form, discuss their known convergence properties and illustrate their relative empirical behavior on Garnet problems. Our experiments suggest that the most standard algorithms, on- and off-policy LSTD($\lambda$)/LSPE($\lambda$)---and TD($\lambda$) if the feature space dimension is too large for a least-squares approach---perform the best.
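For concreteness, below is a minimal sketch of one commonly used update of this kind: off-policy TD($\lambda$) with importance-sampling-corrected eligibility traces on linear features. The feature map, importance ratio function, step size and trajectory format are illustrative assumptions, not the paper's exact derivation.

```python
import numpy as np

def off_policy_td_lambda(trajectory, phi, rho, theta0,
                         alpha=0.01, gamma=0.95, lam=0.9):
    """One pass of off-policy TD(lambda) with eligibility traces (sketch).

    trajectory: iterable of (s, a, r, s_next) generated by the behavior policy.
    phi:        feature map, s -> np.ndarray of shape (d,).
    rho:        importance ratio, rho(s, a) = pi(a|s) / mu(a|s) (assumed given).
    theta0:     initial weight vector of shape (d,).
    """
    theta = theta0.copy()
    z = np.zeros_like(theta)              # eligibility trace
    for s, a, r, s_next in trajectory:
        rho_t = rho(s, a)                 # importance-sampling correction
        # TD error under the current linear value estimate theta
        delta = r + gamma * theta @ phi(s_next) - theta @ phi(s)
        # decay the trace and accumulate the current feature vector,
        # reweighted by the behavior/target policy mismatch
        z = rho_t * (gamma * lam * z + phi(s))
        theta = theta + alpha * delta * z
    return theta
```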