
Finite Sample Analyses for TD(0) with Function Approximation

Abstract

TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis of TD(0) with function approximation, even in the linear case. Our work is the first to provide such results. Prior works that obtained convergence rates for online Temporal Difference (TD) methods analyzed modified versions of them that incorporate projections and step-sizes depending on unknown problem parameters. Our analysis obviates these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high probability. Both are based on relatively unknown, recently developed stochastic approximation techniques.
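As a concrete illustration of the setting the abstract refers to, the following is a minimal sketch of TD(0) with linear function approximation on a toy three-state Markov reward process. Everything in it (the transition matrix, rewards, feature vectors, and step-size) is an assumption chosen for the demonstration, not taken from the paper:

```python
import numpy as np

# Toy 3-state Markov reward process; all quantities here are illustrative.
rng = np.random.default_rng(0)

gamma, alpha, n_states = 0.9, 0.02, 3
P = np.array([[0.1, 0.8, 0.1],          # transition probabilities
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
r = np.array([0.0, 1.0, -1.0])          # expected reward on leaving each state
Phi = np.array([[1.0, 0.0],             # feature vectors, one row per state
                [0.5, 0.5],
                [0.0, 1.0]])

theta = np.zeros(2)                     # linear value estimate: V(s) ~ Phi[s] @ theta
s = 0
for _ in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD(0) update: step theta along the semi-gradient of the TD error
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += alpha * td_error * Phi[s]
    s = s_next

V_hat = Phi @ theta                     # estimated state values
```

A constant step-size is used here purely for simplicity; the paper's contribution concerns precisely how such analyses can avoid projections and step-size choices that depend on unknown problem parameters.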
