Venue: American Control Conference

Empirical Value Iteration for approximate dynamic programming



Abstract

We propose a simulation-based algorithm, the Empirical Value Iteration (EVI) algorithm, for finding the optimal value function of an MDP under the infinite-horizon discounted cost criterion when the transition probability kernels are unknown. Unlike simulation-based algorithms using stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in terms of sample complexity: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations t(ε, δ) that are sufficient for EVI to yield, with probability at least 1 − δ, an approximate value function that is ε-close to the optimal value function.
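The core idea described in the abstract is to run value iteration but replace the exact expectation in the Bellman backup with an average over n simulated next-state samples per state-action pair. The following is a minimal sketch of that idea on a tiny toy MDP; the transition matrix, costs, and the choices of n and t below are illustrative assumptions, not the paper's actual constants n(ε, δ) and t(ε, δ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP (toy numbers for illustration only).
# P[a, s, s'] = probability of moving from s to s' under action a.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
# C[a, s] = one-step cost of taking action a in state s.
C = np.array([[1.0, 2.0],
              [1.5, 0.5]])
gamma = 0.9       # discount factor
n_samples = 2000  # samples per backup: plays the role of n(eps, delta)
n_iters = 80      # iterations: plays the role of t(eps, delta)

V = np.zeros(2)
for _ in range(n_iters):
    V_new = np.empty_like(V)
    for s in range(2):
        q_values = []
        for a in range(2):
            # Empirical Bellman backup: simulate next states from the
            # (in practice unknown) kernel instead of using P exactly.
            next_states = rng.choice(2, size=n_samples, p=P[a, s])
            q_values.append(C[a, s] + gamma * V[next_states].mean())
        V_new[s] = min(q_values)  # discounted-cost criterion: minimize
    V = V_new

print(V)  # empirical estimate of the optimal value function
```

With the simulation budget above, the output is close to the fixed point of the exact Bellman operator; the abstract's contribution is quantifying, non-asymptotically, how large n and t must be for a given (ε, δ) accuracy.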

