Venue: American Control Conference

Empirical Value Iteration for approximate dynamic programming



Abstract

We propose a simulation-based algorithm, the Empirical Value Iteration (EVI) algorithm, for finding the optimal value function of an MDP under the infinite-horizon discounted cost criterion when the transition probability kernels are unknown. Unlike simulation-based algorithms using stochastic approximation techniques, which give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in terms of sample complexity: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations t(ε, δ) that are sufficient for EVI to yield, with probability at least 1 − δ, an approximate value function that is ε-close to the optimal value function.
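The core idea described in the abstract is to run value iteration but replace the exact expectation in the Bellman backup with an average over n simulated next-state samples per state-action pair. The following is a minimal sketch of that idea on a tiny toy MDP; the transition matrix, costs, and the choices of n and t below are illustrative assumptions, not the paper's actual constants n(ε, δ) and t(ε, δ).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action MDP (toy numbers for illustration only).
# P[a, s, s'] = probability of moving from s to s' under action a.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.6, 0.4]]])
# C[a, s] = one-step cost of taking action a in state s.
C = np.array([[1.0, 2.0],
              [1.5, 0.5]])
gamma = 0.9       # discount factor
n_samples = 2000  # samples per backup: plays the role of n(eps, delta)
n_iters = 80      # iterations: plays the role of t(eps, delta)

V = np.zeros(2)
for _ in range(n_iters):
    V_new = np.empty_like(V)
    for s in range(2):
        q_values = []
        for a in range(2):
            # Empirical Bellman backup: simulate next states from the
            # (in practice unknown) kernel instead of using P exactly.
            next_states = rng.choice(2, size=n_samples, p=P[a, s])
            q_values.append(C[a, s] + gamma * V[next_states].mean())
        V_new[s] = min(q_values)  # discounted-cost criterion: minimize
    V = V_new

print(V)  # empirical estimate of the optimal value function
```

With the simulation budget above, the output is close to the fixed point of the exact Bellman operator; the abstract's contribution is quantifying, non-asymptotically, how large n and t must be for a given (ε, δ) accuracy.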

