
Convergence of Sample Path Optimal Policies for Stochastic Dynamic Programming


Abstract

The authors consider the solution of stochastic dynamic programs using sample path estimates. Applying the theory of large deviations, they derive probability error bounds associated with the convergence of the estimated optimal policy to the true optimal policy, for finite horizon problems. These bounds decay at an exponential rate, in contrast with the usual canonical (inverse) square root rate associated with estimation of the value (cost-to-go) function itself. These results have practical implications for Monte Carlo simulation-based solution approaches to stochastic dynamic programming problems where it is impractical to extract the explicit transition probabilities of the underlying system model.
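Below is a minimal sketch, not taken from the report, of the kind of simulation-based (sample-path) approach the abstract describes: at each stage the Q-values are estimated by averaging transitions drawn from a black-box simulator, and the resulting greedy policy is compared with the exact backward-induction policy computed from the explicit transition probabilities. The MDP, cost function, horizon, and sample size N are all invented for illustration; the report's large-deviations bounds concern how quickly the probability that the estimated and true optimal policies differ decays as N grows.

```python
# Illustrative sketch only: sample-path estimation of an optimal policy for a
# small finite-horizon stochastic dynamic program (all model data is made up).
import numpy as np

rng = np.random.default_rng(0)

H = 3           # horizon (stages 0..H-1)
S = [0, 1, 2]   # states
A = [0, 1]      # actions

# "True" transition kernel P[a][s] -> distribution over next states. In the
# setting of the abstract this would only be accessible through simulation,
# not in explicit form.
P = {0: np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.7, 0.2],
                  [0.2, 0.1, 0.7]]),
     1: np.array([[0.3, 0.4, 0.3],
                  [0.3, 0.3, 0.4],
                  [0.4, 0.3, 0.3]])}

def cost(s, a):
    """Illustrative stage cost."""
    return (s - 1) ** 2 + 0.5 * a

def simulate_step(s, a):
    """Simulator-only access: draw the next state from the hidden kernel."""
    return rng.choice(len(S), p=P[a][s])

def exact_backward_induction():
    """Reference solution using the explicit transition probabilities."""
    V = np.zeros(len(S))
    policy = np.zeros((H, len(S)), dtype=int)
    for t in reversed(range(H)):
        Q = np.array([[cost(s, a) + P[a][s] @ V for a in A] for s in S])
        policy[t] = Q.argmin(axis=1)
        V = Q.min(axis=1)
    return policy

def sample_path_policy(N=2000):
    """Estimate each stage's Q-values by averaging N simulated transitions,
    then take the greedy (argmin) action; only the simulator is used."""
    V_hat = np.zeros(len(S))
    policy = np.zeros((H, len(S)), dtype=int)
    for t in reversed(range(H)):
        Q_hat = np.zeros((len(S), len(A)))
        for s in S:
            for a in A:
                samples = [cost(s, a) + V_hat[simulate_step(s, a)]
                           for _ in range(N)]
                Q_hat[s, a] = np.mean(samples)
        policy[t] = Q_hat.argmin(axis=1)
        V_hat = Q_hat.min(axis=1)
    return policy

if __name__ == "__main__":
    true_pi = exact_backward_induction()
    est_pi = sample_path_policy()
    print("exact policy:\n", true_pi)
    print("estimated policy:\n", est_pi)
    print("policies agree:", np.array_equal(true_pi, est_pi))
```

Even while the estimated cost-to-go values are still noisy, the estimated policy typically coincides with the true one; the exponential (rather than square-root) rate cited in the abstract refers to how fast the probability of a policy mismatch vanishes in the number of sampled paths.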
