
Convergence of Sample Path Optimal Policies for Stochastic Dynamic Programming


Abstract

The authors consider the solution of stochastic dynamic programs using sample path estimates. Applying the theory of large deviations, they derive probability error bounds associated with the convergence of the estimated optimal policy to the true optimal policy, for finite horizon problems. These bounds decay at an exponential rate, in contrast with the usual canonical (inverse) square root rate associated with estimation of the value (cost-to-go) function itself. These results have practical implications for Monte Carlo simulation-based solution approaches to stochastic dynamic programming problems where it is impractical to extract the explicit transition probabilities of the underlying system model.
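Below is a minimal sketch, not taken from the report, of the kind of simulation-based (sample-path) approach the abstract describes: at each stage the Q-values are estimated by averaging transitions drawn from a black-box simulator, and the resulting greedy policy is compared with the exact backward-induction policy computed from the explicit transition probabilities. The MDP, cost function, horizon, and sample size N are all invented for illustration; the report's large-deviations bounds concern how quickly the probability that the estimated and true optimal policies differ decays as N grows.

```python
# Illustrative sketch only: sample-path estimation of an optimal policy for a
# small finite-horizon stochastic dynamic program (all model data is made up).
import numpy as np

rng = np.random.default_rng(0)

H = 3           # horizon (stages 0..H-1)
S = [0, 1, 2]   # states
A = [0, 1]      # actions

# "True" transition kernel P[a][s] -> distribution over next states. In the
# setting of the abstract this would only be accessible through simulation,
# not in explicit form.
P = {0: np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.7, 0.2],
                  [0.2, 0.1, 0.7]]),
     1: np.array([[0.3, 0.4, 0.3],
                  [0.3, 0.3, 0.4],
                  [0.4, 0.3, 0.3]])}

def cost(s, a):
    """Illustrative stage cost."""
    return (s - 1) ** 2 + 0.5 * a

def simulate_step(s, a):
    """Simulator-only access: draw the next state from the hidden kernel."""
    return rng.choice(len(S), p=P[a][s])

def exact_backward_induction():
    """Reference solution using the explicit transition probabilities."""
    V = np.zeros(len(S))
    policy = np.zeros((H, len(S)), dtype=int)
    for t in reversed(range(H)):
        Q = np.array([[cost(s, a) + P[a][s] @ V for a in A] for s in S])
        policy[t] = Q.argmin(axis=1)
        V = Q.min(axis=1)
    return policy

def sample_path_policy(N=2000):
    """Estimate each stage's Q-values by averaging N simulated transitions,
    then take the greedy (argmin) action; only the simulator is used."""
    V_hat = np.zeros(len(S))
    policy = np.zeros((H, len(S)), dtype=int)
    for t in reversed(range(H)):
        Q_hat = np.zeros((len(S), len(A)))
        for s in S:
            for a in A:
                samples = [cost(s, a) + V_hat[simulate_step(s, a)]
                           for _ in range(N)]
                Q_hat[s, a] = np.mean(samples)
        policy[t] = Q_hat.argmin(axis=1)
        V_hat = Q_hat.min(axis=1)
    return policy

if __name__ == "__main__":
    true_pi = exact_backward_induction()
    est_pi = sample_path_policy()
    print("exact policy:\n", true_pi)
    print("estimated policy:\n", est_pi)
    print("policies agree:", np.array_equal(true_pi, est_pi))
```

Even while the estimated cost-to-go values are still noisy, the estimated policy typically coincides with the true one; the exponential (rather than square-root) rate cited in the abstract refers to how fast the probability of a policy mismatch vanishes in the number of sampled paths.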
