Collection: US Government Science and Technology Reports

Using Strong Convergence to Accelerate Value Iteration.



Abstract

Convergence of the relative value function (the total value function less the total value of a base state) and of the optimal policy, as the horizon length grows in value iteration (Markov decision programming), has recently been shown to be geometric with factor alpha*beta, where alpha is the discount factor and beta <= 1.0. The case beta < 1.0 is termed 'strong convergence'. This paper suggests estimating bounds on the convergence rate computationally during the value iteration process, yielding bounds directly on the extrapolated infinite-horizon relative value function. Such an extrapolation serves two purposes. First, large numbers of value iterations can be skipped by continuing computation directly with the estimated infinite-horizon relative value function (directly analogous to quadratic acceleration procedures in nonlinear programming). Second, existing procedures for eliminating non-optimal actions are greatly strengthened, since an action can be eliminated permanently once the bounds on the infinite-horizon relative value function become sufficiently tight.

