Stochastic Dynamic Programming with Range and Ratio Criteria


Abstract

In this paper we consider finite-stage stochastic dynamic programming with range and ratio criteria. The range criterion is the maximum reward over all stages minus the minimum reward. As a ratio criterion, we take the ratio of one additive reward to another. Our optimization problem is to minimize the expected value of the range, and of the ratio, over a large class of policies. For each of the two criteria, we use an invariant imbedding method, which introduces a family of past-value sets for reward accumulation. The imbedding expands the original state space by two dimensions. First, we derive a forward recursive equation for the sequence of past-value sets. Second, we derive a backward recursive formula for the sequence of optimal value functions on the augmented state spaces. Finally, a numerical example is given for a two-state, two-action, two-stage model.
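
The two recursions described in the abstract can be sketched for the range criterion as follows. This is a minimal illustration, not the paper's own formulation: the reward table REWARD, the transition table PROB, the absence of a terminal reward, and the function names past_value_sets and value are all assumptions made for this example. The augmented state is (x, hi, lo), where hi and lo are the running maximum and minimum of the rewards received so far, which is one way to read "expands the original state space by two dimensions". The ratio criterion is not covered here.

```python
from functools import lru_cache
from math import inf

N = 2                # number of stages (two-stage model)
STATES = (0, 1)      # two states
ACTIONS = (0, 1)     # two actions

# Hypothetical data, not taken from the paper:
# REWARD[x][u] is the stage reward, PROB[x][u][y] the transition probability.
REWARD = ((1.0, 3.0), (2.0, 5.0))
PROB = (((0.5, 0.5), (0.9, 0.1)),
        ((0.2, 0.8), (0.6, 0.4)))


def past_value_sets():
    """Forward recursion: reachable augmented states (x, hi, lo) at each stage."""
    sets = [{(x, -inf, inf) for x in STATES}]   # nothing accumulated yet
    for _ in range(N):
        nxt = set()
        for (x, hi, lo) in sets[-1]:
            for u in ACTIONS:
                r = REWARD[x][u]
                for y in STATES:
                    if PROB[x][u][y] > 0:
                        nxt.add((y, max(hi, r), min(lo, r)))
        sets.append(nxt)
    return sets


@lru_cache(maxsize=None)
def value(n, x, hi, lo):
    """Backward recursion: minimal expected range from stage n onward."""
    if n == N:
        return hi - lo                          # range of the rewards collected
    best = inf
    for u in ACTIONS:
        r = REWARD[x][u]
        expected = sum(
            PROB[x][u][y] * value(n + 1, y, max(hi, r), min(lo, r))
            for y in STATES
        )
        best = min(best, expected)
    return best


if __name__ == "__main__":
    for stage, s in enumerate(past_value_sets()):
        print("stage", stage, "past-value set size:", len(s))
    for x in STATES:
        print("initial state", x, "minimal expected range:", value(0, x, -inf, inf))
```

Carrying (hi, lo) in the state is what makes the non-additive range criterion amenable to a stage-by-stage recursion: the terminal value hi - lo depends only on the augmented state, so the minimization at each stage can be done pointwise over the reachable past values.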
