Journal: Mathematical Methods of Operations Research

An envelope theorem and some applications to discounted Markov decision processes



Abstract

In this paper, an Envelope Theorem (ET) is established for optimization problems on Euclidean spaces. In general, envelope theorems make it possible to analyze an optimization problem and obtain its solution by means of differentiability techniques. The ET is presented in two versions: one uses concavity assumptions, whereas the other requires no such assumptions. The ET is then applied to discounted, infinite-horizon Markov decision processes (MDPs) on Euclidean spaces. As a first application, several examples of discounted MDPs (including some economic models) are presented for which the ET makes it possible to determine the value iteration functions. This in turn yields the corresponding optimal value functions and optimal policies. As a second application of the ET, it is proved that, under differentiability conditions on the transition law, the reward function, and the noise of the system, the value function and the optimal policy of the problem are differentiable with respect to the state of the system. In addition, various examples illustrating these differentiability conditions are provided.
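As a rough illustration of the kind of statement an envelope theorem makes, the sketch below checks the classical identity V'(x) = ∂f/∂x (x, a*(x)) numerically, where V(x) = max over a of f(x, a) and a*(x) is the maximizer. The objective f(x, a) = x·a − a², the search interval, and all function names are hypothetical choices made for this example; they are not taken from the paper, and the sketch covers only the simplest concave, one-dimensional case rather than the paper's two versions of the ET.

```python
def f(x, a):
    """Hypothetical objective (assumption): strictly concave in the action a."""
    return x * a - a * a


def argmax_a(x, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the maximizer of a -> f(x, a) on [lo, hi].

    Valid here because f(x, .) is concave; the interval is an assumption.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(x, m1) < f(x, m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)


def value(x):
    """Optimal value V(x) = max_a f(x, a)."""
    return f(x, argmax_a(x))


def envelope_derivative(x):
    """Partial derivative of f in x at the maximizer; for this f, df/dx = a."""
    return argmax_a(x)


def finite_difference(x, h=1e-4):
    """Central-difference approximation of V'(x), used as an independent check."""
    return (value(x + h) - value(x - h)) / (2.0 * h)


if __name__ == "__main__":
    x = 1.5
    print("maximizer a*(x):        ", argmax_a(x))
    print("envelope derivative:    ", envelope_derivative(x))
    print("finite-difference V'(x):", finite_difference(x))
```

For this toy objective the maximizer is a*(x) = x/2 and V(x) = x²/4, so both derivative estimates should agree with x/2 up to numerical error. In a discounted MDP the same idea would be applied to the right-hand side of the optimality equation, which is where the differentiability conditions mentioned in the abstract come in.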

