An envelope theorem and some applications to discounted Markov decision processes

Cruz-Suarez H; Montes-de-Oca R

Abstract

In this paper, an Envelope Theorem (ET) is established for optimization problems on Euclidean spaces. In general, Envelope Theorems permit analyzing an optimization problem and obtaining its solution by means of differentiability techniques. The ET is presented in two versions: one uses concavity assumptions, whereas the other does not require such assumptions. The ET is then applied to discounted, infinite-horizon Markov Decision Processes (MDPs) on Euclidean spaces. As a first application, several examples of discounted MDPs (including some economic models) are presented for which the ET allows the value iteration functions to be determined; this in turn yields the corresponding optimal value functions and optimal policies. As a second application of the ET, it is proved that, under differentiability conditions on the transition law, the reward function, and the noise of the system, the value function and the optimal policy of the problem are differentiable with respect to the state of the system. Various examples illustrating these differentiability conditions are also provided.
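
For orientation, the objects the abstract refers to can be written in their standard textbook form. This is only a sketch: the symbols (reward r, discount factor \alpha, transition map F, noise \xi, admissible action sets A(x), maximizer f) are assumed notation, not taken from the paper, and the paper's precise hypotheses and statements are not reproduced here.

```latex
% Assumed notation: state x, action a in A(x), reward r(x,a),
% discount factor 0 < \alpha < 1, and a transition law given by
% x_{t+1} = F(x_t, a_t, \xi_t) with i.i.d. noise \xi_t.

% Value iteration functions, starting from V_0 \equiv 0:
\[
  V_{n+1}(x) = \sup_{a \in A(x)}
    \bigl\{ r(x,a) + \alpha \, \mathbb{E}\bigl[ V_n\bigl(F(x,a,\xi)\bigr) \bigr] \bigr\},
  \qquad n = 0, 1, 2, \dots
\]

% Optimality (Bellman) equation for the optimal value function V:
\[
  V(x) = \sup_{a \in A(x)}
    \bigl\{ r(x,a) + \alpha \, \mathbb{E}\bigl[ V\bigl(F(x,a,\xi)\bigr) \bigr] \bigr\}.
\]

% Generic envelope-type identity: if W(x) = \sup_{a} G(x,a) is attained
% at an interior maximizer f(x), and G is suitably differentiable, then
\[
  W'(x) = \frac{\partial G}{\partial x}\bigl(x, f(x)\bigr).
\]
```

Read this way, an envelope identity lets the derivative of each value iteration function be computed from the derivative of the maximized expression evaluated at the maximizer, which is the kind of differentiability technique the abstract describes.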

Bibliographic record

Source: Mathematical Methods of Operations Research, 2008, Issue 2, 23 pages
Authors: Cruz-Suarez H; Montes-de-Oca R
Original format: PDF
Language of text: English
Chinese Library Classification: Mathematics
Keywords: envelope theorem; discounted Markov decision process; differentiability of the optimal value function; differentiability of the optimal policy; economic growth model; ECONOMIC-GROWTH; UNCERTAINTY; DYNAMICS; PLANS

Similar literature

1. An envelope theorem and some applications to discounted Markov decision processes [J]. Hugo Cruz-Suárez, Raúl Montes-de-Oca. Mathematical Methods of Operations Research, 2008, Issue 2.
2. An unbounded Berge's minimum theorem with applications to discounted Markov decision processes [J]. Raul Montes-de-Oca, Enrique Lemus-Rodriguez. Kybernetika, 2012, Issue 2.
3. An unbounded Berge's minimum theorem with applications to discounted Markov decision processes [J]. Lemus-Rodríguez Enrique, Montes-de-Oca Raúl. Kybernetika, 2012, Issue 2.
4. An application to the finite approximation of the first passage models for discrete-time Markov decision processes with varying discount factors [C]. Xiao Wu, Junyu Zhang. World Congress on Intelligent Control and Automation, 2014.
5. Ergodicity and functional central limit theorems for a class of Markov processes with applications to nonlinear autoregressive models (invariant, probability) [D]. Lee, Oesook. 1986.
6. Decision Making Under Uncertainty: A Neural Model Based on Partially Observable Markov Decision Processes [O]. Rajesh P. N. Rao. 2010.
7. Fixed point theorems for discounted finite Markov decision processes [O]. Holzbaur Ulrich Dieter. 1986.
