Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs (Revised)

Abstract

The paper proves constructively the existence of optimal policies for maximum one-period mean-to-standard-deviation-ratio, negative variance-with-bounded-mean and mean-penalized-by-variance Markov decision chains by reducing them to a related mathematical program. This program entails maximizing (xB/D(xb)) + C(xb) over x in a polytope, subject to given bounds on xb, where C and D are convex and either D is constant, or D is positive and nondecreasing, C is nondecreasing and xB is nonpositive. The program is in turn reduced to maximizing x(B + theta b) over x in the polytope, parametrically in theta. Along the way, under the nonnegative-initial-distribution assumption, the authors generalize the rule for constructing a stationary maximum-average-reward policy from an extreme optimal solution of the associated linear program. The paper unifies and extends formulations and existence results for problems discussed by White (1987), Filar and Lee (1985), Sobel (1985), Kawai (1987) and Filar, Kallenberg and Lee (1989), and gives an effective computational procedure for solving them that is related to a method used by Kawai (1987) in a special case.
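As a rough illustration of the reduction described in the abstract, the following minimal Python sketch sweeps theta over a grid, maximizes the linear objective x(B + theta b) over a toy polytope, and then compares the resulting candidates using the original objective. It is not the procedure from the report. It assumes D is the constant 1, ignores the bounds on xb, represents the polytope by an explicit vertex list, and uses a fixed theta grid. The function names (`theta_sweep`, `objective`) and all numbers are hypothetical.

```python
# Illustrative sketch only. Assumptions: D is the constant 1, bounds on x.b are
# ignored, the polytope is given by its vertices, and theta runs over a grid.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def objective(x, B, b, C):
    """Value of x.B + C(x.b), i.e. the program with D constant and equal to 1."""
    return dot(x, B) + C(dot(x, b))

def theta_sweep(vertices, B, b, C, thetas):
    """For each theta, find a vertex maximizing x.(B + theta*b); then return the
    candidate with the largest value of the original objective."""
    candidates = []
    for theta in thetas:
        c = [Bi + theta * bi for Bi, bi in zip(B, b)]
        best_linear = max(vertices, key=lambda x: dot(x, c))
        if best_linear not in candidates:
            candidates.append(best_linear)
    best = max(candidates, key=lambda x: objective(x, B, b, C))
    return best, objective(best, B, b, C), candidates

# Toy data (made up): the unit square, with C convex and nondecreasing for y >= 0.
vertices = [(0, 0), (1, 0), (0, 1), (1, 1)]
B = (-1.0, -0.5)
b = (1.0, 2.0)
C = lambda y: 0.5 * y * y
thetas = [0.5 * k for k in range(7)]  # 0.0, 0.5, ..., 3.0

best, value, candidates = theta_sweep(vertices, B, b, C, thetas)
brute = max(vertices, key=lambda x: objective(x, B, b, C))  # direct check
print(best, value, candidates, brute)
```

On this toy data the sweep and the brute-force check over all vertices select the same point. A grid over theta is only a stand-in: a finer grid, or an exact parametric linear programming method, would be needed in general.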

Bibliographic details

Authors: Huang, Y.; Kallenberg, L. C. M.
Year: 1993
Pages: 1-25
Total pages: 25
Original format: PDF
Language: English
Chinese Library Classification: Industrial technology
Keywords: Linear programming; Decision theory; Markov chains; Optimization; Algorithms; Theorems

Similar literature

1. Blackwell optimality in the class of all policies in Markov decision chains with a Borel state space and unbounded rewards [J]. Arie Hordijk, Alexander A. Yushkevich. Mathematical methods of operations research, 1999, No. 3
2. Polynomial-Time Computation of Strong and n-Present-Value Optimal Policies in Markov Decision Chains [J]. O'Sullivan Michael, Veinott Arthur F. Jr. Mathematics of operations research, 2017, No. 3
3. Sample-Path Optimal Stationary Policies in Stable Markov Decision Chains with the Average Reward Criterion [J]. Cavazos-Cadena Rolando, Montes-De-Oca Raul, Sladky Karel. Journal of Applied Probability, 2015, No. 2
4. Sensitivity Analysis for the Optimal Minimal Repair/Replacement Policies under the Framework of Markov Decision Process [C]. Mingchih Chen, Chun-Yuan Cheng. IEEE International Conference on Industrial Engineering and Engineering Management, 2007
5. International financial networks and global supply chains: A unified framework for decision-making, optimization, and risk management [D]. Cruz, Jose M. 2004
6. Optimal Information Collection Policies in a Markov Decision Process Framework [O]. Lauren E. Cipriano, Jeremy D. Goldhaber-Fiebert, Shan Liu
7. Average, sensitive and Blackwell-optimal policies in denumerable Markov decision chains with unbounded rewards [O]. Dekker, R. (Rommert), Hordijk, A. (Arie). 1988
8. Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs [R]. Huang, Y., Kallenberg, L. C. M. 1990
