Source: US Government Technical Reports

Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs (Revised)



Abstract

The paper constructively proves the existence of optimal policies for Markov decision chains under three criteria: maximum one-period mean-to-standard-deviation ratio, negative variance with bounded mean, and mean penalized by variance. Each problem is reduced to a related mathematical program: maximize xB/D(xb) + C(xb) over x in a polytope, with given bounds on xb, where C and D are convex and either D is constant, or D is positive and nondecreasing, C is nondecreasing, and xB is nonpositive. This program is in turn reduced to maximizing x(B + theta b) over x in the polytope, parametrically in theta. Along the way, under the nonnegative-initial-distribution assumption, the authors generalize the rule for constructing a stationary maximum-average-reward policy from an extreme optimal solution of the associated linear program. The paper unifies and extends formulations and existence results for problems discussed by White (1987), Filar and Lee (1985), Sobel (1985), Kawai (1987), and Filar, Kallenberg and Lee (1989), and gives an effective computational procedure to solve them, related to a method used by Kawai (1987) in a special case.
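The parametric reduction described above (solving a family of linear programs in a scalar parameter theta to handle a ratio objective) is closely related to the classical Dinkelbach scheme for linear-fractional programming. The sketch below illustrates that generic scheme, not the authors' exact procedure: it maximizes (x·B)/(x·b) over a polytope {x >= 0, A_eq x = beq} by repeatedly solving the linear program max x·(B - theta*b) and updating theta. The function name and the standard-form polytope are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def maximize_ratio(B, b, A_eq, beq, tol=1e-9, max_iter=100):
    """Dinkelbach-style parametric scheme (illustrative sketch).

    Maximize (x . B) / (x . b) over the polytope {x >= 0, A_eq x = beq},
    assuming x . b > 0 on the feasible set. Each iteration solves the
    linear program: maximize x . (B - theta * b).
    """
    B, b = np.asarray(B, float), np.asarray(b, float)
    n = len(B)
    bounds = [(0, None)] * n

    # Start from a maximizer of x . b so the denominator is positive.
    res = linprog(-b, A_eq=A_eq, b_eq=beq, bounds=bounds)
    x = res.x
    theta = (x @ B) / (x @ b)

    for _ in range(max_iter):
        # Parametric LP: maximize x . (B - theta * b).
        res = linprog(-(B - theta * b), A_eq=A_eq, b_eq=beq, bounds=bounds)
        x = res.x
        new_theta = (x @ B) / (x @ b)
        if abs(new_theta - theta) < tol:
            break
        theta = new_theta
    return x, theta
```

On a small example over the unit simplex {x >= 0, x1 + x2 = 1} with B = (3, 1) and b = (1, 2), the ratio (3*x1 + x2)/(x1 + 2*x2) is maximized at the vertex x = (1, 0), and the scheme converges to theta = 3 in a few iterations.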

