Journal: Applied Mathematics and Optimization

Mean-Variance Problems for Finite Horizon Semi-Markov Decision Processes


Abstract

This paper deals with a mean-variance problem for finite horizon semi-Markov decision processes. The state and action spaces are Borel spaces, while the reward function may be unbounded. The goal is to seek an optimal policy with minimal finite horizon reward variance over the set of policies with a given mean. Using the theory of -step contraction, we give a characterization of policies with a given mean and, under suitable conditions, convert the second-order moment of the finite horizon reward into the mean of an infinite horizon reward/cost generated by a discrete-time Markov decision process (MDP) with a two-dimensional state space and a new one-step reward/cost. We then establish the optimality equation and the existence of mean-variance optimal policies by employing existing results on discrete-time MDPs. We also provide a value iteration algorithm and a policy improvement algorithm for computing the value function and mean-variance optimal policies, respectively. In addition, a linear program and its dual are developed for solving the mean-variance problem.
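The value iteration mentioned in the abstract can be illustrated on a much simpler model. The sketch below runs backward induction on a toy finite-horizon discrete-time MDP with two states and two actions; it is not the paper's algorithm (which handles semi-Markov processes on Borel spaces and the variance criterion), and all transition probabilities and rewards are invented for demonstration.

```python
import numpy as np

# Toy finite-horizon value iteration (backward induction).
# P[a, s, s2]: probability of moving from s to s2 under action a.
# r[s, a]: one-step reward for taking action a in state s.
# All numbers are made up for illustration.
n_states, n_actions, horizon = 2, 2, 3
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.7, 0.3]]])
r = np.array([[1.0, 0.0],
              [0.0, 2.0]])

V = np.zeros(n_states)        # terminal value function
policy = []
for t in range(horizon):      # backward induction over decision epochs
    Q = r + np.einsum('ast,t->sa', P, V)   # Q[s, a] = r + expected future value
    policy.append(Q.argmax(axis=1))        # greedy decision rule at this epoch
    V = Q.max(axis=1)
policy.reverse()              # policy[t] is the rule used at epoch t

print(V)   # optimal expected 3-step reward from each state
```

With these numbers the optimal expected three-step reward works out to V = [3.22, 4.46]. The policy improvement and linear-programming approaches from the abstract solve the same model through different routes.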
