Concurrent MDPs with Finite Markovian Policies

Abstract

The recently defined class of Concurrent Markov Decision Processes (CMDPs) allows one to describe scenario-based uncertainty in sequential decision problems such as scheduling or admission problems. The resulting optimization problem of computing an optimal policy is NP-hard. This paper introduces a new class of policies for CMDPs on infinite horizons. A mixed-integer linear program and an efficient approximation algorithm based on policy iteration are defined for the computation of optimal policies. The proposed approximation algorithm also improves on the available approximate value iteration algorithm for the finite-horizon case.
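The abstract does not spell out the CMDP-specific procedure, but the approximation algorithm it describes builds on classical policy iteration for finite MDPs. The following is a minimal sketch of that base procedure in Python, assuming tabular transition and reward arrays `P` and `R` (hypothetical names and shapes chosen for illustration); the paper's actual algorithm extends this idea to the concurrent, scenario-based setting, which is not reconstructed here.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Classical policy iteration for a finite, discounted MDP.

    P: transitions, shape (A, S, S); P[a, s, t] = Pr(t | s, a)
    R: rewards, shape (A, S); R[a, s] = expected reward of a in s
    Returns a deterministic stationary policy and its value function.
    (Illustrative sketch only; not the paper's CMDP algorithm.)
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[policy, np.arange(S)]   # (S, S), row s = P[policy[s], s, :]
        r_pi = R[policy, np.arange(S)]   # (S,)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Policy improvement: greedy one-step lookahead on Q(a, s).
        Q = R + gamma * np.einsum("ast,t->as", P, v)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v   # stable policy => optimal for this MDP
        policy = new_policy
```

In the CMDP setting described by the abstract, uncertainty enters through scenarios rather than a single known transition model, which is what makes the optimization NP-hard and motivates both the mixed-integer linear program and the policy-iteration-based approximation.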