Journal: Mathematics of Operations Research

Polynomial Time Algorithms for Branching Markov Decision Processes and Probabilistic Min (Max) Polynomial Bellman Equations

Abstract

We show that one can compute the least nonnegative solution (also known as the least fixed point) for a system of probabilistic min (max) polynomial equations, to any desired accuracy epsilon > 0 in time polynomial in both the encoding size of the system and in log(1/epsilon). These are Bellman optimality equations for important classes of infinite-state Markov decision processes (MDPs), including branching MDPs (BMDPs), which generalize classic multitype branching stochastic processes. We thus obtain the first polynomial time algorithm for computing, to any desired precision, optimal (maximum and minimum) extinction probabilities for BMDPs. Our algorithms are based on a novel generalization of Newton's method, which employs linear programming in each iteration. We also provide polynomial-time (P-time) algorithms for computing an epsilon-optimal policy for both maximizing and minimizing extinction probabilities in a BMDP, whereas we note a hardness result for computing an exact optimal policy. Furthermore, improving on prior results, we provide more efficient P-time algorithms for qualitative analysis of BMDPs, that is, for determining whether the maximum or minimum extinction probability is 1, and, if so, computing a policy that achieves this. We also observe some complexity consequences of our results for branching simple stochastic games, which generalize BMDPs.
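To make the objects in the abstract concrete, the following is a minimal sketch of a one-variable probabilistic min (max) polynomial equation and its least nonnegative solution. It does not implement the generalized Newton's method with linear programming described above. It uses plain value iteration from zero, a simpler technique that also converges to the least fixed point of a monotone system but carries no polynomial-time guarantee. The single-type BMDP, the action names `a` and `b`, and all probabilities are hypothetical and chosen only for illustration.

```python
# Hypothetical one-type BMDP. Each action maps to a list of
# (probability, number_of_children) outcomes for a single individual.
ACTIONS = {
    "a": [(0.6, 2), (0.4, 0)],  # x = 0.6*x^2 + 0.4
    "b": [(0.3, 2), (0.7, 0)],  # x = 0.3*x^2 + 0.7
}


def action_value(outcomes, x):
    """Evaluate the probabilistic polynomial of one action at x."""
    return sum(p * x ** k for p, k in outcomes)


def least_fixed_point(choose, iterations=500):
    """Value iteration from 0 for x = choose over actions of P_action(x).

    `choose` is the built-in `min` or `max`. This is NOT the paper's
    Newton-based algorithm, only a simple monotone iteration.
    """
    x = 0.0
    for _ in range(iterations):
        x = choose(action_value(outcomes, x) for outcomes in ACTIONS.values())
    return x


min_extinction = least_fixed_point(min)  # minimum extinction probability
max_extinction = least_fixed_point(max)  # maximum extinction probability
print(min_extinction, max_extinction)
```

In this toy instance, action `a` gives the smaller value everywhere on [0, 1], so the minimum extinction probability is the least root of 0.6x^2 - x + 0.4 = 0, namely 2/3. Action `b` gives the larger value, and its least fixed point is 1.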