【24h】

Partial and Conditional Expectations in Markov Decision Processes with Integer Weights

机译:具有整数权重的Markov决策过程中的部分和条件期望

获取原文

摘要

The paper addresses two variants of the stochastic shortest path problem ('optimize the accumulated weight until reaching a goal state') in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights, where the probability mass is redistributed to paths reaching the goal. Both variants constitute useful approaches to the analysis of systems without guarantees on the occurrence of an event of interest (reaching a goal state), but have only been studied in structures with non-negative weights. Our main results are as follows. There are polynomial-time algorithms to check the finiteness of the supreraum of the partial or conditional expectations in MDPs with arbitrary integer weights. If finite, then optimal weight-based deterministic schedulers exist. In contrast to the setting of non-negative weights, optimal schedulers can need infinite memory and their value can be irrational. However, the optimal value can be approximated up to an absolute error of ε in time exponential in the size of the MDP and polynomial in log(1/ε).
机译:本文针对具有整数权重的马尔可夫决策过程(MDP)解决了随机最短路径问题的两种变体(“优化累积权重,直到达到目标状态”)。第一个变量优化了部分预期累积权重,其中未导致目标状态的路径被分配了权重0,而第二个变量则考虑了条件预期累积权重,其中概率质量被重新分配给了达到目标的路径。这两种变体都是对系统进行分析的有用方法,不能保证发生感兴趣的事件(达到目标状态),但是仅在具有非负权重的结构中进行了研究。我们的主要结果如下。有多项式时间算法可以检查具有任意整数权重的MDP中部分或条件期望的超然性。如果是有限的,则存在基于最优权重的确定性调度程序。与非负权重的设置相反,最佳调度程序可能需要无限内存,并且其值可能不合理。但是,最佳值可以近似于MDP大小和log(1 /ε)多项式的时间指数绝对误差ε。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号