Partial and Conditional Expectations in Markov Decision Processes with Integer Weights



Abstract

The paper addresses two variants of the stochastic shortest path problem ('optimize the accumulated weight until reaching a goal state') in Markov decision processes (MDPs) with integer weights. The first variant optimizes partial expected accumulated weights, where paths not leading to a goal state are assigned weight 0, while the second variant considers conditional expected accumulated weights, where the probability mass is redistributed to paths reaching the goal. Both variants constitute useful approaches to the analysis of systems without guarantees on the occurrence of an event of interest (reaching a goal state), but have only been studied in structures with non-negative weights. Our main results are as follows. There are polynomial-time algorithms to check the finiteness of the supremum of the partial or conditional expectations in MDPs with arbitrary integer weights. If finite, then optimal weight-based deterministic schedulers exist. In contrast to the setting of non-negative weights, optimal schedulers can need infinite memory and their value can be irrational. However, the optimal value can be approximated up to an absolute error of ε in time exponential in the size of the MDP and polynomial in log(1/ε).
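
The two quantities named in the abstract can be illustrated on a toy example. The sketch below is not the paper's algorithm: it assumes a scheduler has already been fixed, so the MDP reduces to a small acyclic Markov chain, and the state names, probabilities, and weights are all hypothetical. It computes the partial expectation (paths that miss the goal count as weight 0) and the conditional expectation (the partial expectation divided by the probability of reaching the goal).

```python
from fractions import Fraction

# Hypothetical acyclic Markov chain (an MDP with the scheduler already fixed).
# transitions[state] = list of (probability, integer weight, successor).
# States without an entry ("goal", "fail") are absorbing.
transitions = {
    "s0": [(Fraction(1, 2), 2, "s1"), (Fraction(1, 2), -1, "fail")],
    "s1": [(Fraction(2, 3), -5, "goal"), (Fraction(1, 3), 4, "fail")],
}


def reach_and_partial(state):
    """Return (probability of reaching goal, partial expectation) from state."""
    if state == "goal":
        return Fraction(1), Fraction(0)
    if state not in transitions:  # absorbing non-goal state
        return Fraction(0), Fraction(0)
    prob, partial = Fraction(0), Fraction(0)
    for p, w, succ in transitions[state]:
        succ_prob, succ_partial = reach_and_partial(succ)
        prob += p * succ_prob
        # The weight w only counts on the probability mass that later reaches goal.
        partial += p * (w * succ_prob + succ_partial)
    return prob, partial


prob, partial = reach_and_partial("s0")
conditional = partial / prob  # defined only when prob > 0

print(prob, partial, conditional)  # prints: 1/3 -1 -3
```

Here the only path to the goal is s0, s1, goal, with probability 1/3 and accumulated weight 2 - 5 = -3, so the partial expectation is -1 and the conditional expectation is -3. The negative weights are what distinguish this setting from the non-negative case mentioned in the abstract.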


Bibliographic details

Source
International Conference on Foundations of Software Science and Computation Structures; European Conferences on Theory and Practice of Software, 2019, pp. 436-452 (17 pages)
Conference location: Prague (CZ)
Authors
Jakob Piribauer; Christel Baier
Author affiliation

Technische Universitaet Dresden, Dresden, Germany

Original format: PDF

Similar literature


1. Conditional Value-at-Risk for Random Immediate Reward Variables in Markov Decision Processes [J]. Masayuki Kageyama, Takayuki Fujii, Koji Kanefuji. American Journal of Computational Mathematics. 2011, No. 3
2. Optimal threshold probability and expectation in semi-Markov decision processes [J]. Sakaguchi M., Ohtsubo Y. Applied mathematics and computation. 2010, No. 10
3. An Expectation Maximization Algorithm for Continuous Markov Decision Processes with Arbitrary Reward [J]. Arnaud Doucet, Jan Peters, Matthew Hoffman. JMLR: Workshop and Conference Proceedings. 2009, No. 2009
4. Partial and Conditional Expectations in Markov Decision Processes with Integer Weights [C]. Jakob Piribauer, Christel Baier. International Conference on Foundations of Software Science and Computation Structures. 2019
5. Overlapping codon model, phylogenetic clustering, and alternative partial expectation conditional maximization algorithm [D]. Chen, Wei-Chen. 2011
6. Comparison of methods for calculating conditional expectations of sufficient statistics for continuous time Markov chains [O]. Paula Tataru, Asger Hobolth. 2011
7. Partial and Conditional Expectations in Markov Decision Processes with Integer Weights [O]. Jakob Piribauer, Christel Baier. 2019
