...
【24h】

Expected Window Mean-Payoff

机译:预期窗口均值支付

获取原文

摘要

We study the expected value of the window mean-payoff measure in Markov decision processes (MDPs) and Markov chains (MCs). The window mean-payoff measure strengthens the classical mean-payoff measure by measuring the mean-payoff over a window of bounded length that slides along an infinite path. This measure ensures better stability properties than the classical mean-payoff. Window mean-payoff has been introduced previously for two-player zero-sum games. As in the case of games, we study several variants of this definition: the measure can be defined to be prefix-independent or not, and for a fixed window length or for a window length that is left parametric. For fixed window length, we provide polynomial time algorithms for the prefix-independent version for both MDPs and MCs. When the length is left parametric, the problem of computing the expected value on MDPs is as hard as computing the mean-payoff value in two-player zero-sum games, a problem for which it is not known if it can be solved in polynomial time. For the prefix-dependent version, surprisingly, the expected window mean-payoff value cannot be computed in polynomial time unless P=PSPACE. For the parametric case and the prefix-dependent case, we manage to obtain algorithms with better complexities for MCs.
机译:我们研究了马尔可夫决策过程(MDP)和马尔可夫链(MC)中窗口均值支付测度的期望值。窗口均值度量通过在沿着无限路径滑动的有限长度的窗口上测量均值来增强经典均值度量。与经典的均值支付相比,此措施可确保更好的稳定性。窗口均值收益先前已针对两人零和游戏引入。与游戏的情况一样,我们研究了此定义的几种变体:可以将度量定义为与前缀无关或不与前缀无关,并且对于固定的窗口长度或左参数的窗口长度。对于固定的窗口长度,我们为MDP和MC提供了与前缀无关的版本的多项式时间算法。当长度留为参数时,在MDP上计算期望值的问题与在两人零和博弈中计算平均收益值一样困难,这个问题尚不清楚是否可以在多项式中解决时间。对于前缀相关的版本,令人惊讶的是,除非P = PSPACE,否则无法在多项式时间内计算期望的窗口均值。对于参数情况和与前缀相关的情况,我们设法获得了具有更好复杂性的MC算法。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号