Logical Methods in Computer Science

Unifying Two Views on Multiple Mean-Payoff Objectives in Markov Decision Processes



Abstract

We consider Markov decision processes (MDPs) with multiple limit-average (or mean-payoff) objectives. There exist two different views: (i) the expectation semantics, where the goal is to optimize the expected mean-payoff objective, and (ii) the satisfaction semantics, where the goal is to maximize the probability of runs such that the mean-payoff value stays above a given vector. We consider optimization with respect to both objectives at once, thus unifying the existing semantics. Precisely, the goal is to optimize the expectation while ensuring the satisfaction constraint. Our problem captures the notion of optimization with respect to strategies that are risk-averse (i.e., ensure certain probabilistic guarantees). Our main results are as follows: First, we present algorithms for the decision problems, which are always polynomial in the size of the MDP. We also show that an approximation of the Pareto curve can be computed in time polynomial in the size of the MDP and the approximation factor, but exponential in the number of dimensions. Second, we present a complete characterization of the strategy complexity (in terms of memory bounds and randomization) required to solve our problem.
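The expectation semantics above rests on the long-run average (mean-payoff) reward of an MDP. As a minimal illustration only (not the paper's algorithm), the sketch below evaluates the expected mean-payoff of a fixed memoryless strategy: under such a strategy the MDP collapses to a Markov chain, and for an ergodic chain the mean-payoff is the one-step reward averaged under the stationary distribution. The transition matrix `P` and reward vector `r` are invented for this example.

```python
def stationary_distribution(P, iters=10_000):
    """Power iteration for the stationary distribution pi with pi = pi * P.

    P is a row-stochastic transition matrix (list of rows); convergence is
    guaranteed here because the example chain is ergodic.
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def mean_payoff(P, r):
    """Long-run average reward: sum over states s of pi(s) * r(s)."""
    pi = stationary_distribution(P)
    return sum(p * ri for p, ri in zip(pi, r))

# Two-state ergodic chain induced by some memoryless strategy (illustrative):
# state 0 pays reward 0, state 1 pays reward 1.
P = [[0.5, 0.5],
     [0.2, 0.8]]
r = [0.0, 1.0]
print(round(mean_payoff(P, r), 3))  # stationary distribution is (2/7, 5/7)
```

With multiple reward dimensions one obtains a mean-payoff vector per strategy, and the paper's problem is to optimize its expectation subject to a probabilistic satisfaction constraint on where that vector may fall.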

