首页> 外文会议>International conference on concurrency theory >Robust Synchronization in Markov Decision Processes
【24h】

Robust Synchronization in Markov Decision Processes

机译:马尔可夫决策过程中的鲁棒同步

获取原文

摘要

We consider synchronizing properties of Markov decision processes (MDP), viewed as generators of sequences of probability distributions over states. A probability distribution is p-synchronizing if the probability mass is at least p in some state, and a sequence of probability distributions is weakly p-synchronizing, or strongly p-synchronizing if respectively infinitely many, or all but finitely many distributions in the sequence are p-synchronizing. For each synchronizing mode, an MDP can be (ⅰ) sure winning if there is a strategy that produces a 1-synchronizing sequence; (ⅱ) almost-sure winning if there is a strategy that produces a sequence that is, for all ε > 0, a (1-ε)-synchronizing sequence; (ⅲ) limit-sure winning if for all ε > 0, there is a strategy that produces a (1-ε)-synchronizing sequence. For each synchronizing and winning mode, we consider the problem of deciding whether an MDP is winning, and we establish matching upper and lower complexity bounds of the problems, as well as the optimal memory requirement for winning strategies: (a) for all winning modes, we show that the problems are PSPACE-complete for weak synchronization, and PTIME-complete for strong synchronization; (b) we show that for weak synchronization, exponential memory is sufficient and may be necessary for sure winning, and infinite memory is necessary for almost-sure winning; for strong synchronization, linear-size memory is sufficient and may be necessary in all modes; (c) we show a robustness result that the almost-sure and limit-sure winning modes coincide for both weak and strong synchronization.
机译:我们考虑将马尔可夫决策过程(MDP)的同步属性视为状态概率分布序列的生成器。如果概率质量在某些状态下至少为p,则概率分布为p同步,并且如果序列中的分布分别为无限多或几乎全部,则概率分布序列为弱p同步,或者为强p同步。是p同步的。对于每种同步模式,如果有产生1同步序列的策略,则MDP可以()获胜。 (ⅱ)如果有一种策略能够产生对于所有ε> 0而言都是(1-ε)同步序列的序列,则几乎可以肯定获胜; (ⅲ)如果所有ε> 0,都有一个确保(1-ε)同步序列的策略。对于每种同步和获胜模式,我们都会考虑确定MDP是否获胜的问题,并为问题建立匹配的上下复杂度边界,以及获胜策略的最佳内存要求:(a)对于所有获胜模式,我们证明问题在于,对于弱同步,问题是PSPACE完全;对于强同步,问题是PTIME-完全。 (b)我们表明,对于弱同步而言,指数存储就足够了,可能对于确定获胜是必要的,而无限存储对于几乎确定的获胜是必需的;为了实现强同步,线性大小的存储器就足够了,并且在所有模式下都可能是必需的; (c)我们显示出一种鲁棒性结果,即弱同步和强同步的几乎保证和极限保证获胜模式是重合的。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号