
Robust Synchronization in Markov Decision Processes

Abstract

We consider synchronizing properties of Markov decision processes (MDP), viewed as generators of sequences of probability distributions over states. A probability distribution is p-synchronizing if the probability mass is at least p in some state, and a sequence of probability distributions is weakly p-synchronizing, or strongly p-synchronizing if respectively infinitely many, or all but finitely many distributions in the sequence are p-synchronizing. For each synchronizing mode, an MDP can be (i) sure winning if there is a strategy that produces a 1-synchronizing sequence; (ii) almost-sure winning if there is a strategy that produces a sequence that is, for all ε > 0, a (1-ε)-synchronizing sequence; (iii) limit-sure winning if for all ε > 0, there is a strategy that produces a (1-ε)-synchronizing sequence. For each synchronizing and winning mode, we consider the problem of deciding whether an MDP is winning, and we establish matching upper and lower complexity bounds of the problems, as well as the optimal memory requirement for winning strategies: (a) for all winning modes, we show that the problems are PSPACE-complete for weak synchronization, and PTIME-complete for strong synchronization; (b) we show that for weak synchronization, exponential memory is sufficient and may be necessary for sure winning, and infinite memory is necessary for almost-sure winning; for strong synchronization, linear-size memory is sufficient and may be necessary in all modes; (c) we show a robustness result that the almost-sure and limit-sure winning modes coincide for both weak and strong synchronization.
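
To make the definitions in the abstract concrete, here is a minimal Python sketch, not taken from the paper. It assumes a toy two-state MDP and a memoryless strategy (both hypothetical; the abstract notes that winning strategies may need memory), generates a finite prefix of the sequence of distributions, and tests each distribution for p-synchronization. A finite prefix cannot decide weak or strong synchronization, which are properties of the infinite sequence, so the sketch only illustrates the definitions.

```python
from typing import Dict, List

Distribution = Dict[str, float]

# Hypothetical toy MDP: TRANSITIONS[state][action] = {successor: probability}.
TRANSITIONS = {
    "q0": {"a": {"q0": 0.5, "q1": 0.5}},
    "q1": {"a": {"q1": 1.0}},
}

# Assumed memoryless strategy: one fixed action per state.
STRATEGY = {"q0": "a", "q1": "a"}


def is_p_synchronizing(dist: Distribution, p: float) -> bool:
    """True if some single state carries probability mass at least p."""
    return max(dist.values()) >= p


def step(dist: Distribution, transitions, strategy) -> Distribution:
    """Push a distribution one step forward under the given strategy."""
    nxt: Distribution = {}
    for state, mass in dist.items():
        for succ, prob in transitions[state][strategy[state]].items():
            nxt[succ] = nxt.get(succ, 0.0) + mass * prob
    return nxt


def distribution_sequence(initial: Distribution, transitions, strategy,
                          n: int) -> List[Distribution]:
    """Return the initial distribution followed by n successor distributions."""
    seq = [initial]
    for _ in range(n):
        seq.append(step(seq[-1], transitions, strategy))
    return seq


if __name__ == "__main__":
    seq = distribution_sequence({"q0": 1.0}, TRANSITIONS, STRATEGY, 5)
    for i, dist in enumerate(seq):
        print(i, dist, is_p_synchronizing(dist, 0.9))
```

In this toy example the mass in q1 after n steps is 1 - 0.5^n, so for every ε > 0 all but finitely many distributions are (1-ε)-synchronizing, yet no distribution after the initial one is 1-synchronizing. The generated sequence is therefore strongly (1-ε)-synchronizing for all ε > 0 without being strongly 1-synchronizing.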

Bibliographic record

Source
International Conference on Concurrency Theory | 2014 | pp. 234-248 | 15 pages
Authors
Laurent Doyen; Thierry Massart; Mahsa Shirmohammadi
Original format: PDF
