American Control Conference

Convex synthesis of optimal policies for Markov Decision Processes with sequentially-observed transitions


Abstract

This paper extends finite state and action space Markov Decision Process (MDP) models by introducing a new type of measurement for the outcomes of actions. The new measurement allows the decision-maker to sequentially observe the next-state transitions of the actions: the actions are ordered, and the outcome of the next action in the sequence is observed only if the current action is not chosen. The sequentially-observed MDP (SO-MDP) shares some properties with a standard MDP: among history-dependent policies, Markovian ones are still optimal. Thanks to the additional measurements, SO-MDP policies have the advantage of yielding higher rewards than optimal standard MDP policies. Computing these policies, on the other hand, is more complex, and we present a linear-programming-based synthesis of optimal decision policies for finite-horizon SO-MDPs. A simulation example with multiple autonomous agents is also provided to demonstrate the SO-MDP model and the proposed policy synthesis method.
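The abstract does not spell out the paper's LP formulation, but the occupancy-measure linear program it builds on for an ordinary finite-horizon MDP can be sketched as follows. This is a minimal illustration, not the authors' method: the function name solve_finite_horizon_mdp, the variable layout, and the use of scipy.optimize.linprog are all assumptions, and the SO-MDP extension with sequentially-observed transitions would add constraints that the abstract does not describe.

    # Illustrative sketch (assumed, not from the paper): occupancy-measure LP
    # for a standard finite-horizon MDP, solved with scipy's linprog.
    import numpy as np
    from scipy.optimize import linprog

    def solve_finite_horizon_mdp(P, R, mu0, T):
        """P[s, a, s']: transition probabilities; R[s, a]: rewards;
        mu0[s]: initial distribution; T: horizon. Returns (policy, value)."""
        S, A = R.shape
        n = T * S * A                       # one variable x_t(s, a) per (t, s, a)
        idx = lambda t, s, a: (t * S + s) * A + a

        # Objective: maximize expected total reward, i.e. minimize its negative.
        c = np.array([-R[s, a] for t in range(T)
                               for s in range(S)
                               for a in range(A)])

        # Flow conservation: T blocks of S equality constraints.
        A_eq = np.zeros((T * S, n))
        b_eq = np.zeros(T * S)
        for s in range(S):                  # t = 0: sum_a x_0(s, a) = mu0(s)
            A_eq[s, [idx(0, s, a) for a in range(A)]] = 1.0
            b_eq[s] = mu0[s]
        for t in range(1, T):               # t >= 1: inflow equals outflow
            for sp in range(S):
                row = t * S + sp
                A_eq[row, [idx(t, sp, a) for a in range(A)]] = 1.0
                for s in range(S):
                    for a in range(A):
                        A_eq[row, idx(t - 1, s, a)] -= P[s, a, sp]

        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
        x = res.x.reshape(T, S, A)          # optimal occupancy measures
        # Normalize occupancies into a (possibly randomized) Markov policy.
        pi = x / np.maximum(x.sum(axis=2, keepdims=True), 1e-12)
        return pi, -res.fun

    # Toy usage: 2 states, 2 actions, horizon 3, starting in state 0.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    pi, value = solve_finite_horizon_mdp(P, R, np.array([1.0, 0.0]), T=3)

The recovered policy is Markovian but time-varying, which matches the abstract's claim that Markovian policies remain optimal among history-dependent ones; in the SO-MDP setting the decision rule would additionally condition on the sequentially revealed transition outcomes.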
