American Control Conference

Convex synthesis of optimal policies for Markov Decision Processes with sequentially-observed transitions


Abstract

This paper extends finite state and action space Markov Decision Process (MDP) models by introducing a new type of measurement for the outcomes of actions. The new measurement allows the decision-maker to sequentially observe the next-state transitions of the actions: the actions are ordered, and the outcome of the next action in the sequence is observed only if the current action is not chosen. The sequentially-observed MDP (SO-MDP) shares some properties with a standard MDP: among history-dependent policies, Markovian ones are still optimal. Thanks to the additional measurements, SO-MDP policies have the advantage of yielding higher rewards than optimal standard MDP policies. Computing these policies, on the other hand, is more complex, and we present a linear-programming-based synthesis of optimal decision policies for finite-horizon SO-MDPs. A simulation example with multiple autonomous agents is also provided to demonstrate the SO-MDP model and the proposed policy synthesis method.
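The abstract does not spell out the paper's LP formulation, but the occupancy-measure linear program it builds on for an ordinary finite-horizon MDP can be sketched as follows. This is a minimal illustration, not the authors' method: the function name solve_finite_horizon_mdp, the variable layout, and the use of scipy.optimize.linprog are all assumptions, and the SO-MDP extension with sequentially-observed transitions would add constraints that the abstract does not describe.

    # Illustrative sketch (assumed, not from the paper): occupancy-measure LP
    # for a standard finite-horizon MDP, solved with scipy's linprog.
    import numpy as np
    from scipy.optimize import linprog

    def solve_finite_horizon_mdp(P, R, mu0, T):
        """P[s, a, s']: transition probabilities; R[s, a]: rewards;
        mu0[s]: initial distribution; T: horizon. Returns (policy, value)."""
        S, A = R.shape
        n = T * S * A                       # one variable x_t(s, a) per (t, s, a)
        idx = lambda t, s, a: (t * S + s) * A + a

        # Objective: maximize expected total reward, i.e. minimize its negative.
        c = np.array([-R[s, a] for t in range(T)
                               for s in range(S)
                               for a in range(A)])

        # Flow conservation: T blocks of S equality constraints.
        A_eq = np.zeros((T * S, n))
        b_eq = np.zeros(T * S)
        for s in range(S):                  # t = 0: sum_a x_0(s, a) = mu0(s)
            A_eq[s, [idx(0, s, a) for a in range(A)]] = 1.0
            b_eq[s] = mu0[s]
        for t in range(1, T):               # t >= 1: inflow equals outflow
            for sp in range(S):
                row = t * S + sp
                A_eq[row, [idx(t, sp, a) for a in range(A)]] = 1.0
                for s in range(S):
                    for a in range(A):
                        A_eq[row, idx(t - 1, s, a)] -= P[s, a, sp]

        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
        x = res.x.reshape(T, S, A)          # optimal occupancy measures
        # Normalize occupancies into a (possibly randomized) Markov policy.
        pi = x / np.maximum(x.sum(axis=2, keepdims=True), 1e-12)
        return pi, -res.fun

    # Toy usage: 2 states, 2 actions, horizon 3, starting in state 0.
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.5, 0.5], [0.1, 0.9]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    pi, value = solve_finite_horizon_mdp(P, R, np.array([1.0, 0.0]), T=3)

The recovered policy is Markovian but time-varying, which matches the abstract's claim that Markovian policies remain optimal among history-dependent ones; in the SO-MDP setting the decision rule would additionally condition on the sequentially revealed transition outcomes.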
