Temporal Markov Decision Problems : Formalization and Resolution

Abstract

This thesis addresses the question of planning under uncertainty in a time-dependent, changing environment. The original motivation for this work came from the problem of building an autonomous agent able to coordinate with its uncertain environment, this environment being composed of other agents communicating their intentions or of non-controllable processes for which some discrete-event model is available. We investigate several approaches to modeling continuous time-dependency in the framework of Markov Decision Processes (MDPs), which leads us to a definition of Temporal Markov Decision Problems. Our approach then focuses on two separate paradigms. First, we investigate time-dependent problems as implicit-event processes and describe them through the formalism of Time-dependent MDPs (TMDPs). We extend the existing results concerning optimality equations and present a new Value Iteration algorithm, based on piecewise polynomial function representations, that solves a more general class of TMDPs. This paves the way for a more general discussion of parametric actions in MDPs with hybrid state and action spaces and continuous time. Second, we investigate the option of separately modeling the concurrent contributions of exogenous events. This explicit-event modeling approach leads to the use of Generalized Semi-Markov Decision Processes (GSMDPs). We establish a link between the general framework of Discrete Event System Specification (DEVS) and the formalism of GSMDPs, which allows us to build sound discrete-event compatible simulators. We then introduce a simulation-based Policy Iteration approach for explicit-event Temporal Markov Decision Problems. This algorithmic contribution brings together results from simulation theory, forward search in MDPs, and statistical learning theory.
The implicit-event approach was tested on a specific version of the Mars rover planning problem and on a drone patrol mission planning problem, while the explicit-event approach was evaluated on a subway network control problem.

Bibliographic record

  • Author

    Rachelson Emmanuel

  • Year: 2009
  • Original format: PDF
