Journal: RAIRO Operations Research

INFLUENCE OF MODELING STRUCTURE IN PROBABILISTIC SEQUENTIAL DECISION PROBLEMS


Abstract

Markov Decision Processes (MDPs) are a classical framework for stochastic sequential decision problems, based on an enumerated state space representation. More compact and structured representations have been proposed: factorization techniques use state-variable representations, while decomposition techniques partition the state space into sub-regions and take advantage of the resulting structure of the state transition graph. We use a family of probabilistic exploration-like planning problems to study the influence of the modeling structure on the MDP solution. We first discuss the advantages and drawbacks of a graph-based representation of the state space, then compare two decomposition techniques, and finally propose a global approach that combines state space factorization and decomposition. On the exploration problem instance, we propose to take advantage of the natural topological structure of the navigation space, which is partitioned into regions. Several local policies are optimized within each region; these become the macro-actions of the global abstract MDP resulting from the decomposition, and the regions are the corresponding macro-states. The global abstract MDP is obtained in a factored form, combining all the initial MDP state variables with one macro-state "region" variable that stands for the different possible macro-states corresponding to the regions. Further research is currently being conducted on efficient solution algorithms that implement the same hybrid approach to tackle large MDPs.
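The decomposition step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a six-cell corridor with the goal at cell 5, two regions `A` and `B`, a move that succeeds with probability `P_MOVE` and otherwise leaves the agent in place, a cost of 1 per step, and the helper names `solve_local`, `build_abstract`, and `solve_abstract`. The cost of a macro-action is approximated by the worst entry cell of its region, which is a simplification; the sketch also covers only the decomposition part, not the factored representation.

```python
# Toy corridor: cells 0..5, goal at cell 5. Illustrative assumptions only.
P_MOVE = 0.8                      # a move succeeds with this probability, else the agent stays
GOAL = 5
REGIONS = {"A": [0, 1, 2], "B": [3, 4, 5]}
ACTIONS = (-1, +1)                # step left / step right


def solve_local(cells, target, sweeps=200):
    """Value iteration inside one region: expected number of steps to `target`.

    `target` is either the goal cell or the cell just outside the region
    through which the local policy should leave. Moves that would leave the
    region through any other cell are not allowed.
    """
    value = {c: 0.0 for c in cells}
    policy = {}
    for _ in range(sweeps):
        for c in cells:
            if c == target:
                continue
            best_q, best_a = float("inf"), None
            for a in ACTIONS:
                nxt = c + a
                if nxt != target and nxt not in value:
                    continue      # wrong exit: skip this action
                v_next = 0.0 if nxt == target else value[nxt]
                # Cost 1, then move with prob. P_MOVE or stay with prob. 1 - P_MOVE.
                q = 1.0 + P_MOVE * v_next + (1.0 - P_MOVE) * value[c]
                if q < best_q:
                    best_q, best_a = q, a
            value[c], policy[c] = best_q, best_a
    return value, policy


def build_abstract():
    """One macro-action per (region, exit): (estimated cost, next region or None)."""
    macro = {}
    v, _ = solve_local(REGIONS["A"], target=3)
    macro[("A", "to_B")] = (max(v.values()), "B")        # worst entry cell (assumption)
    v, _ = solve_local(REGIONS["B"], target=GOAL)
    macro[("B", "to_goal")] = (max(v.values()), None)    # None marks termination
    v, _ = solve_local(REGIONS["B"], target=2)
    macro[("B", "to_A")] = (max(v.values()), "A")
    return macro


def solve_abstract(macro, sweeps=50):
    """Value iteration on the abstract MDP whose macro-states are the regions."""
    value = {r: 0.0 for r in REGIONS}
    choice = {}
    for _ in range(sweeps):
        for r in REGIONS:
            options = [(cost + (value[nxt] if nxt else 0.0), name)
                       for (reg, name), (cost, nxt) in macro.items() if reg == r]
            value[r], choice[r] = min(options)
    return value, choice


macro = build_abstract()
abstract_value, abstract_policy = solve_abstract(macro)
print(abstract_policy, abstract_value)
```

In this toy setting the abstract MDP has only two macro-states, so solving it is trivial; the point is that each region is solved separately and the global problem only reasons over regions and exits. Using the worst entry cell keeps one macro-state per region, at the price of a cost estimate that can be pessimistic when a region is entered elsewhere.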