Uncertainty in Artificial Intelligence

Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards

Abstract

A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into MDP aimed at making the best possible use of state-based anytime algorithms as the solution method. By explicitly constructing and exploring only parts of the state space, these algorithms are able to trade computation time for policy quality, and have proven quite effective in dealing with large MDPs. Our representation extends future linear temporal logic (FLTL) to express rewards. Our translation has the effect of embedding model-checking in the solution method. It results in an MDP of the minimal size achievable without stepping outside the anytime framework, and consequently in better policies by the deadline.
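The kind of translation the abstract alludes to can be pictured as pairing each NMRDP state with a temporal formula that records how much of the reward condition remains to be satisfied, so that reward becomes a function of the current expanded state alone and expanded states can be generated lazily as an anytime state-based solver explores them. The Python sketch below illustrates that general idea using plain LTL-style formula progression; it is a simplification for illustration only, not the paper's $FLTL reward construction, and all identifiers (progress, successor, REWARD_FORMULA) are hypothetical.

from typing import FrozenSet, Tuple, Union

State = FrozenSet[str]          # an NMRDP state as the set of propositions true in it

# Formula syntax: str atoms, bool constants, ("not", f), ("and", f, g), ("or", f, g),
# ("next", f), ("until", f, g), ("always", f), ("eventually", f)
Formula = Union[str, bool, tuple]

def simplify(f: Formula) -> Formula:
    """Propagate boolean constants so fully resolved formulas collapse to True/False."""
    if isinstance(f, tuple):
        op = f[0]
        if op == "not" and isinstance(f[1], bool):
            return not f[1]
        if op == "and":
            a, b = f[1], f[2]
            if a is False or b is False:
                return False
            if a is True:
                return b
            if b is True:
                return a
        if op == "or":
            a, b = f[1], f[2]
            if a is True or b is True:
                return True
            if a is False:
                return b
            if b is False:
                return a
    return f

def progress(f: Formula, s: State) -> Formula:
    """One-step progression of a future temporal formula through state s,
    in the style of formula progression for LTL."""
    if isinstance(f, bool):
        return f
    if isinstance(f, str):                      # atomic proposition
        return f in s
    op = f[0]
    if op == "not":
        return simplify(("not", progress(f[1], s)))
    if op == "and":
        return simplify(("and", progress(f[1], s), progress(f[2], s)))
    if op == "or":
        return simplify(("or", progress(f[1], s), progress(f[2], s)))
    if op == "next":                            # X f1
        return f[1]
    if op == "until":                           # f1 U f2
        return simplify(("or", progress(f[2], s),
                        simplify(("and", progress(f[1], s), f))))
    if op == "always":                          # G f1
        return simplify(("and", progress(f[1], s), f))
    if op == "eventually":                      # F f1
        return simplify(("or", progress(f[1], s), f))
    raise ValueError("unknown operator: %r" % (op,))

# An expanded state ("e-state") pairs an NMRDP state with the progressed formula,
# which summarizes all the history information needed to decide future rewards.
EState = Tuple[State, Formula]

def successor(e: EState, next_state: State) -> EState:
    """Generate an e-state lazily, one transition at a time; an anytime
    state-based solver only ever builds the e-states it actually reaches."""
    _, formula = e
    return (next_state, progress(formula, next_state))

if __name__ == "__main__":
    # Hypothetical non-Markovian reward condition: "the package stays dry until delivered".
    REWARD_FORMULA: Formula = ("until", "dry", "delivered")
    s0: State = frozenset({"dry"})
    e0: EState = (s0, progress(REWARD_FORMULA, s0))
    e1 = successor(e0, frozenset({"dry", "delivered"}))
    print(e0[1])   # ('until', 'dry', 'delivered'): the condition is still pending
    print(e1[1])   # True: the reward condition has been satisfied along this history

Because successor e-states are computed on demand, an anytime state-based solver can restrict attention to the reachable fragment of the expanded process, which is what allows computation time to be traded for policy quality as the abstract describes.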
