Uncertainty in Artificial Intelligence

Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards

Abstract

A popular approach to solving a decision process with non-Markovian rewards (NMRDP) is to exploit a compact representation of the reward function to automatically translate the NMRDP into an equivalent Markov decision process (MDP) amenable to our favorite MDP solution method. The contribution of this paper is a representation of non-Markovian reward functions and a translation into MDP aimed at making the best possible use of state-based anytime algorithms as the solution method. By explicitly constructing and exploring only parts of the state space, these algorithms are able to trade computation time for policy quality, and have proven quite effective in dealing with large MDPs. Our representation extends future linear temporal logic (FLTL) to express rewards. Our translation has the effect of embedding model-checking in the solution method. It results in an MDP of the minimal size achievable without stepping outside the anytime framework, and consequently in better policies by the deadline.
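The kind of translation the abstract alludes to can be pictured as pairing each NMRDP state with a temporal formula that records how much of the reward condition remains to be satisfied, so that reward becomes a function of the current expanded state alone and expanded states can be generated lazily as an anytime state-based solver explores them. The Python sketch below illustrates that general idea using plain LTL-style formula progression; it is a simplification for illustration only, not the paper's $FLTL reward construction, and all identifiers (progress, successor, REWARD_FORMULA) are hypothetical.

from typing import FrozenSet, Tuple, Union

State = FrozenSet[str]          # an NMRDP state as the set of propositions true in it

# Formula syntax: str atoms, bool constants, ("not", f), ("and", f, g), ("or", f, g),
# ("next", f), ("until", f, g), ("always", f), ("eventually", f)
Formula = Union[str, bool, tuple]

def simplify(f: Formula) -> Formula:
    """Propagate boolean constants so fully resolved formulas collapse to True/False."""
    if isinstance(f, tuple):
        op = f[0]
        if op == "not" and isinstance(f[1], bool):
            return not f[1]
        if op == "and":
            a, b = f[1], f[2]
            if a is False or b is False:
                return False
            if a is True:
                return b
            if b is True:
                return a
        if op == "or":
            a, b = f[1], f[2]
            if a is True or b is True:
                return True
            if a is False:
                return b
            if b is False:
                return a
    return f

def progress(f: Formula, s: State) -> Formula:
    """One-step progression of a future temporal formula through state s,
    in the style of formula progression for LTL."""
    if isinstance(f, bool):
        return f
    if isinstance(f, str):                      # atomic proposition
        return f in s
    op = f[0]
    if op == "not":
        return simplify(("not", progress(f[1], s)))
    if op == "and":
        return simplify(("and", progress(f[1], s), progress(f[2], s)))
    if op == "or":
        return simplify(("or", progress(f[1], s), progress(f[2], s)))
    if op == "next":                            # X f1
        return f[1]
    if op == "until":                           # f1 U f2
        return simplify(("or", progress(f[2], s),
                        simplify(("and", progress(f[1], s), f))))
    if op == "always":                          # G f1
        return simplify(("and", progress(f[1], s), f))
    if op == "eventually":                      # F f1
        return simplify(("or", progress(f[1], s), f))
    raise ValueError("unknown operator: %r" % (op,))

# An expanded state ("e-state") pairs an NMRDP state with the progressed formula,
# which summarizes all the history information needed to decide future rewards.
EState = Tuple[State, Formula]

def successor(e: EState, next_state: State) -> EState:
    """Generate an e-state lazily, one transition at a time; an anytime
    state-based solver only ever builds the e-states it actually reaches."""
    _, formula = e
    return (next_state, progress(formula, next_state))

if __name__ == "__main__":
    # Hypothetical non-Markovian reward condition: "the package stays dry until delivered".
    REWARD_FORMULA: Formula = ("until", "dry", "delivered")
    s0: State = frozenset({"dry"})
    e0: EState = (s0, progress(REWARD_FORMULA, s0))
    e1 = successor(e0, frozenset({"dry", "delivered"}))
    print(e0[1])   # ('until', 'dry', 'delivered'): the condition is still pending
    print(e1[1])   # True: the reward condition has been satisfied along this history

Because successor e-states are computed on demand, an anytime state-based solver can restrict attention to the reachable fragment of the expanded process, which is what allows computation time to be traded for policy quality as the abstract describes.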
