Intelligent automation and soft computing

IMPROVED Q_(MDP) POLICY FOR PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES IN LARGE DOMAINS: EMBEDDING EXPLORATION DYNAMICS



Abstract

Artificial Intelligence techniques have primarily focused on domains in which the state of the world is known to the system at every time step. Such domains can be modeled as Markov Decision Processes (MDPs). Action and planning policies for MDPs have been studied extensively, and several efficient methods exist. In real-world problems, however, information useful for action selection is often missing. The theory of Partially Observable Markov Decision Processes (POMDPs) covers the problem domain in which the full state of the environment is not directly perceivable by the agent. Current algorithms for the exact solution of POMDPs are applicable only to domains with a small number of states. To cope with larger state spaces, a number of methods that compute sub-optimal solutions exist, and among these the Q_(MDP) approach appears to be the best. We introduce a novel technique, called Explorative Q_(MDP) (EQ_(MDP)), which constitutes an important enhancement of the Q_(MDP) method. To the best of the authors' knowledge, EQ_(MDP) is currently the most efficient method applicable to large POMDP domains.
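
For context on the baseline that EQ_(MDP) enhances: the standard Q_(MDP) heuristic first solves the underlying fully observable MDP, then at each step selects the action maximizing the belief-weighted Q-values, effectively assuming all state uncertainty vanishes after one step. The Python sketch below illustrates this baseline under assumed array shapes; the function and variable names are illustrative, and the exploration dynamics added by EQ_(MDP) are not detailed in the abstract.

import numpy as np

def mdp_q_values(T, R, gamma=0.95, tol=1e-6):
    # Value iteration on the underlying fully observable MDP.
    # Assumed shapes: T is (A, S, S) transition probabilities, R is (S, A) rewards.
    Q = np.zeros(R.shape)
    while True:
        V = Q.max(axis=1)                            # V(s) = max_a Q(s, a)
        Q_new = R + gamma * np.einsum("ast,t->sa", T, V)
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

def qmdp_action(belief, Q):
    # Q_MDP heuristic: choose argmax_a sum_s b(s) Q(s, a)
    # for the current belief vector b of shape (S,).
    return int(np.argmax(belief @ Q))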
