IMPROVED Q_(MDP) POLICY FOR PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES IN LARGE DOMAINS: EMBEDDING EXPLORATION DYNAMICS

GIORGOS APOSTOLIKAS; SPYROS TZAFESTAS

Abstract

Artificial Intelligence techniques have primarily focused on domains in which the state of the world is known to the system at every time step. Such domains can be modeled as a Markov Decision Process (MDP). Action and planning policies for MDPs have been studied extensively, and several efficient methods exist. In real-world problems, however, information useful for action selection is often missing. The theory of Partially Observable Markov Decision Processes (POMDPs) covers problem domains in which the full state of the environment is not directly perceivable by the agent. Current algorithms for the exact solution of POMDPs are applicable only to domains with a small number of states. To cope with larger state spaces, a number of methods that achieve sub-optimal solutions exist, and among these the Q_(MDP) approach seems to be the best. We introduce a novel technique, called Explorative Q_(MDP) (EQ_(MDP)), which constitutes an important enhancement of the Q_(MDP) method. To the best of the authors' knowledge, EQ_(MDP) is currently the most efficient method applicable to large POMDP domains.
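
The abstract does not spell out the baseline Q_(MDP) rule that EQ_(MDP) enhances. As commonly described in the POMDP literature, Q_(MDP) first solves the underlying fully observable MDP for action values Q(s, a) and then, for a belief b over states, selects the action that maximizes the sum over s of b(s) Q(s, a). The following Python sketch illustrates only that baseline rule, together with a standard Bayesian belief update; it is not the authors' EQ_(MDP) method, whose exploration mechanism is not described in this abstract. The array layouts (T[a, s, s'] for transitions, R[s, a] for rewards, O[a, s', o] for observations), the discount gamma, the toy problem, and the function names mdp_q_values, qmdp_action and belief_update are hypothetical choices made for illustration.

```python
import numpy as np


def mdp_q_values(T, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Value iteration on the fully observable MDP underlying the POMDP.

    T[a, s, s2] = P(s2 | s, a), shape (A, S, S)   (assumed layout)
    R[s, a]     = expected immediate reward, shape (S, A)
    Returns the MDP action values Q[s, a], shape (S, A).
    """
    S = R.shape[0]
    V = np.zeros(S)
    for _ in range(max_iter):
        # Bellman backup: Q[s, a] = R[s, a] + gamma * sum_s2 T[a, s, s2] * V[s2]
        Q = R + gamma * np.einsum("ast,t->sa", T, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return R + gamma * np.einsum("ast,t->sa", T, V)


def qmdp_action(belief, Q):
    """Q_MDP rule: return argmax_a of sum_s belief[s] * Q[s, a]."""
    return int(np.argmax(belief @ Q))


def belief_update(belief, a, o, T, O):
    """Bayes filter: b'(s2) is proportional to O[a, s2, o] * sum_s T[a, s, s2] * b(s).

    O[a, s2, o] = P(o | s2, a), shape (A, S, num_obs)   (assumed layout)
    """
    predicted = belief @ T[a]  # state distribution after taking action a
    unnormalized = O[a, :, o] * predicted
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation o has zero probability under this belief")
    return unnormalized / total


if __name__ == "__main__":
    # Hypothetical toy problem: 2 hidden states, 2 actions, the state never
    # changes, reward 1 for the action whose index matches the hidden state,
    # and a sensor that reports the true state with probability 0.85.
    T = np.stack([np.eye(2), np.eye(2)])
    R = np.eye(2)
    O = np.stack([np.array([[0.85, 0.15], [0.15, 0.85]])] * 2)
    Q = mdp_q_values(T, R, gamma=0.9)
    b = belief_update(np.array([0.5, 0.5]), a=0, o=0, T=T, O=O)
    print("belief:", b, "Q_MDP action:", qmdp_action(b, Q))
```

Because Q_(MDP) scores actions as if the state became fully observable after the next step, it assigns no value to actions taken purely to gather information, which is a well-known limitation of this baseline rule.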

Bibliographic information

Source: Intelligent Automation and Soft Computing, 2004, No. 3, pp. 209-220 (12 pages)
Authors: Giorgos Apostolikas; Spyros Tzafestas
Affiliation: Intelligent Robotics and Automation Laboratory, Department of Signals, Control and Robotics, School of Electrical and Computer Engineering, National Technical University of Athens, Zografou 15773, Athens, Greece
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: POMDP; Q_(MDP); action selection

Similar literature

1. Optimum inspection and maintenance policies for corroded structures using partially observable Markov decision processes and stochastic, physically based models [J]. K.G. Papakonstantinou, M. Shinozuka. Probabilistic Engineering Mechanics, 2014.
2. Partially observable Markov decision processes and periodic policies with applications [J]. John Goulionis, D. Stengos. International Journal of Information Technology & Decision Making, 2011, No. 6.
3. Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs) [C]. Harold Soh, Yiannis Demiris. GECCO '11: Annual Conference on Genetic and Evolutionary Computation, 2012.
4. Improving dynamic decision making through RFID: A partially observable Markov decision process (POMDP) for RFID-enhanced warehouse search operations [D]. Sharethram Hariharan, 2006.
5. Decision making under uncertainty: A neural model based on partially observable Markov decision processes [O]. Rajesh P. N. Rao, 2010.
6. Partially observable Markov decision processes (POMDPs) [O]. Guy Shani, Ronen I. Brafman, Solomon E. Shimony, 2012.