European Conference on Machine Learning (ECML 2005); October 3-7, 2005; Porto, Portugal

Using Rewards for Belief State Updates in Partially Observable Markov Decision Processes

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide a standard framework for sequential decision making in stochastic environments. In this setting, an agent takes actions and receives observations and rewards from the environment. Many POMDP solution methods are based on computing a belief state, which is a probability distribution over the states the agent could be in; the agent's action choice is then based on the belief state. The belief state is computed from a model of the environment and the history of actions and observations seen by the agent. However, reward information is not taken into account in updating the belief state. In this paper, we argue that rewards can carry useful information that helps disambiguate the hidden state. We present a method for updating the belief state that takes rewards into account. We present experiments using this belief update method with exact and approximate planning methods on several standard POMDP domains, and show that it can provide advantages both in speed and in the quality of the solution obtained.
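
The abstract does not give the update equations, so the sketch below only illustrates the general idea, assuming a discrete POMDP with a tabular transition model T, an observation model O, and a hypothetical reward-likelihood model R over a finite set of reward values (all names are assumptions, not the paper's notation). The first function is the standard Bayes-filter belief update from an action and observation; the second additionally weights states by how likely they are to have produced the observed reward, which is one plausible way to fold reward information into the update as the abstract describes, not necessarily the paper's exact rule.

import numpy as np

def belief_update(b, a, o, T, O):
    """Standard POMDP belief update after taking action a and observing o.

    b: belief over states, shape (S,)
    T: transition model, T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: observation model, O[a, s2, o] = P(o | s2, a), shape (A, S, Z)
    """
    predicted = b @ T[a]              # P(s' | b, a): propagate belief through the dynamics
    unnorm = O[a, :, o] * predicted   # weight successor states by the observation likelihood
    return unnorm / unnorm.sum()

def belief_update_with_reward(b, a, o, r, T, O, R):
    """Hypothetical reward-augmented update: treat the received reward as an
    extra observation generated from (s, a), with R[a, s, r] = P(r | s, a)
    over a discretized set of reward values (an assumed model).
    """
    weighted = b * R[a, :, r]         # re-weight prior states by the reward likelihood
    predicted = weighted @ T[a]       # then propagate through the dynamics
    unnorm = O[a, :, o] * predicted   # and fold in the observation likelihood
    return unnorm / unnorm.sum()

When the reward function differs across hidden states that share the same observation distribution, the extra R[a, :, r] factor concentrates the belief faster than the standard update, which is the kind of disambiguation the abstract argues for.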