European Conference on Machine Learning (ECML 2005); October 3-7, 2005; Porto, Portugal

Using Rewards for Belief State Updates in Partially Observable Markov Decision Processes

Abstract

Partially Observable Markov Decision Processes (POMDPs) provide a standard framework for sequential decision making in stochastic environments. In this setting, an agent takes actions and receives observations and rewards from the environment. Many POMDP solution methods are based on computing a belief state, which is a probability distribution over the states the agent could be in; the agent's action choice is then based on the belief state. The belief state is computed from a model of the environment and the history of actions and observations seen by the agent. However, reward information is not taken into account in updating the belief state. In this paper, we argue that rewards can carry useful information that helps disambiguate the hidden state. We present a method for updating the belief state that takes rewards into account. We present experiments using this belief update method with exact and approximate planning methods on several standard POMDP domains, and show that it can provide advantages both in speed and in the quality of the solution obtained.
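
The abstract does not give the update equations, so the sketch below only illustrates the general idea, assuming a discrete POMDP with a tabular transition model T, an observation model O, and a hypothetical reward-likelihood model R over a finite set of reward values (all names are assumptions, not the paper's notation). The first function is the standard Bayes-filter belief update from an action and observation; the second additionally weights states by how likely they are to have produced the observed reward, which is one plausible way to fold reward information into the update as the abstract describes, not necessarily the paper's exact rule.

import numpy as np

def belief_update(b, a, o, T, O):
    """Standard POMDP belief update after taking action a and observing o.

    b: belief over states, shape (S,)
    T: transition model, T[a, s, s2] = P(s2 | s, a), shape (A, S, S)
    O: observation model, O[a, s2, o] = P(o | s2, a), shape (A, S, Z)
    """
    predicted = b @ T[a]              # P(s' | b, a): propagate belief through the dynamics
    unnorm = O[a, :, o] * predicted   # weight successor states by the observation likelihood
    return unnorm / unnorm.sum()

def belief_update_with_reward(b, a, o, r, T, O, R):
    """Hypothetical reward-augmented update: treat the received reward as an
    extra observation generated from (s, a), with R[a, s, r] = P(r | s, a)
    over a discretized set of reward values (an assumed model).
    """
    weighted = b * R[a, :, r]         # re-weight prior states by the reward likelihood
    predicted = weighted @ T[a]       # then propagate through the dynamics
    unnorm = O[a, :, o] * predicted   # and fold in the observation likelihood
    return unnorm / unnorm.sum()

When the reward function differs across hidden states that share the same observation distribution, the extra R[a, :, r] factor concentrates the belief faster than the standard update, which is the kind of disambiguation the abstract argues for.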