
Value-Directed Belief State Approximation for POMDPs


Abstract

We consider the problem of belief-state monitoring for the purposes of implementing a policy for a partially observable Markov decision process (POMDP), specifically how one might approximate the belief state. Other schemes for belief-state approximation (e.g., based on minimizing a measure such as KL-divergence between the true and estimated state) are not necessarily appropriate for POMDPs. Instead, we consider approximation quality as determined by the expected error in utility rather than by the error in the belief state itself. We propose heuristic methods for finding good projection schemes for belief state estimation, exhibiting anytime characteristics, given a POMDP value function. We also describe several algorithms for constructing bounds on the error in decision quality (expected utility) associated with acting in accordance with a given belief state approximation.
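The abstract's core idea can be illustrated with a minimal sketch. Assuming the POMDP value function is represented by a set of alpha-vectors (so V(b) = max over actions of the dot product of the action's alpha-vector with belief b), the loss in expected utility from acting greedily on an approximate belief is bounded by twice the largest discrepancy any alpha-vector sees between the true and approximate beliefs. The projection used below (keep the k most probable states and renormalize) is a simplified stand-in, not the paper's value-directed projection scheme:

```python
import numpy as np

def project_belief(b, k):
    """Illustrative projection: keep the k largest-probability states
    and renormalize. (A stand-in for a value-directed projection.)"""
    b_hat = np.zeros_like(b)
    top = np.argsort(b)[-k:]
    b_hat[top] = b[top]
    return b_hat / b_hat.sum()

def decision_quality_bound(alphas, b, b_hat):
    """Bound on lost expected utility from acting greedily on b_hat:
    V(b) - alpha_chosen . b <= 2 * max_a |alpha_a . (b - b_hat)|."""
    return 2.0 * max(abs(a @ (b - b_hat)) for a in alphas)

def actual_loss(alphas, b, b_hat):
    """Utility lost by choosing the action that looks best under b_hat."""
    v_true = max(a @ b for a in alphas)
    chosen = max(alphas, key=lambda a: a @ b_hat)
    return v_true - chosen @ b

# Hypothetical 4-state, 3-action example.
alphas = [np.array([1.0, 0.2, 0.1, 0.0]),
          np.array([0.0, 0.9, 0.3, 0.2]),
          np.array([0.1, 0.1, 0.8, 0.9])]
b = np.array([0.5, 0.3, 0.15, 0.05])
b_hat = project_belief(b, 2)
loss = actual_loss(alphas, b, b_hat)
bound = decision_quality_bound(alphas, b, b_hat)
```

The bound follows from a standard three-way split of V(b) minus the achieved utility; it holds for any projection, which is why the paper can search over projection schemes that make it small for a given value function.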
