International Journal of Advanced Robotic Systems

Kalman Based Finite State Controller for Partially Observable Domains

Abstract

A real-world environment is often only partially observable to an agent, either because of noisy sensors or incomplete perception. Moreover, such an environment naturally has a continuous state space, and the agent must decide on an action for every point in its internal continuous belief space. Consequently, it is convenient to model this type of decision-making problem as a Partially Observable Markov Decision Process (POMDP) with continuous observation and state spaces. Most POMDP methods, whether approximate or exact, assume that the underlying world dynamics or POMDP parameters, such as the transition and observation probabilities, are known. However, for many real-world environments it is very difficult, if not impossible, to obtain such information. We assume that only the internal dynamics of the agent, such as the actuator noise and the interpretation of the sensor suite, are known. Using these internal dynamics, our algorithm, the Kalman Based Finite State Controller (KBFSC), constructs an internal world model over the continuous belief space, represented by a finite state automaton. The constructed automaton nodes are points of the continuous belief space that share a common best action and a common uncertainty level. KBFSC deals with continuous Gaussian-based POMDPs. It uses a Kalman Filter for belief state estimation, which is also an efficient way to prune unvisited segments of the belief space and to foresee the reachable belief points while approximately computing the horizon-N policy. KBFSC does not use an "explore and update" approach in the value calculation, as TD-learning does, and therefore has no extensive exploration-exploitation phase. Using the MDP-case reward and the internal dynamics of the agent, KBFSC automatically constructs the finite state automaton (FSA) representing the approximately optimal policy without discretizing the state and observation spaces. Moreover, the policy always converges for POMDP problems.
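The abstract states that KBFSC tracks the agent's belief as a Gaussian with a Kalman Filter. As a point of reference, the sketch below shows a standard linear-Gaussian Kalman belief update of the kind such a controller would rely on. The function name, the matrices A, B, C, R, Q, and the linear motion and observation models are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def kalman_belief_update(mu, Sigma, u, z, A, B, C, R, Q):
    """One belief update over a Gaussian belief N(mu, Sigma).

    Assumes a linear motion model x' = A x + B u + w, w ~ N(0, R),
    and a linear observation model z = C x + v, v ~ N(0, Q).
    """
    # Prediction (action update): propagate the belief through the motion model.
    mu_bar = A @ mu + B @ u
    Sigma_bar = A @ Sigma @ A.T + R

    # Correction (observation update): fold in the new measurement z.
    S = C @ Sigma_bar @ C.T + Q                 # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)      # Kalman gain
    mu_new = mu_bar + K @ (z - C @ mu_bar)
    Sigma_new = (np.eye(len(mu)) - K @ C) @ Sigma_bar
    return mu_new, Sigma_new
```

Because the updated belief stays Gaussian, its mean and covariance are a compact summary of a belief point; the covariance also gives the uncertainty level by which, per the abstract, KBFSC groups belief points into automaton nodes.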
