Journal: Discrete Event Dynamic Systems: Theory and Applications

A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning

Abstract

The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction. Instead of noisy samples of the desired fixed point, the algorithm updates parameters based on noisy samples of functions generated by application of the operator, in the spirit of Robbins–Monro stochastic approximation. The algorithm is motivated by temporal-difference learning, and our developments lead to a possibly more efficient variant of temporal-difference learning. We establish convergence of the algorithm and explore efficiency gains through computational experiments involving optimal stopping and queueing problems.
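
As a rough illustration of the two ideas in the abstract, the sketch below first uses a recursive least-squares (Kalman-style) update to fit a linear combination of basis functions to noisy samples of a function, and then reuses the same update with targets that are noisy samples of an operator applied to the current approximation. This is a toy under stated assumptions and not the algorithm from the paper: the names `kalman_step`, `fit_function`, and `fit_fixed_point`, the basis `[1, x]`, the operator `(F J)(x) = (1 + x) + alpha * J(x)`, the noise level, and the sample counts are all hypothetical choices made for the example.

```python
import random


def kalman_step(r, P, phi, target):
    """One recursive least-squares (Kalman-style) update for the model r . phi.

    r is the weight vector, P the (symmetric) covariance-like matrix,
    phi the basis-function values at the sampled point, target the noisy sample.
    """
    n = len(r)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = 1.0 + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    error = target - sum(r[i] * phi[i] for i in range(n))
    r_new = [r[i] + gain[i] * error for i in range(n)]
    # P - (P phi)(P phi)^T / denom, written using the gain vector.
    P_new = [[P[i][j] - gain[i] * P_phi[j] for j in range(n)] for i in range(n)]
    return r_new, P_new


def fit_function(num_samples=2000, seed=0):
    """Traditional setting: noisy samples of the unknown function itself.

    Assumed toy target: f(x) = 2 + 3x, observed with Gaussian noise.
    """
    rng = random.Random(seed)
    r, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(num_samples):
        x = rng.random()
        phi = [1.0, x]
        y = 2.0 + 3.0 * x + rng.gauss(0.0, 0.1)
        r, P = kalman_step(r, P, phi, y)
    return r


def fit_fixed_point(num_samples=20000, alpha=0.5, seed=0):
    """Fixed-point setting: noisy samples of (F J_r)(x), where J_r = r . phi.

    Assumed toy operator: (F J)(x) = (1 + x) + alpha * J(x), a contraction
    with modulus alpha whose fixed point is (1 + x) / (1 - alpha).
    """
    rng = random.Random(seed)
    r, P = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(num_samples):
        x = rng.random()
        phi = [1.0, x]
        current = r[0] * phi[0] + r[1] * phi[1]
        target = (1.0 + x) + alpha * current + rng.gauss(0.0, 0.1)
        r, P = kalman_step(r, P, phi, target)
    return r


if __name__ == "__main__":
    print("function fit weights:", fit_function())
    print("fixed-point weights:", fit_fixed_point())
```

In this toy, the first routine should recover weights close to `[2, 3]`, and the second should drift toward `[2, 2]`, the weights of the assumed operator's fixed point `2 + 2x` when `alpha = 0.5`. The second routine converges more slowly than the first because its targets depend on earlier, less accurate weights.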