Conference on Neural Information Processing Systems

A Kernel Loss for Solving the Bellman Equation



Abstract

Value function learning plays a central role in many state-of-the-art reinforcement-learning algorithms. Many popular algorithms like Q-learning do not optimize any objective function, but are fixed-point iterations of some variants of Bellman operator that are not necessarily a contraction. As a result, they may easily lose convergence guarantees, as can be observed in practice. In this paper, we propose a novel loss function, which can be optimized using standard gradient-based methods with guaranteed convergence. The key advantage is that its gradient can be easily approximated using sampled transitions, avoiding the need for double samples required by prior algorithms like residual gradient. Our approach may be combined with general function classes such as neural networks, using either on- or off-policy data, and is shown to work reliably and effectively in several benchmarks, including classic problems where standard algorithms are known to diverge.
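To make the "double sample" point concrete, below is a minimal sketch (not the paper's reference implementation) of a U-statistic estimate of a kernel Bellman loss: each transition contributes one empirical Bellman residual, and only residuals from *different* transitions are paired through a positive-definite kernel, so no state needs two independent next-state samples. The RBF bandwidth and the simple NumPy value-function interface are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    # Gaussian RBF kernel matrix between two batches of states, shape (n, m).
    d = x[:, None, :] - y[None, :, :]
    return np.exp(-np.sum(d * d, axis=-1) / (2.0 * bandwidth ** 2))

def kernel_bellman_loss(V, states, rewards, next_states, gamma=0.99):
    """U-statistic estimate of a kernel Bellman loss (illustrative sketch).

    delta_i = r_i + gamma * V(s_i') - V(s_i) is the sampled Bellman residual
    of transition i. Pairing residuals from distinct transitions (i != j)
    through the kernel avoids the biased squared term E[delta_i^2], which is
    what forces residual-gradient methods to draw two independent next-state
    samples per state.
    """
    delta = rewards + gamma * V(next_states) - V(states)   # shape (n,)
    K = rbf_kernel(states, states)                         # shape (n, n)
    n = len(delta)
    outer = np.outer(delta, delta) * K
    # Drop the diagonal (i == j) terms; average over the n*(n-1) cross pairs.
    return (outer.sum() - np.trace(outer)) / (n * (n - 1))
```

Because the estimate is an average over sampled transitions, its gradient with respect to the parameters of `V` can be taken directly with standard automatic differentiation, which is the property the abstract highlights.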
