Conference: European Workshop on Reinforcement Learning

Regularized Fitted Q-Iteration: Application to Planning

Abstract

We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model-complexity. The algorithm is presented in detail for the case when the function space is a reproducing-kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
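
The procedure described in the abstract can be illustrated with a minimal sketch: fitted Q-iteration in which each regression step is a penalized least-squares fit in a reproducing-kernel Hilbert space (that is, kernel ridge regression). Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional MDP in `generative_model`, the Gaussian kernel and its width, one regressor per action, the fixed penalty `lam` (the abstract argues for data-dependent penalties, which this sketch does not implement), and the clipping of targets to the known value range.

```python
import numpy as np

def gaussian_kernel(a, b, width=0.2):
    # Gram matrix between two 1-D arrays of states (kernel choice is an assumption).
    d = a[:, None] - b[None, :]
    return np.exp(-(d ** 2) / (2 * width ** 2))

def generative_model(x, a, rng):
    # Hypothetical toy MDP on [0, 1]: action 1 moves right, action 0 moves left,
    # with small Gaussian noise; reward 1 when the next state lands above 0.9.
    step = 0.1 if a == 1 else -0.1
    x_next = np.clip(x + step + 0.02 * rng.standard_normal(x.shape), 0.0, 1.0)
    reward = (x_next > 0.9).astype(float)
    return x_next, reward

def regularized_fqi(n=200, iters=30, gamma=0.9, lam=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    actions = [0, 1]
    x = rng.uniform(0.0, 1.0, n)  # sampled states
    # One sampled transition per (state, action) pair from the generative model.
    data = {a: generative_model(x, a, rng) for a in actions}
    K = gaussian_kernel(x, x)
    # Minimizing (1/n) * sum (f(x_i) - y_i)^2 + lam * ||f||_H^2 over the RKHS
    # gives f = sum_i alpha_i k(., x_i) with alpha = (K + n * lam * I)^{-1} y.
    ridge = K + n * lam * np.eye(n)
    K_next = {a: gaussian_kernel(data[a][0], x) for a in actions}
    alpha = {a: np.zeros(n) for a in actions}  # Q_0 = 0
    q_max = 1.0 / (1.0 - gamma)  # rewards lie in [0, 1]
    for _ in range(iters):
        new_alpha = {}
        for a in actions:
            _, r = data[a]
            # Greedy backup: max over actions of the current Q estimate at x'.
            q_next = np.max([K_next[a] @ alpha[b] for b in actions], axis=0)
            # Truncation to the known value range (a safeguard assumed here).
            q_next = np.clip(q_next, 0.0, q_max)
            y = r + gamma * q_next
            new_alpha[a] = np.linalg.solve(ridge, y)
        alpha = new_alpha
    return x, alpha

def greedy_action(state, x, alpha):
    # Evaluate each action's fitted Q-function at one state and pick the best.
    k = gaussian_kernel(np.array([float(state)]), x)[0]
    q = [k @ alpha[a] for a in sorted(alpha)]
    return int(np.argmax(q))
```

A call such as `x, alpha = regularized_fqi()` followed by `greedy_action(0.5, x, alpha)` returns the index of the action the fitted Q-functions prefer at the middle of the interval. Increasing `lam` smooths the fitted Q-functions more aggressively, which is the model-complexity trade-off the abstract refers to.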