A Learning Rate Analysis of Reinforcement Learning Algorithms in Finite-Horizon

Abstract

Many reinforcement learning algorithms, such as Q-Learning or R-Learning, correspond to adaptive methods for solving Markovian decision problems in the infinite-horizon setting when no model is available. In this article we consider the particular framework of non-stationary finite-horizon Markov Decision Processes. After establishing a relationship between the finite-horizon total-reward criterion and the finite-horizon average-reward criterion, we define Q_H-Learning and R_H-Learning for finite-horizon MDPs. We then introduce the Ordinary Differential Equation (ODE) method to conduct a learning-rate analysis of Q_H-Learning and R_H-Learning. R_H-Learning turns out to be a version of Q_H-Learning with matrix-valued step-sizes, the corresponding gain matrix being very close to the optimal matrix that results from the ODE analysis. Experimental results confirm this performance hierarchy.
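The Q_H-Learning described in the abstract can be illustrated with a generic tabular finite-horizon Q-learning update, in which action values are indexed by the stage t and the terminal values Q_H are identically zero. The function below is a minimal sketch under that assumption only; the paper's actual step-size schedule, exploration scheme, and the matrix-valued gains of R_H-Learning are not reproduced here, and all names (`q_h_learning`, `P`, `R`) are illustrative, not taken from the paper.

```python
import numpy as np

def q_h_learning(P, R, H, episodes=4000, alpha=0.1, seed=0):
    """Minimal tabular sketch of finite-horizon Q-learning.

    P[s, a, s2] -- transition probabilities, R[s, a] -- expected reward,
    H -- horizon length.  Q[t, s, a] is the stage-indexed action value;
    the extra slice Q[H] is kept at zero so the stage H-1 target needs
    no special case.
    """
    rng = np.random.default_rng(seed)
    S, A, _ = P.shape
    Q = np.zeros((H + 1, S, A))      # Q[H] == 0: no reward after the horizon
    for _ in range(episodes):
        s = rng.integers(S)          # uniform start state, for simplicity
        for t in range(H):
            a = rng.integers(A)      # uniform exploration, for simplicity
            s2 = rng.choice(S, p=P[s, a])
            target = R[s, a] + Q[t + 1, s2].max()
            Q[t, s, a] += alpha * (target - Q[t, s, a])
            s = s2
    return Q[:H]
```

With a known model, the fixed point of this update coincides with backward induction under the finite-horizon total-reward criterion; a constant step-size alpha is used here, whereas the learning-rate analysis in the paper concerns how such step-sizes should be chosen.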

Record

  • Source
    《Machine learning》 | 1998 | pp. 215-223 | 9 pages
  • Conference venue: Madison, WI (US)
  • Author affiliations

    INRA/BIA, Auzeville BP 27, 31326 Castanet Tolosan cedex, France;

    INRA/BIA, Auzeville BP 27, 31326 Castanet Tolosan cedex, France;

  • Format: PDF
  • Language: eng
  • Classification: Computer applications
