
Sensitive Discount Optimality: Unifying Discounted and Average Reward Reinforcement Learning



Abstract

Thus far, research in reinforcement learning (RL) has concentrated on two optimality criteria: the discounted framework, which has been very well studied, and the average-reward framework, in which interest is rapidly increasing. This paper presents a framework called sensitive discount optimality, which offers an elegant way of linking these two paradigms. Although sensitive discount optimality has been well studied in dynamic programming, with several provably convergent algorithms, it has not received any attention in RL. The framework is based on studying the properties of the expected cumulative discounted reward as the discount factor tends to 1. Under these conditions, the cumulative discounted reward can be expanded as a Laurent series to yield a sequence of terms: the first is the average reward, the second involves the average-adjusted sum of rewards (or bias), and so on. We use the sensitive discount optimality framework to derive a new model-free average-reward technique, which is related to the Q-learning-type methods proposed by Bertsekas, Schwartz, and Singh but which, unlike these previous methods, optimizes both the first and second terms of the Laurent series (the average reward and the bias values).
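For reference, the standard form of this expansion for a fixed stationary policy π in a finite MDP is written out below. The notation is ours, following the dynamic-programming literature rather than necessarily the paper: P_π and r_π are the policy's transition matrix and reward vector, P*_π is the limiting (Cesàro-average) matrix lim_{N→∞} (1/N) Σ_{t<N} P_π^t, g^π is the average reward, h^π the bias, and δ = (1−γ)/γ. The series holds for γ sufficiently close to 1.

```latex
\begin{aligned}
V^{\pi}_{\gamma} &= \sum_{t=0}^{\infty} \gamma^{t} P_{\pi}^{t} r_{\pi}
  = (I - \gamma P_{\pi})^{-1} r_{\pi}
  = (1+\delta) \sum_{n=-1}^{\infty} \delta^{n} y^{\pi}_{n},
  \qquad \delta = \frac{1-\gamma}{\gamma}, \\
y^{\pi}_{-1} &= P^{*}_{\pi} r_{\pi} = g^{\pi} \quad \text{(average reward)}, \\
y^{\pi}_{n} &= (-1)^{n} H_{\pi}^{\,n+1} r_{\pi}, \quad n \ge 0,
  \qquad y^{\pi}_{0} = H_{\pi} r_{\pi} = h^{\pi} \quad \text{(bias)}, \\
H_{\pi} &= (I - P_{\pi} + P^{*}_{\pi})^{-1} (I - P^{*}_{\pi}), \\
V^{\pi}_{\gamma} &= \frac{g^{\pi}}{1-\gamma} + h^{\pi} + O(1-\gamma)
  \quad \text{as } \gamma \to 1.
\end{aligned}
```

Roughly speaking, maximizing the first coefficient gives average-reward (gain) optimality, and additionally maximizing the second among gain-optimal policies gives bias optimality; these are the two terms the abstract says the new method optimizes. Higher-order coefficients lead to the more sensitive criteria in the same family.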
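The claim that the average reward and bias are the first two terms of the Laurent series can be checked numerically for a fixed policy. The sketch below is a minimal illustration under stated assumptions: the three-state chain `P` and reward vector `r` are hypothetical, the chain is assumed ergodic, and the helper names (`solve`, `gain_and_bias`, `laurent_error`) are ours. It computes the discounted value (I − γP)⁻¹r, the average reward g from the stationary distribution, and the bias h, then measures how far V_γ is from g/(1−γ) + h.

```python
# Minimal numerical check of V_gamma ≈ g / (1 - gamma) + h for a fixed policy.
# The chain P and rewards r are hypothetical; plain floats, no external libraries.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def discounted_value(P, r, gamma):
    """V_gamma = (I - gamma P)^{-1} r."""
    n = len(r)
    A = [[float(i == j) - gamma * P[i][j] for j in range(n)] for i in range(n)]
    return solve(A, r)


def gain_and_bias(P, r):
    """Average reward g and bias h of an ergodic Markov reward process."""
    n = len(r)
    # Stationary distribution: (I - P)^T pi = 0, last equation replaced by sum(pi) = 1.
    A = [[float(i == j) - P[j][i] for j in range(n)] for i in range(n)]
    A[-1] = [1.0] * n
    pi = solve(A, [0.0] * (n - 1) + [1.0])
    g = sum(p * x for p, x in zip(pi, r))  # every row of P* is pi, so P* r = g * 1
    # Bias: (I - P + P*) h = (I - P*) r = r - g; this h satisfies pi . h = 0.
    B = [[float(i == j) - P[i][j] + pi[j] for j in range(n)] for i in range(n)]
    h = solve(B, [x - g for x in r])
    return pi, g, h


P = [[0.5, 0.5, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]
r = [1.0, 0.0, 2.0]
pi, g, h = gain_and_bias(P, r)


def laurent_error(gamma):
    """Largest gap between V_gamma and its two-term Laurent approximation."""
    v = discounted_value(P, r, gamma)
    return max(abs(v[s] - (g / (1.0 - gamma) + h[s])) for s in range(len(r)))


for gamma in (0.9, 0.99, 0.999):
    print(f"gamma={gamma}: g={g:.4f}, max error={laurent_error(gamma):.2e}")
```

The remaining error shrinks roughly in proportion to 1 − γ, which is the sense in which g/(1−γ) and h are the leading terms: V_γ itself grows without bound as γ → 1, but after removing the average-reward term, what is left converges to the bias.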

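The abstract positions the new technique relative to average-reward Q-learning methods such as Schwartz's R-learning, which, per the abstract, optimize only the first (average-reward) term. The paper's own bias-optimizing algorithm is not reproduced here. Instead, the following is a minimal Python sketch of one common form of R-learning, run on a hypothetical two-state MDP; `STEP`, `r_learning`, and all step sizes are illustrative assumptions, not taken from the paper.

```python
import random

# Hypothetical two-state, two-action MDP used only for illustration.
# STEP[state][action] = (reward, next_state); action 0 = "stay", action 1 = "switch".
# Staying in state 1 earns 2 per step, so the optimal average reward is 2.
STEP = {
    0: {0: (1.0, 0), 1: (0.0, 1)},
    1: {0: (2.0, 1), 1: (0.0, 0)},
}


def r_learning(steps=20000, alpha=0.01, beta=0.1, epsilon=0.1, seed=0):
    """One common form of Schwartz-style R-learning (undiscounted, average reward)."""
    rng = random.Random(seed)
    R = {s: [0.0, 0.0] for s in STEP}  # relative (average-adjusted) action values
    rho = 0.0                           # running estimate of the average reward
    s = 0
    for _ in range(steps):
        best_here = max(R[s])
        # Epsilon-greedy action selection; ties go to the lower-numbered action.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = R[s].index(best_here)
        greedy = R[s][a] == best_here  # a random pick can still be greedy
        reward, s_next = STEP[s][a]
        best_next = max(R[s_next])
        # Average-adjusted TD error: no discount factor, rho is subtracted instead.
        td = reward - rho + best_next - R[s][a]
        if greedy:
            # rho is updated only after greedy actions, so exploratory steps do not bias it.
            rho += alpha * (reward - rho + best_next - best_here)
        R[s][a] += beta * td
        s = s_next
    return R, rho


R, rho = r_learning()
print(f"estimated average reward: {rho:.3f}")
print("greedy action per state:", {s: R[s].index(max(R[s])) for s in R})
```

Note that the relative values R are only determined up to an additive constant, which is why this kind of method pins down the average reward but not, by itself, a bias-optimal choice among policies with equal average reward.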
Bibliographic details

  • Source
    Machine learning | 1996 | pp. 328-336 | 9 pages
  • Conference location: Bari, Italy
  • Author

    Sridhar Mahadevan

  • Author affiliation

    Department of Computer Science and Engineering, University of South Florida, Tampa, Florida 33620

  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Computer applications
