Source: JMLR: Workshop and Conference Proceedings

Reward Estimation for Variance Reduction in Deep Reinforcement Learning



Abstract

Reinforcement Learning (RL) agents require the specification of a reward signal for learning behaviours. However, the introduction of corrupted or stochastic rewards can yield high variance in learning. Such corruption may be a direct result of goal misspecification, randomness in the reward signal, or correlation of the reward with external factors that are not known to the agent. Corruption or stochasticity of the reward signal can be especially problematic in robotics, where goal specification can be particularly difficult for complex tasks. While many variance reduction techniques have been studied to improve the robustness of the RL process, handling such stochastic or corrupted reward structures remains difficult. As an alternative for handling this scenario in model-free RL methods, we suggest using an estimator for both rewards and value functions. We demonstrate that this improves performance under corrupted stochastic rewards in both the tabular and non-linear function approximation settings for a variety of noise types and environments. The use of reward estimation is a robust and easy-to-implement improvement for handling corrupted reward signals in model-free RL.
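To make the idea concrete, the following is a minimal sketch of the tabular case only, not the paper's actual implementation. It assumes a hypothetical discrete environment interface (env.reset() returning a state index, env.step(a) returning (next_state, reward, done)) and illustrative hyperparameters. Instead of using the raw, possibly corrupted reward sample directly, the Q-learning target is built from a running estimate of the mean reward for each state-action pair, which is the variance-reduction mechanism the abstract describes.

```python
import numpy as np

def q_learning_with_reward_estimation(env, n_states, n_actions,
                                      episodes=500, alpha=0.1, gamma=0.99,
                                      beta=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning sketch that learns a reward estimator r_hat(s, a)
    and uses it, rather than the noisy observed reward, in the TD target.
    Environment interface and hyperparameters are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))      # action-value estimates
    r_hat = np.zeros((n_states, n_actions))  # running mean-reward estimates

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))

            s_next, r, done = env.step(a)

            # update the reward estimator with the (possibly corrupted) observation
            r_hat[s, a] += beta * (r - r_hat[s, a])

            # TD target built from the estimated reward, not the raw sample
            target = r_hat[s, a] + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])

            s = s_next
    return Q, r_hat
```

In the non-linear function approximation setting described in the abstract, the same principle would apply with the reward table replaced by a learned reward model (e.g. a small network trained on observed rewards) whose predictions feed the value-function targets.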
