Source: JMLR: Workshop and Conference Proceedings

Reward Estimation for Variance Reduction in Deep Reinforcement Learning



Abstract

Reinforcement Learning (RL) agents require the specification of a reward signal for learning behaviours. However, the introduction of corrupted or stochastic rewards can yield high variance in learning. Such corruption may be a direct result of goal misspecification, randomness in the reward signal, or correlation of the reward with external factors that are not known to the agent. Corruption or stochasticity of the reward signal can be especially problematic in robotics, where goal specification can be particularly difficult for complex tasks. While many variance reduction techniques have been studied to improve the robustness of the RL process, handling such stochastic or corrupted reward structures remains difficult. As an alternative for handling this scenario in model-free RL methods, we suggest using an estimator for both rewards and value functions. We demonstrate that this improves performance under corrupted stochastic rewards in both the tabular and non-linear function approximation settings for a variety of noise types and environments. The use of reward estimation is a robust and easy-to-implement improvement for handling corrupted reward signals in model-free RL.
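To make the idea concrete, the following is a minimal sketch of the tabular case only, not the paper's actual implementation. It assumes a hypothetical discrete environment interface (env.reset() returning a state index, env.step(a) returning (next_state, reward, done)) and illustrative hyperparameters. Instead of using the raw, possibly corrupted reward sample directly, the Q-learning target is built from a running estimate of the mean reward for each state-action pair, which is the variance-reduction mechanism the abstract describes.

```python
import numpy as np

def q_learning_with_reward_estimation(env, n_states, n_actions,
                                      episodes=500, alpha=0.1, gamma=0.99,
                                      beta=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning sketch that learns a reward estimator r_hat(s, a)
    and uses it, rather than the noisy observed reward, in the TD target.
    Environment interface and hyperparameters are assumptions for illustration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))      # action-value estimates
    r_hat = np.zeros((n_states, n_actions))  # running mean-reward estimates

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection
            if rng.random() < epsilon:
                a = int(rng.integers(n_actions))
            else:
                a = int(np.argmax(Q[s]))

            s_next, r, done = env.step(a)

            # update the reward estimator with the (possibly corrupted) observation
            r_hat[s, a] += beta * (r - r_hat[s, a])

            # TD target built from the estimated reward, not the raw sample
            target = r_hat[s, a] + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])

            s = s_next
    return Q, r_hat
```

In the non-linear function approximation setting described in the abstract, the same principle would apply with the reward table replaced by a learned reward model (e.g. a small network trained on observed rewards) whose predictions feed the value-function targets.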
