
Distributional Reward Decomposition for Reinforcement Learning



Abstract

Many reinforcement learning (RL) tasks have specific properties that can be leveraged to modify existing RL algorithms to adapt to those tasks and further improve performance, and one general class of such properties is multiple reward channels. In those environments the full reward can be decomposed into sub-rewards obtained from different channels. Existing work on reward decomposition either requires prior knowledge of the environment to decompose the full reward, or decomposes the reward without prior knowledge but with degraded performance. In this paper, we propose Distributional Reward Decomposition for Reinforcement Learning (DRDRL), a novel reward decomposition algorithm which captures the multiple-reward-channel structure under a distributional setting. Empirically, our method captures the multi-channel structure and discovers meaningful reward decompositions without any requirement of prior knowledge. Consequently, our agent achieves better performance than existing methods on environments with multiple reward channels.
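The abstract gives no implementation details, so the sketch below is only a rough illustration of the general idea: a categorical (C51-style) distributional agent whose value head is split into one sub-return head per assumed reward channel, with the full-return distribution formed by convolving the per-channel distributions (which treats the sub-returns as independent). The class and function names, the number of channels, and the truncation of the convolved support are all illustrative assumptions, not the authors' DRDRL implementation.

```python
# Illustrative sketch only (assumed PyTorch code, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiChannelDistributionalHead(nn.Module):
    def __init__(self, feature_dim: int, num_actions: int,
                 num_channels: int = 2, num_atoms: int = 51):
        super().__init__()
        self.num_actions = num_actions
        self.num_atoms = num_atoms
        # One categorical head per hypothetical reward channel.
        self.heads = nn.ModuleList(
            nn.Linear(feature_dim, num_actions * num_atoms)
            for _ in range(num_channels)
        )

    def forward(self, features: torch.Tensor):
        # Per-channel distributions over the return support: (batch, actions, atoms).
        channel_probs = [
            F.softmax(h(features).view(-1, self.num_actions, self.num_atoms), dim=-1)
            for h in self.heads
        ]
        # Combine channels into a full-return distribution by convolution.
        full = channel_probs[0]
        for p in channel_probs[1:]:
            full = batched_convolve(full, p)
        return channel_probs, full


def batched_convolve(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Convolve two categorical pmfs along the atom axis, per (batch, action) pair.

    The exact sum distribution lives on 2*atoms - 1 atoms; here it is truncated
    back to `atoms` and renormalised purely to keep the sketch short.
    """
    batch, actions, atoms = p.shape
    n = batch * actions
    x = p.reshape(1, n, atoms)              # one "sample", n grouped channels
    w = q.reshape(n, 1, atoms).flip(-1)     # flip kernel: conv1d is cross-correlation
    out = F.conv1d(x, w, padding=atoms - 1, groups=n)
    out = out[..., :atoms].reshape(batch, actions, atoms)
    return out / out.sum(dim=-1, keepdim=True).clamp_min(1e-8)
```

Under these assumptions, the per-channel heads expose the decomposition (each head's expected value acts as a sub-return estimate), while the convolved distribution can be trained with a standard distributional RL loss against the observed full reward.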

