Annual Conference on Neural Information Processing Systems

Reward Mapping for Transfer in Long-Lived Agents



Abstract

We consider how to transfer knowledge from previous tasks (MDPs) to a current task in long-lived and bounded agents that must solve a sequence of tasks over a finite lifetime. A novel aspect of our transfer approach is that we reuse reward functions. While this may seem counterintuitive, we build on the insight of recent work on the optimal rewards problem that guiding an agent's behavior with reward functions other than the task-specifying reward function can help overcome computational bounds of the agent. Specifically, we use good guidance reward functions learned on previous tasks in the sequence to incrementally train a reward mapping function that maps task-specifying reward functions into good initial guidance reward functions for subsequent tasks. We demonstrate that our approach can substantially improve the agent's performance relative to other approaches, including an approach that transfers policies.
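The abstract's core idea, training a reward mapping function on (task-specifying reward, good guidance reward) pairs gathered from previously solved tasks, and using it to predict an initial guidance reward for each new task, can be illustrated with a minimal sketch. This is a hypothetical illustration only: the class name, the linear least-squares mapping, and the vector parameterization of rewards are assumptions, not the paper's actual method, which the abstract does not specify in detail.

```python
import numpy as np

class RewardMapping:
    """Hypothetical sketch: incrementally learn a linear map (with bias)
    from task-specifying reward parameters to good initial guidance
    reward parameters, using pairs collected from previous tasks."""

    def __init__(self):
        self.task_rewards = []      # inputs: task-specifying reward vectors
        self.guidance_rewards = []  # targets: guidance rewards found to work well

    def update(self, task_reward, good_guidance_reward):
        # After solving a task, record the guidance reward that performed well on it.
        self.task_rewards.append(np.asarray(task_reward, dtype=float))
        self.guidance_rewards.append(np.asarray(good_guidance_reward, dtype=float))

    def predict(self, task_reward):
        # Fit a least-squares linear map on all recorded pairs, then apply it
        # to the new task's reward to get an initial guidance reward.
        X = np.array(self.task_rewards)
        Y = np.array(self.guidance_rewards)
        X1 = np.hstack([X, np.ones((len(X), 1))])  # append bias column
        W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
        x1 = np.append(np.asarray(task_reward, dtype=float), 1.0)
        return x1 @ W

# Usage: four previous tasks whose good guidance rewards happened to follow
# guidance = 2 * task + 1 (elementwise); the map recovers that relation.
rm = RewardMapping()
for t in ([0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    rm.update(t, 2.0 * np.array(t) + 1.0)
initial_guidance = rm.predict([0.5, 0.5])  # -> approximately [2.0, 2.0]
```

The agent would then start learning the new task under `initial_guidance` rather than the raw task-specifying reward, and could continue refining the guidance reward online, as in the optimal rewards literature the abstract builds on.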

