
Learning from Reinforcement and Advice Using Composite Reward Functions


Abstract

Reinforcement learning has become a widely used methodology for creating intelligent agents across a broad range of applications. However, its performance deteriorates in tasks with sparse feedback or long inter-reinforcement times. This paper presents an extension that uses an advisory entity to provide additional feedback to the agent. The agent incorporates both the rewards provided by the environment and the advice, learning faster and producing policies tuned toward the advisor's preferences while still achieving the underlying task objective. The advice is converted into "tuning" or user rewards that, together with the task rewards, define a composite reward function that more accurately captures the advisor's perception of the task. At the same time, the formation of erroneous loops due to incorrect user rewards is avoided by placing formal bounds on the user reward component. The approach is illustrated on a robot navigation task.
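The composite-reward idea can be sketched in a few lines of tabular Q-learning. The following is a minimal illustration, not the paper's algorithm: the environment, the advisor heuristic, and the simple clipping used as the bound on the user reward are all assumptions made for the example; the paper derives its own formal bounds.

```python
import random

def q_learning_with_advice(n_states=6, goal=5, episodes=300,
                           alpha=0.5, gamma=0.9, epsilon=0.1,
                           advice_bound=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor with a composite reward:
    a sparse environment reward (+1 at the goal) plus a bounded 'user
    reward' from a hypothetical advisor that nudges the agent toward
    the goal. Clipping the advice to +/- advice_bound stands in for
    the paper's formal bounds that keep bad advice from creating
    reward loops that dominate the task objective."""
    rng = random.Random(seed)
    # Q[state][action]; actions: 0 = move left, 1 = move right.
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def user_reward(state, action):
        # Hypothetical advisor: prefers moving right (toward the goal).
        raw = 1.0 if action == 1 else -1.0
        # Bound the advice so it cannot outweigh the task reward.
        return max(-advice_bound, min(advice_bound, raw))

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            task_r = 1.0 if s2 == goal else 0.0   # sparse task reward
            r = task_r + user_reward(s, a)        # composite reward
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_with_advice()
# Greedy policy: 1 (right) should be preferred in every non-goal state.
policy = [0 if q[0] > q[1] else 1 for q in Q[:5]]
```

With the advice bounded well below the task reward, the dense user rewards speed up credit assignment in the sparse corridor without changing which policy is optimal.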
机译:强化学习已成为在各种应用程序中创建智能代理的一种广泛使用的方法。但是,其性能在反馈少或补间时间长的任务中会变差。本文提出了一种扩展,该扩展利用咨询实体向代理提供其他反馈。代理结合了环境提供的奖励和获得更快学习速度的建议,以及既能实现基本任务目标又能适应顾问偏爱的政策。该建议被转换为“调整”或用户奖励,它们与任务奖励一起定义了复合奖励功能,该功能可以更准确地定义顾问对任务的感知。同时,通过使用用户奖励组件上的形式边界,避免了由于错误的用户奖励而导致的错误循环的形成。使用机器人导航任务来说明此方法。
