
Learning from Reinforcement and Advice Using Composite Reward Functions


Abstract

Reinforcement learning has become a widely used methodology for creating intelligent agents across a broad range of applications. However, its performance deteriorates in tasks with sparse feedback or long inter-reinforcement times. This paper presents an extension that uses an advisory entity to provide additional feedback to the agent. The agent incorporates both the rewards provided by the environment and the advice, learning faster and producing policies tuned toward the advisor's preferences while still achieving the underlying task objective. The advice is converted into "tuning" or user rewards that, together with the task rewards, define a composite reward function that more accurately captures the advisor's perception of the task. At the same time, the formation of erroneous loops due to incorrect user rewards is avoided by placing formal bounds on the user reward component. The approach is illustrated on a robot navigation task.
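The composite-reward idea can be sketched in a few lines of tabular Q-learning. The following is a minimal illustration, not the paper's algorithm: the environment, the advisor heuristic, and the simple clipping used as the bound on the user reward are all assumptions made for the example; the paper derives its own formal bounds.

```python
import random

def q_learning_with_advice(n_states=6, goal=5, episodes=300,
                           alpha=0.5, gamma=0.9, epsilon=0.1,
                           advice_bound=0.2, seed=0):
    """Tabular Q-learning on a 1-D corridor with a composite reward:
    a sparse environment reward (+1 at the goal) plus a bounded 'user
    reward' from a hypothetical advisor that nudges the agent toward
    the goal. Clipping the advice to +/- advice_bound stands in for
    the paper's formal bounds that keep bad advice from creating
    reward loops that dominate the task objective."""
    rng = random.Random(seed)
    # Q[state][action]; actions: 0 = move left, 1 = move right.
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def user_reward(state, action):
        # Hypothetical advisor: prefers moving right (toward the goal).
        raw = 1.0 if action == 1 else -1.0
        # Bound the advice so it cannot outweigh the task reward.
        return max(-advice_bound, min(advice_bound, raw))

    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] > Q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            task_r = 1.0 if s2 == goal else 0.0   # sparse task reward
            r = task_r + user_reward(s, a)        # composite reward
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_with_advice()
# Greedy policy: 1 (right) should be preferred in every non-goal state.
policy = [0 if q[0] > q[1] else 1 for q in Q[:5]]
```

With the advice bounded well below the task reward, the dense user rewards speed up credit assignment in the sparse corridor without changing which policy is optimal.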
机译:强化学习已成为在各种应用程序中创建智能代理的一种广泛使用的方法。但是,其性能在反馈少或补间时间长的任务中会变差。本文提出了一种扩展,该扩展利用咨询实体向代理提供其他反馈。代理结合了环境提供的奖励和获得更快学习速度的建议,以及既能实现基本任务目标又能适应顾问偏爱的政策。该建议被转换为“调整”或用户奖励,它们与任务奖励一起定义了复合奖励功能,该功能可以更准确地定义顾问对任务的感知。同时,通过使用用户奖励组件上的形式边界,避免了由于错误的用户奖励而导致的错误循环的形成。使用机器人导航任务来说明此方法。
