International Journal of Advanced Robotic Systems

Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards



Abstract

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping a reward function is a difficult undertaking. In this article, we propose a general, model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize oscillation of the output action while maximizing the action's value. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method effectively solves the sparse-reward problem and achieves a high learning speed.
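The hindsight-relabeling idea underlying this approach can be sketched as follows. This is a minimal illustration of standard Hindsight Experience Replay with the common "future" goal-sampling strategy and a sparse success/failure reward, not the authors' exact Curious and Aggressive variant; the function names, the transition dictionary layout, and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, tol=0.05):
    # Sparse reward common in goal-conditioned robotic tasks:
    # 0 on success (goal reached within tol), -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < tol else -1.0

def hindsight_relabel(episode, k=4, rng=None):
    """Relabel an episode with future achieved goals (HER 'future' strategy).

    episode: list of transition dicts with keys
             'obs', 'action', 'achieved_goal', 'goal'.
    Returns the original transitions plus up to k relabeled copies per step,
    each pretending a later achieved state was the desired goal, so even a
    failed episode yields transitions with informative (successful) rewards.
    """
    rng = rng or np.random.default_rng()
    out = []
    T = len(episode)
    for t, tr in enumerate(episode):
        # Keep the original transition with its (likely -1) sparse reward.
        out.append({**tr, 'reward': sparse_reward(tr['achieved_goal'], tr['goal'])})
        # Sample future time steps and substitute their achieved goals.
        for idx in rng.integers(t, T, size=min(k, T - t)):
            new_goal = episode[idx]['achieved_goal']
            out.append({**tr, 'goal': new_goal,
                        'reward': sparse_reward(tr['achieved_goal'], new_goal)})
    return out
```

The relabeled transitions are pushed into the replay buffer alongside the originals, giving the off-policy learner (TD3 here) a much denser supply of nonzero rewards without any reward engineering.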


