International Journal of Advanced Robotic Systems

Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards



Abstract

The goal of reinforcement learning is to enable an agent to learn from rewards. However, some robotic tasks are naturally specified with sparse rewards, and manually shaping a reward function is a difficult undertaking. In this article, we propose a general, model-free reinforcement learning approach for robotic tasks with sparse rewards. First, a variant of Hindsight Experience Replay, Curious and Aggressive Hindsight Experience Replay, is proposed to improve the sample efficiency of reinforcement learning methods and avoid the need for complicated reward engineering. Second, building on the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm, demonstrations are leveraged to overcome the exploration problem and speed up policy training. Finally, an action loss is added to the loss function to minimize oscillation of the output action while maximizing the action's value. Experiments on simulated robotic tasks are performed with different hyperparameters to verify the effectiveness of our method. The results show that our method effectively solves the sparse-reward problem and achieves a high learning speed.
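The hindsight-relabeling idea underlying this approach can be sketched as follows. This is a minimal illustration of standard Hindsight Experience Replay with the common "future" goal-sampling strategy and a sparse success/failure reward, not the authors' exact Curious and Aggressive variant; the function names, the transition dictionary layout, and the tolerance `tol` are illustrative assumptions.

```python
import numpy as np

def sparse_reward(achieved_goal, goal, tol=0.05):
    # Sparse reward common in goal-conditioned robotic tasks:
    # 0 on success (goal reached within tol), -1 otherwise.
    return 0.0 if np.linalg.norm(achieved_goal - goal) < tol else -1.0

def hindsight_relabel(episode, k=4, rng=None):
    """Relabel an episode with future achieved goals (HER 'future' strategy).

    episode: list of transition dicts with keys
             'obs', 'action', 'achieved_goal', 'goal'.
    Returns the original transitions plus up to k relabeled copies per step,
    each pretending a later achieved state was the desired goal, so even a
    failed episode yields transitions with informative (successful) rewards.
    """
    rng = rng or np.random.default_rng()
    out = []
    T = len(episode)
    for t, tr in enumerate(episode):
        # Keep the original transition with its (likely -1) sparse reward.
        out.append({**tr, 'reward': sparse_reward(tr['achieved_goal'], tr['goal'])})
        # Sample future time steps and substitute their achieved goals.
        for idx in rng.integers(t, T, size=min(k, T - t)):
            new_goal = episode[idx]['achieved_goal']
            out.append({**tr, 'goal': new_goal,
                        'reward': sparse_reward(tr['achieved_goal'], new_goal)})
    return out
```

The relabeled transitions are pushed into the replay buffer alongside the originals, giving the off-policy learner (TD3 here) a much denser supply of nonzero rewards without any reward engineering.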


