IEEE International Conference on Real-time Computing and Robotics

Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards



Abstract

Dealing with continuous robotic control problems under sparse rewards is a longstanding challenge in deep reinforcement learning (DRL). While existing DRL algorithms have demonstrated great progress in learning policies from visual observations, learning effective policies still requires an impractical number of real-world data samples. Moreover, some robotic tasks are naturally specified with sparse rewards, which makes the precious data samples inefficient to use and slows down the learning process, rendering DRL infeasible. In addition, manually shaping reward functions is complex work because it needs specific domain knowledge and human intervention. To alleviate these issues, this paper proposes a model-free, off-policy RL approach named TD3MHER to learn manipulation policies for continuous robotic tasks with sparse rewards. Specifically, TD3MHER combines the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) with Model-driven Hindsight Experience Replay (MHER) to achieve highly sample-efficient training. While the agent is learning the policy, TD3MHER also helps it learn the underlying physical model of the robot, which is useful for solving the task, and it does not require any additional robot-environment interactions. The performance of TD3MHER is assessed on a simulated robotic task using a 7-DOF manipulator in order to compare the proposed technique with a previous DRL algorithm and to verify the usefulness of our method. The experimental results on the simulated robotic task show that the proposed approach successfully utilizes previously stored samples with sparse rewards and achieves a faster learning speed.
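The abstract's central point, that stored sparse-reward samples can still be made useful by hindsight relabeling, can be illustrated with a minimal sketch. This is not the authors' TD3MHER implementation; the function names, the "final" relabeling strategy, and the distance threshold below are illustrative assumptions.

    # Minimal sketch of HER-style hindsight relabeling under a binary sparse reward.
    # Transitions from a failed episode are stored a second time with the achieved
    # final state substituted as the goal, so they carry a non-trivial learning signal.
    import numpy as np

    def sparse_reward(achieved_goal, desired_goal, threshold=0.05):
        """Binary sparse reward: 0 if the goal is reached, -1 otherwise."""
        return 0.0 if np.linalg.norm(achieved_goal - desired_goal) < threshold else -1.0

    def relabel_with_hindsight(episode, replay_buffer):
        """Store each transition twice: once with the original goal, and once with
        the episode's final achieved state treated as the goal ('final' strategy)."""
        final_achieved = episode[-1]["achieved_goal"]
        for t in episode:
            # Original transition, which usually carries the uninformative -1 reward.
            replay_buffer.append(
                {**t, "reward": sparse_reward(t["achieved_goal"], t["desired_goal"])}
            )
            # Hindsight transition: pretend the final achieved state was the goal.
            replay_buffer.append(
                {**t,
                 "desired_goal": final_achieved,
                 "reward": sparse_reward(t["achieved_goal"], final_achieved)}
            )

With this kind of relabeling, even episodes that never reach the commanded goal yield transitions with reward 0, giving an off-policy learner such as TD3 a training signal from data that would otherwise be wasted.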
