Data-efficient Deep Reinforcement Learning Method Toward Scaling Continuous Robotic Task with Sparse Rewards



Abstract

Robotic continuous control with sparse rewards is a longstanding challenge in deep reinforcement learning (DRL). While existing DRL algorithms have demonstrated great progress in learning policies from visual observations, learning effective policies still requires an impractical number of real-world data samples. Moreover, some robotic tasks are naturally specified with sparse rewards, which makes poor use of the precious data and slows down learning, making DRL infeasible. In addition, manually shaping reward functions is complex work because it requires specific domain knowledge and human intervention. To alleviate these issues, this paper proposes TD3MHER, a model-free, off-policy RL approach that learns manipulation policies for continuous robotic tasks with sparse rewards. Specifically, TD3MHER combines the Twin Delayed Deep Deterministic policy gradient algorithm (TD3) with Model-driven Hindsight Experience Replay (MHER) to achieve highly sample-efficient training. While the agent is learning the policy, TD3MHER also helps it learn the potential physical model of the robot, which helps solve the task, and doing so requires no additional robot-environment interactions. TD3MHER is evaluated on a simulated robotic task with a 7-DOF manipulator, both to compare it against a previous DRL algorithm and to verify its usefulness. The experimental results show that the proposed approach successfully exploits previously stored samples with sparse rewards and learns faster.
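
The abstract does not spell out how MHER works, so the following is only a minimal sketch of the standard hindsight relabeling idea that it builds on (the commonly used "future" goal-sampling strategy), not the paper's method. The `Transition` fields, the `sparse_reward` tolerance, and the parameter `k` are assumptions chosen for illustration. The point it shows is how stored transitions that all failed under a sparse reward can still yield successful training samples without any new robot-environment interaction.

```python
import random
from typing import List, NamedTuple, Optional, Tuple

Vec = Tuple[float, ...]


class Transition(NamedTuple):
    """Hypothetical goal-conditioned transition layout (an assumption)."""
    state: Vec
    action: Vec
    next_state: Vec
    achieved_goal: Vec  # what the robot actually reached after the action
    goal: Vec           # what the robot was asked to reach
    reward: float


def sparse_reward(achieved: Vec, goal: Vec, tol: float = 0.05) -> float:
    """Return 0.0 on success and -1.0 otherwise (a common sparse convention)."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def relabel_episode(
    episode: List[Transition],
    k: int = 4,
    rng: Optional[random.Random] = None,
) -> List[Transition]:
    """Keep every original transition and add k hindsight copies of it.

    Each copy replaces the goal with a goal achieved at the same or a later
    step of the same episode, then recomputes the sparse reward.
    """
    rng = rng or random.Random(0)
    out: List[Transition] = []
    for t, tr in enumerate(episode):
        out.append(tr)
        for _ in range(k):
            future = episode[rng.randrange(t, len(episode))]
            new_goal = future.achieved_goal
            new_reward = sparse_reward(tr.achieved_goal, new_goal)
            out.append(tr._replace(goal=new_goal, reward=new_reward))
    return out


if __name__ == "__main__":
    # A 1-D toy episode that never reaches the requested goal at 1.0.
    goal = (1.0,)
    episode = []
    for i in range(3):
        s, s_next = (0.1 * i,), (0.1 * (i + 1),)
        episode.append(
            Transition(s, (0.1,), s_next, s_next, goal, sparse_reward(s_next, goal))
        )
    augmented = relabel_episode(episode, k=2)
    successes = sum(1 for tr in augmented if tr.reward == 0.0)
    print(len(augmented), "transitions,", successes, "with a success reward")
```

The augmented transitions would then be placed in the replay buffer of an off-policy learner such as TD3; how MHER uses a learned model in this step is not described in the abstract and is not reproduced here.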


Bibliographic details

Source
IEEE International Conference on Real-time Computing and Robotics | 2021 | pp. 1425-1431 | 7 pages
Authors
Junkai Ren; Yichuan Zhang; Yujun Zeng; Yixing Lan;
Original format: PDF
Keywords
Training; Visualization; Conferences; Training data; Reinforcement learning; Manipulators; Real-time systems;


Similar literature

1. A speech emotion transfer learning method based on deep sparse autoencoders [J]. 梁镇麟, 梁瑞宇, 唐曼婷. Journal of Southeast University (English Edition), 2019, No. 2
2. Efficient hindsight reinforcement learning using demonstrations for robotic tasks with sparse rewards [J]. Guoyu Zuo, Qishen Zhao, Jiahao Lu. International Journal of Advanced Robotic Systems, 2020, No. 1
3. Advising reinforcement learning toward scaling agents in continuous control environments with sparse rewards [J]. Hailin Ren, Pinhas Ben-Tzvi. Engineering Applications of Artificial Intelligence, 2020, No. Apra
4. Task-Oriented Deep Reinforcement Learning for Robotic Skill Acquisition and Control [J]. Guofei Xiang, Jianbo Su. IEEE Transactions on Cybernetics, 2021, No. 2
5. Curriculum Learning Based on Reward Sparseness for Deep Reinforcement Learning of Task Completion Dialogue Management [C]. Atsushi Saito. 2018 EMNLP workshop SCAI: 2nd international workshop on search-oriented conversational AI, 2018
6. Deep Reinforcement Learning with Accelerated Reward Function Technique for Robotics Task Planning [D]. Shaikh, Shifa. 2021
7. A New Information-Theoretic Method for Advertisement Conversion Rate Prediction for Large-Scale Sparse Data Based on Deep Learning [O]. Qianchen Xia, Jianghua Lv, Shilong Ma. 2020
8. Curriculum Learning Based on Reward Sparseness for Deep Reinforcement Learning of Task Completion Dialogue Management [O]. Atsushi Saito. 2018
