Robotics and Autonomous Systems

Deep reinforcement learning with smooth policy update: Application to robotic cloth manipulation

Abstract

Deep Reinforcement Learning (DRL), which can learn complex policies from high-dimensional observations such as images, has been successfully applied to a variety of tasks. It is therefore a natural candidate for enabling robots to learn and perform daily activities like washing and folding clothes, cooking, and cleaning, since such tasks are difficult for non-DRL methods, which typically require either (1) direct access to state variables or (2) well-designed hand-engineered features extracted from sensory inputs. However, applying DRL to real robots remains very challenging because conventional DRL algorithms require a huge number of training samples, which are difficult to collect on physical hardware. To alleviate this dilemma, in this paper we propose two sample-efficient DRL algorithms: Deep P-Network (DPN) and Dueling Deep P-Network (DDPN). The core idea is to combine the nature of smooth policy updates with the automatic feature extraction capability of deep neural networks, improving sample efficiency and learning stability when few samples are available. The proposed methods were first compared against previous DRL methods on a simulated robot-arm reaching task, and then applied to two real robotic cloth manipulation tasks with a limited number of samples: (1) flipping a handkerchief and (2) folding a t-shirt. All results suggest that our methods outperformed the previous DRL methods. (C) 2018 The Authors. Published by Elsevier B.V.
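The "smooth policy update" the abstract refers to keeps each new policy close to the previous one instead of jumping straight to the greedy policy, which is what stabilizes learning from few samples. As a rough illustration only (not the authors' exact DPN/DDPN update rule), the following is a minimal tabular sketch of a KL-regularized update of this kind; the `smooth_policy_update` helper and the `eta` temperature are illustrative assumptions:

```python
import numpy as np

def smooth_policy_update(pi_old, q_values, eta=1.0):
    """KL-regularized (smooth) policy update: the new policy reweights the
    old one by exponentiated action values, so it changes gradually.

    pi_old:   (num_states, num_actions) current stochastic policy
    q_values: (num_states, num_actions) estimated action values
    eta:      inverse temperature; smaller eta -> more conservative update
    """
    logits = np.log(pi_old + 1e-12) + eta * q_values
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    pi_new = np.exp(logits)
    return pi_new / pi_new.sum(axis=1, keepdims=True)

# Toy usage: 3 states, 2 actions; the policy drifts toward the greedy
# actions over several updates rather than switching abruptly.
pi = np.full((3, 2), 0.5)
q = np.array([[1.0, 0.0], [0.2, 0.8], [0.0, 0.0]])
for _ in range(5):
    pi = smooth_policy_update(pi, q, eta=0.5)
print(pi)
```

With small `eta`, each update is a small step away from the previous policy, which is the stabilizing property the paper exploits when combining such updates with deep-network feature extraction.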
