Journal: IEEE Robotics and Automation Letters

Batch Exploration With Examples for Scalable Robotic Reinforcement Learning


Abstract

Learning from diverse offline datasets is a promising path towards learning general-purpose robotic agents. However, a core challenge in this paradigm lies in collecting large amounts of meaningful data without depending on a human in the loop for data collection. One way to address this challenge is through task-agnostic exploration, where an agent attempts to explore without a task-specific reward function and collects data that can be useful for any subsequent task. While these approaches have shown some promise in simple domains, they often struggle to explore the relevant regions of the state space in more challenging settings, such as vision-based robotic manipulation. This challenge stems from an objective that encourages exploring everything in a potentially vast state space. To mitigate this challenge, we propose to focus exploration on the important parts of the state space using weak human supervision. Concretely, we propose an exploration technique, Batch Exploration with Examples (BEE), that explores relevant regions of the state space, guided by a modest number of human-provided images of important states. These human-provided images only need to be provided once at the beginning of data collection and can be acquired in a matter of minutes, allowing us to scalably collect diverse datasets, which can then be combined with any batch RL algorithm. We find that BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka Emika Panda robot, and observe that, compared to task-agnostic and weakly-supervised exploration techniques, it (1) interacts more than twice as often with relevant objects, and (2) improves subsequent task performance when used in conjunction with offline RL.
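
The abstract does not say how BEE scores relevance or chooses actions, so the following is only a toy illustration of the general idea of example-guided exploration, not the authors' method. It assumes states are low-dimensional feature vectors rather than images, that relevance is the negative distance to the nearest human-provided example, and that actions are picked greedily from random candidates. The names `relevance`, `toy_step`, `pick_action`, and `collect_batch` are hypothetical.

```python
import math
import random


def relevance(state, examples):
    """Assumed score: negative distance to the nearest human-provided example."""
    return -min(math.dist(state, ex) for ex in examples)


def toy_step(state, action):
    """Hypothetical dynamics for illustration: the action is a displacement."""
    return tuple(s + a for s, a in zip(state, action))


def pick_action(state, candidates, examples, step):
    """Greedily choose the candidate whose next state scores as most relevant."""
    return max(candidates, key=lambda a: relevance(step(state, a), examples))


def collect_batch(start, examples, step, n_steps, n_candidates, rng):
    """Roll out example-guided exploration and return (s, a, s') transitions."""
    state, batch = start, []
    dim = len(start)
    for _ in range(n_steps):
        # Always include a "stay" action so relevance never decreases.
        candidates = [tuple(0.0 for _ in range(dim))]
        candidates += [
            tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))
            for _ in range(n_candidates)
        ]
        action = pick_action(state, candidates, examples, step)
        next_state = step(state, action)
        batch.append((state, action, next_state))
        state = next_state
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    examples = [(5.0, 5.0), (-3.0, 4.0)]  # stand-ins for human-provided images
    data = collect_batch((0.0, 0.0), examples, toy_step, 20, 16, rng)
    print(len(data), "transitions collected")
```

The returned transitions play the role of the offline dataset that the abstract says can be handed to any batch RL algorithm. A task-agnostic baseline would correspond to replacing `relevance` with a score that ignores the examples.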
