Policy transfer via modularity and reward guiding

Abstract

Non-prehensile manipulation, such as pushing, is an important function for robots to move objects and is sometimes preferred as an alternative to grasping. However, due to unknown frictional forces, pushing has proven to be a difficult task for robots. We explore the use of reinforcement learning to train a robot to robustly push an object. In order to deal with the sample complexity of training such a method, we train the pushing policy in simulation and then transfer this policy to the real world. In order to ease the transfer from simulation, we propose to use modularity to separate the learned policy from the raw inputs and outputs; rather than training "end-to-end," we decompose our system into modules and train only a subset of these modules in simulation. We further demonstrate that we can incorporate prior knowledge about the task into the state space and the reward function to speed up convergence. Finally, we introduce "reward guiding" to modify the reward function and further reduce the training time. We demonstrate, in both simulation and real-world experiments, that such an approach can be used to reliably push an object from many initial positions and orientations. Videos available at https://goo.gl/B7LtY3.
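
To make the two ideas in the abstract concrete, a modular pipeline in which only one module is trained in simulation, and a modified reward that speeds up learning, the following is a minimal, hypothetical Python sketch. The module names (perception_module, learned_policy, control_module), the pose-based state, and the distance-based shaping term in shaped_reward are assumptions made for illustration; the abstract does not specify the paper's interfaces or its exact "reward guiding" formulation.

```python
import numpy as np

# Hypothetical modular decomposition (module names are illustrative, not from the paper):
# a fixed perception module maps raw observations to a low-dimensional object pose, the
# learned policy maps poses to a high-level push command, and a fixed control module turns
# that command into robot motions. Only the middle module is trained in simulation, so it
# can be reused on the real robot behind real perception and control modules.

def perception_module(image):
    """Placeholder: estimate the object pose (x, y, theta) from a raw observation."""
    raise NotImplementedError("swap in a pose estimator or marker tracker")

def control_module(push_command, robot):
    """Placeholder: convert a high-level push command into low-level robot commands."""
    raise NotImplementedError("swap in the robot's position controller")

def learned_policy(object_pose, goal_pose, weights):
    """Tiny linear stub standing in for the trained pushing policy."""
    features = np.concatenate([object_pose, goal_pose])
    return weights @ features  # e.g. a 2-D push direction in the plane

def shaped_reward(object_pose, goal_pose, reached_goal, guide_coeff=0.1):
    """Sparse task reward plus a dense distance term that rewards progress to the goal.

    This is a generic reward-shaping pattern, not the paper's exact "reward guiding" rule.
    """
    sparse = 1.0 if reached_goal else 0.0
    dense = -guide_coeff * float(np.linalg.norm(object_pose[:2] - goal_pose[:2]))
    return sparse + dense

if __name__ == "__main__":
    obj = np.array([0.40, 0.10, 0.0])   # current object pose (x, y, theta)
    goal = np.array([0.60, 0.00, 0.0])  # target pose
    w = np.zeros((2, 6))                # untrained policy weights, for illustration only
    print(learned_policy(obj, goal, w), shaped_reward(obj, goal, reached_goal=False))
```

A dense, progress-based term of this kind gives the learner informative feedback long before the sparse goal reward is ever reached, which is the general reason such reward modifications can shorten training time.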