IEEE-RAS International Conference on Humanoid Robotics

Learning Deep Robot Controllers by Exploiting Successful and Failed Executions


Abstract

The prohibitive amount of data required to learn complex nonlinear policies, such as deep neural networks, has been significantly reduced by guided policy search (GPS) algorithms. However, while learning the control policy, the robot might fail and therefore generate unacceptable guiding samples. Failures may arise, for example, as a consequence of modeling or environmental uncertainties, and thus unsuccessful interactions should be explicitly considered while learning a complex policy. Currently, GPS methods update the robot policy by discarding unsuccessful trials or assigning them low probability. In other words, these methods overlook the existence of poorly performing executions, and therefore tend to underestimate the information carried by these interactions in subsequent iterations. In this paper we propose to learn deep neural network controllers with an extension of GPS that considers trajectories optimized with dualist constraints. These constraints are aimed at assisting the policy learning so that the trajectory distributions updated at each iteration are similar to good trajectory distributions (e.g., successful executions) while differing from bad trajectory distributions (e.g., failures). We show that neural network policies guided by trajectories optimized with our method reduce failures during the policy exploration phase, and therefore encourage safer interactions. This may have a relevant impact on tasks that involve physical contact with the environment or human partners.
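As a rough illustration of the dualist-constraint idea described in the abstract (a minimal sketch, not the authors' implementation), the snippet below scores a candidate Gaussian trajectory distribution by its expected cost plus a KL term that attracts it toward a "good" distribution (successes) and another that repels it from a "bad" one (failures). The diagonal-Gaussian assumption, the function names, and the alpha/beta trade-off weights are all hypothetical choices made here for illustration.

```python
import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians given means and variances."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    )

def dualist_objective(mu, var, mu_good, var_good, mu_bad, var_bad,
                      expected_cost, alpha=1.0, beta=0.1):
    """Surrogate objective for a trajectory-distribution update:
    prefer low expected cost, stay close to the 'good' distribution,
    and move away from the 'bad' one. alpha/beta are hypothetical weights."""
    attract = kl_diag_gaussians(mu, var, mu_good, var_good)  # pull toward successes
    repel = kl_diag_gaussians(mu, var, mu_bad, var_bad)      # push away from failures
    return expected_cost + alpha * attract - beta * repel

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4  # e.g., a flattened state-action trajectory segment
    mu_good, var_good = np.zeros(dim), np.ones(dim)
    mu_bad, var_bad = 3.0 * np.ones(dim), np.ones(dim)
    mu, var = rng.normal(size=dim), np.ones(dim)
    print(dualist_objective(mu, var, mu_good, var_good, mu_bad, var_bad,
                            expected_cost=1.0))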
