Continuous Control for High-Dimensional State Spaces: An Interactive Learning Approach

2019 International Conference on Robotics and Automation (ICRA)


Abstract

Deep Reinforcement Learning (DRL) has become a powerful methodology for solving complex decision-making problems. However, DRL has several limitations when used in real-world problems (e.g., robotics applications). For instance, long training times are required and, in contrast to simulated environments, cannot be accelerated, and reward functions may be hard to specify, model, or compute. Moreover, the transfer of policies learned in a simulator to the real world has limitations (the reality gap). On the other hand, machine learning methods that rely on the transfer of human knowledge to an agent have been shown to be time-efficient for obtaining well-performing policies and do not require a reward function. In this context, we analyze the use of human corrective feedback during task execution to learn policies with high-dimensional state spaces, using the D-COACH framework, and we propose new variants of this framework. D-COACH is a Deep Learning-based extension of COACH (COrrective Advice Communicated by Humans), in which humans are able to shape policies through corrective advice. The enhanced version of D-COACH proposed in this paper largely reduces the time and effort a human needs to train a policy. Experimental results validate the efficiency of the D-COACH framework on three different problems (in simulation and with real robots), and show that its enhanced version considerably reduces the human training effort and makes it feasible to learn policies within periods of time in which a DRL agent does not reach any improvement.
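The corrective-advice mechanism summarized in the abstract can be made concrete with a short sketch. Below is a minimal, hypothetical PyTorch illustration of a COACH-style update: the policy executes an action, the human signals a per-dimension correction direction h, and the network takes one supervised step toward the executed action shifted by h times a fixed error magnitude. The network shape, the error magnitude E, the learning rate, and the helper corrective_update are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions and hyperparameters, chosen for illustration only.
STATE_DIM, ACTION_DIM = 8, 2
E = 0.1   # assumed fixed error magnitude of the corrective advice
LR = 1e-3

# Small feed-forward policy with actions bounded in [-1, 1].
policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.ReLU(),
    nn.Linear(64, ACTION_DIM), nn.Tanh(),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=LR)

def corrective_update(state: torch.Tensor, h: torch.Tensor) -> float:
    """One supervised step toward the human-corrected action.

    `h` holds -1, 0, or +1 per action dimension: the direction in which
    the human wants the executed action component to change.
    """
    action = policy(state.unsqueeze(0))
    # Target = executed action shifted by h * E; no gradient flows through it.
    target = action.detach() + E * h
    loss = nn.functional.mse_loss(action, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: the human signals "increase action dim 0, leave dim 1 unchanged".
s = torch.randn(STATE_DIM)
corrective_update(s, torch.tensor([1.0, 0.0]))
```

In the full framework, each corrected pair (state, target) would also be stored in a replay buffer and periodically replayed, which is part of how the enhanced version reduces the amount of advice a human must provide; that machinery is omitted from this sketch.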
