Conference: International Conference on Application-specific Systems, Architectures and Processors

Towards Hardware Accelerated Reinforcement Learning for Application-Specific Robotic Control


Abstract

Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with its environment by making sequential decisions. The agent receives a reward from the environment based on how good its decisions are, and tries to find an optimal decision-making policy that maximises its long-term cumulative reward. This paper presents a novel approach that shows promise in applying accelerated simulation of RL policy training to automating the control of a real robot arm for specific applications. The approach has two steps. First, design space exploration techniques are developed to enhance the performance of an FPGA accelerator for RL policy training based on Trust Region Policy Optimisation (TRPO). This results in a 43% speed improvement over a previous FPGA implementation, a 4.65 times speedup over deep learning libraries running on a GPU, and a 19.29 times speedup over a CPU. Second, the trained RL policy is transferred to a real robot arm. Our experiments show that the trained arm can successfully reach and pick up predefined objects, demonstrating the feasibility of our approach.
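
As a rough illustration of the "long-term cumulative reward" that the agent tries to maximise, the sketch below computes the discounted return of a single episode. It is not taken from the paper: the function name `discounted_return` and the discount factor `gamma` are assumptions introduced here, and the abstract does not state whether or how rewards are discounted.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Return r_0 + gamma * r_1 + gamma^2 * r_2 + ... for one episode.

    `gamma` is a hypothetical discount factor in [0, 1]; the abstract
    does not specify one.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    # Accumulate from the last step backwards so each reward is
    # discounted once per step that separates it from the start.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

Calling `discounted_return(episode_rewards)` on the rewards collected during one rollout gives the quantity a policy-optimisation method such as TRPO would try to increase in expectation. With `gamma=1.0` it reduces to the plain sum of rewards.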
