Conference: Multi-disciplinary International Conference on Artificial Intelligence

Learning Robot Arm Controls Using Augmented Random Search in Simulated Environments

Abstract

We investigate learning continuous-action policies for controlling a six-axis robot arm. Traditional tabular Q-learning handles discrete actions well but is poorly suited to continuous actions, because the tabular approach is constrained by the size of the state-action value table. Recent advances in deep reinforcement learning and policy gradient learning replace the look-up table with function approximators such as artificial neural networks, which act as policy networks that can predict both discrete and continuous actions. However, deep reinforcement learning and policy gradient methods have been criticized for their complexity. Recent work reports that Augmented Random Search (ARS) offers better sample efficiency and simpler hyperparameter tuning, which motivates us to apply the technique to our robot-arm reaching tasks. We constructed a custom simulated robot-arm environment using the Unity Machine Learning Agents toolkit for the Unity game engine, then designed three robot-arm reaching tasks. Twelve models were trained using ARS. Another four models were trained using a state-of-the-art policy gradient (PG) technique, proximal policy optimization (PPO), to provide a policy gradient baseline. We analyze and discuss the empirical results of the models trained with ARS and PPO.
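For readers unfamiliar with ARS, the following is a minimal sketch of the general idea: perturb a linear policy in random directions, evaluate each perturbation in both signs, keep the best directions, and step along them scaled by the standard deviation of their rewards. It is not the authors' implementation. The toy 2-D point-reaching task, the function names (`rollout`, `perturbed`, `ars_train`), and all hyperparameter values are assumptions made for illustration; the paper's Unity environment and six-axis arm are not modeled, and the observation normalization used in some ARS variants is omitted.

```python
import random
import statistics

OBS_DIM, ACT_DIM, HORIZON = 2, 2, 20  # hypothetical toy sizes, not from the paper


def rollout(theta, targets):
    """Total reward of the linear policy a = theta @ obs on a toy 2-D reaching task."""
    total = 0.0
    for tx, ty in targets:
        pos = [0.0, 0.0]  # the "end effector" starts at the origin
        for _ in range(HORIZON):
            obs = [tx - pos[0], ty - pos[1]]  # observation: vector to the target
            for i in range(ACT_DIM):
                a = sum(theta[i][j] * obs[j] for j in range(OBS_DIM))
                pos[i] += max(-0.2, min(0.2, a))  # clipped continuous action
            # reward is the negative distance to the target after the move
            total -= ((tx - pos[0]) ** 2 + (ty - pos[1]) ** 2) ** 0.5
    return total


def perturbed(theta, delta, scale):
    """Return theta + scale * delta as a new matrix (list of lists)."""
    return [[theta[i][j] + scale * delta[i][j] for j in range(OBS_DIM)]
            for i in range(ACT_DIM)]


def ars_train(iterations=60, n_dirs=8, top_b=4, step=0.05, noise=0.1, seed=0):
    """ARS-style random search; returns the final policy and the reward history."""
    rng = random.Random(seed)
    targets = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
    theta = [[0.0] * OBS_DIM for _ in range(ACT_DIM)]
    history = [rollout(theta, targets)]
    for _ in range(iterations):
        trials = []
        for _ in range(n_dirs):
            delta = [[rng.gauss(0, 1) for _ in range(OBS_DIM)]
                     for _ in range(ACT_DIM)]
            r_plus = rollout(perturbed(theta, delta, noise), targets)
            r_minus = rollout(perturbed(theta, delta, -noise), targets)
            trials.append((r_plus, r_minus, delta))
        # keep the top_b directions, ranked by the better of their two rewards
        trials.sort(key=lambda t: max(t[0], t[1]), reverse=True)
        best = trials[:top_b]
        # scale the step by the reward standard deviation of the kept directions
        sigma_r = statistics.pstdev([r for t in best for r in t[:2]]) or 1.0
        for r_plus, r_minus, delta in best:
            weight = step / (top_b * sigma_r) * (r_plus - r_minus)
            theta = perturbed(theta, delta, weight)
        history.append(rollout(theta, targets))
    return theta, history


if __name__ == "__main__":
    theta, history = ars_train()
    print("initial reward:", history[0])
    print("best reward:   ", max(history))
```

The only tunable quantities in this sketch are the step size, the noise scale, the number of directions, and the number of directions kept, which illustrates the comparatively small hyperparameter surface the abstract refers to. No gradients or neural networks are involved, so the sketch says nothing about how ARS compares with PPO on the paper's tasks.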
