IEEE International Conference on Automation Science and Engineering

Developmentally Synthesizing Earthworm-Like Locomotion Gaits with Bayesian-Augmented Deep Deterministic Policy Gradients (DDPG)



Abstract

In this paper, a reinforcement learning method is presented for generating earthworm-like gaits for a hyper-redundant earthworm-like manipulator robot. Partially inspired by the human brain's learning mechanism, the proposed framework builds a preliminary belief by first adapting rudimentary gaits governed by generic kinematic knowledge of undulatory, sidewinding, and circular patterns. This preliminary belief is then represented as a prior ensemble, and new gaits are learned by leveraging this a priori knowledge and inferring a posterior over the prior distribution. While the fundamental idea of combining Bayesian learning with reinforcement learning is not new, this paper extends the Bayesian actor-critic approach by introducing an augmented, prior-based directed bias into the policy search, yielding faster parameter learning and reduced sampling requirements. We show results on an in-house-built 10-DoF earthworm-like robot that exhibits adaptive development, qualitatively learning different locomotion modes when given only rudimentary generic gait behaviors. The results are compared against the Deep Deterministic Policy Gradient (DDPG) method for continuous control as the baseline. We show that the proposed method outperforms DDPG and also achieves faster locomotion, as measured by kinematic indices, across the various gaits.
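The prior-based directed bias described in the abstract can be sketched as blending the learned actor's output with a rudimentary gait prior before adding DDPG-style exploration noise. This is a minimal illustration, not the authors' implementation: the linear actor, the sinusoidal `prior_gait`, and the blending weight `beta` are all hypothetical stand-ins for the paper's neural actor, prior ensemble, and posterior-driven weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

def prior_gait(phase, n_joints=10):
    # Rudimentary travelling-wave gait for a 10-DoF segmented body:
    # each joint follows a phase-shifted sinusoid (illustrative only).
    offsets = np.arange(n_joints) * (2 * np.pi / n_joints)
    return 0.5 * np.sin(phase + offsets)

def actor(state, theta):
    # Stand-in linear actor; in DDPG this is a neural network.
    return np.tanh(state @ theta)

def biased_action(state, theta, phase, beta=0.5, sigma=0.1):
    """Blend the learned policy with the gait prior (directed bias),
    then add Gaussian exploration noise as in vanilla DDPG."""
    a_policy = actor(state, theta)
    a_prior = prior_gait(phase)
    # In a Bayesian scheme, beta would shrink as the posterior over the
    # prior ensemble sharpens, letting the learned policy take over.
    a = (1.0 - beta) * a_policy + beta * a_prior
    return np.clip(a + sigma * rng.standard_normal(a.shape), -1.0, 1.0)

state = rng.standard_normal(4)
theta = 0.01 * rng.standard_normal((4, 10))
a = biased_action(state, theta, phase=0.0)
print(a.shape)  # (10,)
```

Annealing `beta` toward zero recovers plain DDPG exploration, so the prior only directs early policy search, which is consistent with the abstract's claim of reduced sampling requirements.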
