IEEE International Conference on Automation Science and Engineering

Developmentally Synthesizing Earthworm-Like Locomotion Gaits with Bayesian-Augmented Deep Deterministic Policy Gradients (DDPG)



Abstract

In this paper, a reinforcement learning method is presented for generating earthworm-like gaits for a hyper-redundant earthworm-like manipulator robot. Partially inspired by the human brain's learning mechanism, the proposed framework builds a preliminary belief by first adapting rudimentary gaits governed by generic kinematic knowledge of undulatory, sidewinding, and circular patterns. This preliminary belief is then represented as a prior ensemble, and new gaits are learned by leveraging this a priori knowledge and inferring a posterior over the prior distribution. While the fundamental idea of combining Bayesian learning with reinforcement learning is not new, this paper extends the Bayesian actor-critic approach by introducing an augmented, prior-based directed bias into the policy search, yielding faster parameter learning and reduced sampling requirements. We show results on an in-house-built 10-DoF earthworm-like robot that exhibits adaptive development, qualitatively learning different locomotion modes when given only rudimentary generic gait behaviors. The results are compared against the Deep Deterministic Policy Gradient (DDPG) method for continuous control as the baseline. We show that the proposed method outperforms DDPG and also achieves faster locomotion, as measured by kinematic indices, across the various gaits.
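The prior-based directed bias described in the abstract can be sketched as blending the learned actor's output with a rudimentary gait prior before adding DDPG-style exploration noise. This is a minimal illustration, not the authors' implementation: the linear actor, the sinusoidal `prior_gait`, and the blending weight `beta` are all hypothetical stand-ins for the paper's neural actor, prior ensemble, and posterior-driven weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

def prior_gait(phase, n_joints=10):
    # Rudimentary travelling-wave gait for a 10-DoF segmented body:
    # each joint follows a phase-shifted sinusoid (illustrative only).
    offsets = np.arange(n_joints) * (2 * np.pi / n_joints)
    return 0.5 * np.sin(phase + offsets)

def actor(state, theta):
    # Stand-in linear actor; in DDPG this is a neural network.
    return np.tanh(state @ theta)

def biased_action(state, theta, phase, beta=0.5, sigma=0.1):
    """Blend the learned policy with the gait prior (directed bias),
    then add Gaussian exploration noise as in vanilla DDPG."""
    a_policy = actor(state, theta)
    a_prior = prior_gait(phase)
    # In a Bayesian scheme, beta would shrink as the posterior over the
    # prior ensemble sharpens, letting the learned policy take over.
    a = (1.0 - beta) * a_policy + beta * a_prior
    return np.clip(a + sigma * rng.standard_normal(a.shape), -1.0, 1.0)

state = rng.standard_normal(4)
theta = 0.01 * rng.standard_normal((4, 10))
a = biased_action(state, theta, phase=0.0)
print(a.shape)  # (10,)
```

Annealing `beta` toward zero recovers plain DDPG exploration, so the prior only directs early policy search, which is consistent with the abstract's claim of reduced sampling requirements.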
