Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller

Yutaka Nakamura; Takeshi Mori; Yoichi Tokita; Tomohiro Shibata; Shin Ishii


Abstract

Motor control schemes that use a central pattern generator (CPG) controller, inspired by the mechanisms of rhythmic movement in animals, have been studied. We previously proposed a reinforcement learning (RL) method, the CPG-actor-critic model, as an autonomous learning framework for a CPG controller. Here, we propose an off-policy natural policy gradient RL algorithm for the CPG-actor-critic model that addresses the exploration-exploitation problem by meta-controlling the behavior policy. We apply this RL algorithm to an automatic control problem using a biped robot simulator. Computer simulations demonstrate that, with the new algorithm, the CPG controller enables the biped robot to walk stably and efficiently.
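
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind an off-policy natural policy gradient, not the authors' method. It assumes a one-step problem, a Gaussian target policy with learnable mean `theta`, a fixed and wider Gaussian behavior policy for exploration, and a made-up quadratic reward. All names (`reward`, `natural_gradient_step`, `train`) and all numeric settings are hypothetical. The paper's meta-control of the behavior policy and its CPG controller are not modeled here.

```python
import math
import random


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def reward(action, best_action=1.5):
    """Hypothetical reward: highest when the action equals best_action."""
    return -(action - best_action) ** 2


def natural_gradient_step(theta, target_std, behavior_std, n_samples, rng):
    """Estimate the natural gradient of expected reward w.r.t. theta
    from actions drawn by a wider (more exploratory) behavior policy."""
    actions = [rng.gauss(theta, behavior_std) for _ in range(n_samples)]
    # Importance weights correct for sampling from the behavior policy.
    weights = [
        gaussian_pdf(a, theta, target_std) / gaussian_pdf(a, theta, behavior_std)
        for a in actions
    ]
    rewards = [reward(a) for a in actions]
    weight_sum = sum(weights)
    # Weighted mean reward, used as a variance-reducing baseline.
    baseline = sum(w * r for w, r in zip(weights, rewards)) / weight_sum
    # Score function of the target policy: d/dtheta log N(a; theta, std^2).
    grad = sum(
        w * (a - theta) / target_std ** 2 * (r - baseline)
        for w, a, r in zip(weights, actions, rewards)
    ) / weight_sum
    # For a Gaussian mean with fixed std, the Fisher information is 1/std^2.
    fisher = 1.0 / target_std ** 2
    return grad / fisher


def train(n_iters=200, step_size=0.5, seed=0):
    """Run repeated natural-gradient updates and return the learned mean."""
    rng = random.Random(seed)
    theta = -1.0  # assumed initial policy mean
    for _ in range(n_iters):
        theta += step_size * natural_gradient_step(theta, 0.5, 1.0, 200, rng)
    return theta


if __name__ == "__main__":
    print(f"learned mean action: {train():.3f}")
```

Because the behavior policy is wider than the target policy in this sketch, the importance weights stay bounded, which keeps the estimate stable. Dividing by the Fisher information rescales the plain gradient so that the step size does not depend on how the policy is parameterized.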

Bibliographic details

Source: Journal of Robotics and Mechatronics, 2005, No. 6, 9 pages
Authors: Yutaka Nakamura; Takeshi Mori; Yoichi Tokita; Tomohiro Shibata; Shin Ishii
Original format: PDF
Language: English
Chinese Library Classification: Robotics
Keywords: reinforcement learning; off-policy learning; biped walking; central pattern generator (CPG); human modeling in robotics

Similar literature

1. Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller [J]. Yutaka Nakamura, Takeshi Mori, Yoichi Tokita. Journal of Robotics and Mechatronics. 2005, No. 6
2. Learning CPG-based biped locomotion with a policy gradient method [J]. Takamitsu Matsubara, Jun Morimoto, Jun Nakanishi. Robotics and Autonomous Systems. 2006, No. 11
3. Learning a Dynamic Policy by Using Policy Gradient: Application to Biped Walking [J]. Takamitsu Matsubara, Jun Morimoto, Jun Nakanishi. Systems and Computers in Japan. 2007, No. 4
4. Fast and Stable Learning of Quasi-Passive Dynamic Walking by an Unstable Biped Robot based on Off-Policy Natural Actor-Critic [C]. Tsuyoshi Ueno, Yutaka Nakamura, Takashi Takuma. IEEE/RSJ International Conference on Intelligent Robots and Systems. 2006
5. Natural mode entrainment by CPG-based decentralized feedback controllers [D]. Yoshiaki Futakata. 2009
6. Implementation of Deep Deterministic Policy Gradients for Controlling Dynamic Bipedal Walking [O]. Chujun Liu, Andrew G. Lonsberry, Mark J. Nandor. 2019
7. Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot [O]. Gen Endo, Jun Morimoto, Takamitsu Matsubara. 2008
