Iterative Learning without Reinforcement or Reward for Multijoint Movements: A Revisit of Bernstein's DOF Problem on Dexterity

Suguru Arimoto; Masahiro Sekimoto; Kenji Tahara

Abstract

A robot designed to mimic a human becomes kinematically redundant: its total number of degrees of freedom (DOFs) becomes larger than the number of physical variables required to describe a given task. Kinematic redundancy may enhance dexterity and versatility, but it makes the inverse kinematics from the task space to the joint space ill-posed. This ill-posedness was originally found by Bernstein, who tried to unveil the secret of the central nervous system and how nicely it coordinates a skeletomotor system with many DOFs interacting in complex ways. In the history of robotics research, such ill-posedness has not yet been resolved directly but has been circumvented by introducing an artificial performance index and determining a unique inverse kinematics solution by minimization. This paper tackles Bernstein's problem and proposes a new method for resolving the ill-posedness in a natural way without invoking any artificial index. First, given a curve on a horizontal plane that the endpoint of a redundant robot arm is required to trace, the existence of a unique ideal joint trajectory is proved. Second, such a uniquely determined motion can eventually be acquired as a joint control signal through iterative learning without reinforcement or reward.
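As a rough illustration of the ill-posedness the abstract describes (not of the paper's own method), the sketch below takes a planar arm with three joints, so the arm has three DOFs while the endpoint position on the plane needs only two variables. The link lengths, the target point, and the function names `endpoint` and `joints_for` are assumptions made for this example only. Fixing the first joint at different angles yields different joint configurations that all place the endpoint on the same target.

```python
import math

# Hypothetical link lengths (metres) for an illustrative 3-joint planar arm.
LINKS = (0.30, 0.25, 0.20)


def endpoint(q):
    """Forward kinematics: relative joint angles (rad) -> endpoint (x, y)."""
    x = y = angle = 0.0
    for length, qi in zip(LINKS, q):
        angle += qi  # joint angles are relative, so they accumulate along the chain
        x += length * math.cos(angle)
        y += length * math.sin(angle)
    return x, y


def joints_for(target, q1):
    """Fix the first joint at q1 and solve the remaining two links in closed form.

    Returns None if the target is out of reach for this choice of q1.
    """
    l1, l2, l3 = LINKS
    # Target position relative to the tip of the first link.
    dx = target[0] - l1 * math.cos(q1)
    dy = target[1] - l1 * math.sin(q1)
    d2 = dx * dx + dy * dy
    c = (d2 - l2 * l2 - l3 * l3) / (2.0 * l2 * l3)  # cosine of the third joint angle
    if abs(c) > 1.0:
        return None
    q3 = math.acos(c)
    # Absolute angle of link 2, converted below to an angle relative to link 1.
    phi = math.atan2(dy, dx) - math.atan2(l3 * math.sin(q3), l2 + l3 * math.cos(q3))
    return (q1, phi - q1, q3)


target = (0.45, 0.30)
solutions = [s for s in (joints_for(target, q1) for q1 in (0.2, 0.5, 0.8)) if s]
for s in solutions:
    print([round(a, 3) for a in s], "->", [round(v, 3) for v in endpoint(s)])
```

Every printed configuration maps to the same endpoint, which is why the endpoint position alone does not determine the joint angles. The conventional workaround mentioned in the abstract picks one configuration by minimizing an added performance index, whereas the paper claims a unique joint trajectory without such an index.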

Bibliographic record

Source
Journal of robotics | 2010, No. 1 | 15 pages
Authors
Suguru Arimoto; Masahiro Sekimoto; Kenji Tahara
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Reinforcement Q-learning based on Multirate Generalized Policy Iteration and Its Application to a 2-DOF Helicopter [J]. Tae Yoon Chun, Jin Bae Park, Yoon Ho Choi. International Journal of Control, Automation, and Systems. 2018, No. 1
2. A Reinforcement Learning Algorithm Based on Policy Iteration for Average Reward: Empirical Results with Yield Management and Convergence Analysis [J]. Abhijit Gosavi. Machine Learning. 2004, No. 1
3. Training an Actor-Critic Reinforcement Learning Controller for Arm Movement Using Human-Generated Rewards [J]. Kathleen M. Jagodnik, Philip S. Thomas, Antonie J. van den Bogert, IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2017, No. 10
4. Human-like Movements of Robotic Arms with Redundant DOFs: Virtual Spring-Damper Hypothesis to Tackle the Bernstein Problem [C]. Suguru Arimoto, Masahiro Sekimoto. IEEE International Conference on Robotics and Automation. 2006
5. Reinforcement learning without rewards [D]. Syed, Umar Ali. 2010
6. GadgetArm—Automatic Grasp Generation and Manipulation of 4-DOF Robot Arm for Arbitrary Objects Through Reinforcement Learning [O]. JoungMin Park, SangYoon Lee, JaeWoon Lee. 2020
7. Iterative Learning without Reinforcement or Reward for Multijoint Movements: A Revisit of Bernstein's DOF Problem on Dexterity [O]. Suguru Arimoto, Masahiro Sekimoto, Kenji Tahara. 2010
8. Framing Reinforcement Learning from Human Reward: Reward Positivity, Temporal Discounting, Episodicity, and Performance [R]. Knox, W. B., Stone, P. 2014
