IEEE Transactions on Neural Networks and Learning Systems

Qualitative Adaptive Reward Learning With Success Failure Maps: Applied to Humanoid Robot Walking



Abstract

In the human brain, rewards are encoded in a flexible and adaptive way after each novel stimulus. Neurons of the orbitofrontal cortex form the key reward structure of the brain, and neurobiological studies show that the anterior cingulate cortex is primarily responsible for avoiding repeated mistakes. According to a vigilance threshold, which denotes the tolerance to risk, we can distinguish between a risk-taking and a risk-avoiding learning mechanism; the tolerance to risk plays an important role in such a mechanism, and results have shown differences in learning capacity between risk-taking and risk-avoiding behaviors. These neurological properties provide promising inspiration for reward-based robot learning. In this paper, we propose a learning mechanism that is able to learn from both negative and positive feedback with adaptive reward coding. It is composed of two phases: evaluation and decision making. In the evaluation phase, we use a Kohonen self-organizing map to represent success and failure. Decision making is based on an early warning mechanism that prevents the repetition of past mistakes. The attitude toward risk is modulated in order to gain experience from both successes and failures. The success map is learned with an adaptive reward that qualifies the learned task in order to optimize efficiency. We demonstrate our approach on the NAO humanoid robot, controlled by a bioinspired neural controller based on a central pattern generator. The learning system adapts the oscillation frequency and the motor neuron gains in pitch and roll in order to walk on flat and sloped terrain, and to switch between them.
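The evaluation phase above represents success and failure with a Kohonen self-organizing map. As a rough, hypothetical sketch only (not the authors' implementation, whose dimensions, parameters, and training schedule are not given in the abstract), a minimal 1-D SOM that clusters gait-parameter vectors such as oscillation frequency and motor-neuron gain might look like:

```python
import random

def bmu_index(nodes, x):
    """Index of the best-matching unit: the node closest to sample x."""
    return min(range(len(nodes)),
               key=lambda i: sum((nodes[i][d] - x[d]) ** 2
                                 for d in range(len(x))))

def train_som(samples, n_nodes=5, epochs=50, lr0=0.5, radius0=2.0):
    """Train a 1-D Kohonen map over the samples; returns node weights.

    Node count, epochs, and decay schedules are illustrative choices.
    """
    random.seed(0)
    dim = len(samples[0])
    nodes = [[random.random() for _ in range(dim)] for _ in range(n_nodes)]
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)                  # decaying learning rate
        radius = max(radius0 * (1.0 - t / epochs), 0.5)  # shrinking neighborhood
        for x in samples:
            bmu = bmu_index(nodes, x)
            for i, w in enumerate(nodes):
                # Step neighborhood: only nodes near the winner on the
                # 1-D grid are pulled toward the sample.
                if abs(i - bmu) <= radius:
                    for d in range(dim):
                        w[d] += lr * (x[d] - w[d])
    return nodes

# Hypothetical usage: gait outcomes from two regimes end up on
# different map nodes, so the map separates success from failure.
success = [[0.05, 0.05], [0.1, 0.0], [0.0, 0.1]]
failure = [[0.95, 0.9], [1.0, 1.0], [0.9, 0.95]]
nodes = train_som(success + failure)
```

After training, querying `bmu_index(nodes, x)` for a new parameter vector tells which region of the map (success-like or failure-like) it falls into, which is the kind of lookup the decision-making phase would consult before risking a repeated mistake.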
