IEEE Transactions on Human-Machine Systems

Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement


Abstract

High-level spinal cord injury (SCI) in humans causes paralysis below the neck. Functional electrical stimulation (FES) technology applies electrical current to nerves and muscles to restore movement, and controllers for upper-extremity FES neuroprostheses calculate stimulation patterns to produce desired arm movements. However, currently available FES controllers have yet to restore natural movement. Reinforcement learning (RL) is a reward-driven control technique; because it can employ user-generated rewards, human preferences can be incorporated into training. To test this concept for FES, we conducted simulation experiments using computer-generated "pseudo-human" rewards. Rewards with varying properties were used to train an actor-critic RL controller for a planar, two-degree-of-freedom biomechanical model of the human arm performing reaching movements. The results demonstrate that sparse, delayed pseudo-human rewards permit stable and effective RL controller learning: learning success increased with reward frequency, and sparse rewards delivered at a human-feasible rate produced greater learning than exclusively automated rewards. The diversity of the training task set did not affect learning, and trained controllers remained stable over the long term. These findings suggest that human-generated rewards may be useful for training RL controllers in upper-extremity FES systems, and they represent progress toward human-machine teaming in which upper-extremity FES control achieves more natural arm movements by combining human user preferences with the learning capabilities of RL algorithms.
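The abstract does not give implementation details, so the following is only a minimal illustrative sketch, in Python/NumPy, of the kind of setup it describes: an actor-critic controller with linear function approximation, a simplified planar two-joint arm, and a sparse, delayed "pseudo-human" reward that scores the whole reach only at the end of each movement. The link lengths, feature set, Gaussian exploration policy, reward threshold, and all learning-rate constants are assumptions for illustration, not the authors' model.

```python
# Minimal sketch (not the paper's implementation): actor-critic learning
# from a sparse, terminal-only reward on a simplified planar two-link arm.
import numpy as np

L1, L2 = 0.3, 0.25          # link lengths (m), assumed
DT, HORIZON = 0.05, 60      # step size and episode length, assumed

def hand_position(q):
    """Forward kinematics of a planar two-joint arm."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def features(q, target):
    """State features for the linear actor/critic: joint angles, hand error, bias."""
    err = target - hand_position(q)
    return np.concatenate([np.cos(q), np.sin(q), err, [1.0]])

def pseudo_human_reward(q, target, tol=0.03):
    """Sparse, delayed reward: given once per episode, as a human observer
    might score the whole reach rather than every time step."""
    return 1.0 if np.linalg.norm(target - hand_position(q)) < tol else -1.0

rng = np.random.default_rng(0)
n_feat = 7
W_actor = np.zeros((2, n_feat))   # maps features -> joint velocity command
w_critic = np.zeros(n_feat)       # maps features -> state value estimate
alpha_a, alpha_c, gamma, sigma = 1e-3, 1e-2, 0.98, 0.2

for episode in range(2000):
    q = np.array([0.4, 0.8])              # start posture, assumed
    target = np.array([0.25, 0.35])       # reach target, assumed
    trajectory = []
    for t in range(HORIZON):
        phi = features(q, target)
        mean_u = W_actor @ phi
        u = mean_u + sigma * rng.standard_normal(2)  # Gaussian exploration
        trajectory.append((phi, u, mean_u))
        q = q + DT * u                               # simplified kinematics

    # Reward arrives once, after the whole movement (sparse and delayed).
    R = pseudo_human_reward(q, target)

    # Backward pass over the episode: TD(0) updates with terminal-only reward.
    next_value, reward = 0.0, R
    for phi, u, mean_u in reversed(trajectory):
        value = w_critic @ phi
        delta = reward + gamma * next_value - value       # TD error
        w_critic += alpha_c * delta * phi                 # critic step
        # Policy-gradient step for a Gaussian policy with fixed variance.
        W_actor += alpha_a * delta * np.outer(u - mean_u, phi) / sigma**2
        next_value, reward = value, 0.0                   # reward only at the end

    if episode % 500 == 0:
        print(f"episode {episode:4d}  terminal reward {R:+.0f}")
```

The sketch keeps the reward binary and terminal to mirror the paper's point that sparse, delayed human-style feedback can still drive stable learning; the TD error propagates the end-of-reach score backward through the episode so earlier actions receive credit.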
