IEEE Transactions on Human-Machine Systems

Human-Like Rewards to Train a Reinforcement Learning Controller for Planar Arm Movement


Abstract

High-level spinal cord injury (SCI) in humans causes paralysis below the neck. Functional electrical stimulation (FES) technology applies electrical current to nerves and muscles to restore movement, and controllers for upper-extremity FES neuroprostheses calculate stimulation patterns to produce desired arm movements. However, currently available FES controllers have yet to restore natural movement. Reinforcement learning (RL) is a reward-driven control technique; because it can employ user-generated rewards, human preferences can be incorporated into training. To test this concept for FES, we conducted simulation experiments using computer-generated "pseudo-human" rewards. Rewards with varying properties were used to train an actor-critic RL controller for a planar, two-degree-of-freedom biomechanical model of the human arm performing reaching movements. The results demonstrate that sparse, delayed pseudo-human rewards permit stable and effective RL controller learning: learning success increased with reward frequency, and sparse rewards delivered at a human-feasible rate produced greater learning than exclusively automated rewards. The diversity of the training task set did not affect learning, and trained controllers remained stable over the long term. These findings suggest that human-generated rewards may be useful for training RL controllers in upper-extremity FES systems, and they represent progress toward human-machine teaming in which upper-extremity FES control achieves more natural arm movements by combining human user preferences with the learning capabilities of RL algorithms.
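The abstract does not give implementation details, so the following is only a minimal illustrative sketch, in Python/NumPy, of the kind of setup it describes: an actor-critic controller with linear function approximation, a simplified planar two-joint arm, and a sparse, delayed "pseudo-human" reward that scores the whole reach only at the end of each movement. The link lengths, feature set, Gaussian exploration policy, reward threshold, and all learning-rate constants are assumptions for illustration, not the authors' model.

```python
# Minimal sketch (not the paper's implementation): actor-critic learning
# from a sparse, terminal-only reward on a simplified planar two-link arm.
import numpy as np

L1, L2 = 0.3, 0.25          # link lengths (m), assumed
DT, HORIZON = 0.05, 60      # step size and episode length, assumed

def hand_position(q):
    """Forward kinematics of a planar two-joint arm."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def features(q, target):
    """State features for the linear actor/critic: joint angles, hand error, bias."""
    err = target - hand_position(q)
    return np.concatenate([np.cos(q), np.sin(q), err, [1.0]])

def pseudo_human_reward(q, target, tol=0.03):
    """Sparse, delayed reward: given once per episode, as a human observer
    might score the whole reach rather than every time step."""
    return 1.0 if np.linalg.norm(target - hand_position(q)) < tol else -1.0

rng = np.random.default_rng(0)
n_feat = 7
W_actor = np.zeros((2, n_feat))   # maps features -> joint velocity command
w_critic = np.zeros(n_feat)       # maps features -> state value estimate
alpha_a, alpha_c, gamma, sigma = 1e-3, 1e-2, 0.98, 0.2

for episode in range(2000):
    q = np.array([0.4, 0.8])              # start posture, assumed
    target = np.array([0.25, 0.35])       # reach target, assumed
    trajectory = []
    for t in range(HORIZON):
        phi = features(q, target)
        mean_u = W_actor @ phi
        u = mean_u + sigma * rng.standard_normal(2)  # Gaussian exploration
        trajectory.append((phi, u, mean_u))
        q = q + DT * u                               # simplified kinematics

    # Reward arrives once, after the whole movement (sparse and delayed).
    R = pseudo_human_reward(q, target)

    # Backward pass over the episode: TD(0) updates with terminal-only reward.
    next_value, reward = 0.0, R
    for phi, u, mean_u in reversed(trajectory):
        value = w_critic @ phi
        delta = reward + gamma * next_value - value       # TD error
        w_critic += alpha_c * delta * phi                 # critic step
        # Policy-gradient step for a Gaussian policy with fixed variance.
        W_actor += alpha_a * delta * np.outer(u - mean_u, phi) / sigma**2
        next_value, reward = value, 0.0                   # reward only at the end

    if episode % 500 == 0:
        print(f"episode {episode:4d}  terminal reward {R:+.0f}")
```

The sketch keeps the reward binary and terminal to mirror the paper's point that sparse, delayed human-style feedback can still drive stable learning; the TD error propagates the end-of-reach score backward through the episode so earlier actions receive credit.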
