
Inverse discounted-based LQR algorithm for learning human movement behaviors



Abstract

Recently, there has been increasing interest in understanding human movement behaviors. One approach is to recover the unknown underlying objective function that a human optimizes while achieving a certain movement behavior. Existing research on behavioral understanding relies solely on predefined optimality criteria, chiefly minimum time, minimum variance, and/or minimum effort. These criteria are assumed constant, i.e., the human is assumed to hold the same preferences throughout the movement. In this paper, by contrast, the optimality criteria underlying the kinematic characteristics of a given human behavior are assumed to be exponentially discounted, to account for changes in the human's preferences that may occur while achieving that behavior. A new Inverse Discounted-based Linear Quadratic Regulator (ID-LQR) algorithm is developed within the Inverse Optimal Control (IOC) framework to recover a discounted cost function that reproduces the measured human behavior. In addition, an incremental version of the ID-LQR algorithm is proposed to continuously refine the cost function learned so far when demonstrations are presented sequentially. Saccadic eye-gaze movement is studied as an example to evaluate both the proposed ID-LQR and incremental ID-LQR approaches. Simulation results are encouraging and show that the saccadic trajectories generated by the ID-LQR approach match the experimental data in many respects, including the position and velocity profiles of saccades. Moreover, when assessed on a subsequent set of scenarios, the incremental ID-LQR algorithm confirms its capability to generalize the retrieved cost function to unseen saccadic demonstrations.
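The abstract does not specify the paper's model or algorithmic details, but the core idea of an exponentially discounted quadratic cost can be sketched. The following is a minimal, hypothetical illustration (not the authors' ID-LQR, which solves the inverse problem): it shows the forward control problem induced by a cost J = Σ_t γ^t (xᵀQx + uᵀRu) and solves it with a backward Riccati recursion, using a toy double-integrator model whose dynamics and weights are placeholders.

```python
import numpy as np

# Illustrative sketch only (not the paper's ID-LQR): the forward problem of a
# finite-horizon LQR whose quadratic cost is exponentially discounted,
#   J = sum_t  gamma**t * (x_t' Q x_t + u_t' R u_t),
# solved by a backward Riccati recursion. Dynamics and weights below are
# hypothetical placeholders, not taken from the paper.

def discounted_lqr_gains(A, B, Q, R, gamma, horizon):
    """Backward Riccati pass; discounting scales the next cost-to-go by gamma."""
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        BtP = B.T @ P
        K = np.linalg.solve(R + gamma * BtP @ B, gamma * BtP @ A)
        P = Q + gamma * A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder so gains run forward in time

# Toy discrete-time double integrator (placeholder model).
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([1.0, 0.1])   # penalize position error, lightly penalize velocity
R = np.array([[0.01]])    # control-effort weight

gains = discounted_lqr_gains(A, B, Q, R, gamma=0.99, horizon=300)

# Roll the optimal feedback forward from an initial position error.
x = np.array([[1.0], [0.0]])
for K in gains:
    u = -K @ x
    x = A @ x + B @ u
final_pos = float(x[0, 0])  # state is driven toward the origin
```

In the paper's inverse setting, the weights Q, R, and the discount factor γ are the unknowns to be recovered from observed trajectories, rather than given as above.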
