Frontiers in Neurorobotics

Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task



Abstract

Linearly solvable Markov decision process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the non-linear dynamics still allows the task to be solved, albeit at a higher total cost. We then perform real robot experiments on a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the robot's camera view and two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamics models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. These results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
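For readers unfamiliar with the transformation mentioned above, the following is a minimal sketch of the discrete-state case, assuming the standard average-cost LMDP formulation: the desirability function z(x) = exp(-v(x)) turns the Bellman equation into the linear eigenvalue problem λz = diag(exp(-q)) P z, where P is the passive dynamics and q the state cost. The toy dynamics P, cost vector q, and the helper name solve_lmdp are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_lmdp(P, q, n_iter=1000, tol=1e-10):
    """Solve the linear Bellman equation lambda * z = diag(exp(-q)) @ P @ z
    by power iteration; z = exp(-v) is the desirability function."""
    G = np.exp(-q)[:, None] * P          # G = diag(exp(-q)) P
    z = np.ones(P.shape[0])
    for _ in range(n_iter):
        z_new = G @ z
        z_new /= np.linalg.norm(z_new)   # normalize; dominant eigenvector
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    v = -np.log(z)                       # state value, up to a constant
    # Optimal controlled dynamics: u*(x'|x) proportional to p(x'|x) z(x')
    U = P * z[None, :]
    U /= U.sum(axis=1, keepdims=True)
    return v, U

# Toy 3-state random walk with a low-cost goal state (illustrative only)
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
q = np.array([1.0, 1.0, 0.1])            # state costs; state 2 is cheapest
v, U = solve_lmdp(P, q)
print("values:", v)
print("controlled transitions:\n", U)
```

Because the equation for z is linear, the optimal policy falls out of a single eigenvector computation rather than iterating over actions, which is what makes the framework attractive once a (possibly crude) dynamics model has been learned.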
