Frontiers in Neurorobotics

Evaluation of linearly solvable Markov decision process with dynamic model learning in a mobile robot navigation task



Abstract

Linearly solvable Markov decision process (LMDP) is a class of optimal control problems in which the Bellman equation can be converted into a linear equation by an exponential transformation of the state value function (Todorov). In an LMDP, the optimal value function and the corresponding control policy are obtained by solving an eigenvalue problem in a discrete state space, or an eigenfunction problem in a continuous state space, using knowledge of the system dynamics and the action, state, and terminal cost functions. In this study, we evaluate the effectiveness of the LMDP framework in real robot control, in which the dynamics of the body and the environment have to be learned from experience. We first perform a simulation study of a pole swing-up task to evaluate the effect of the accuracy of the learned dynamics model on the derived action policy. The result shows that a crude linear approximation of the non-linear dynamics still allows the task to be solved, albeit at a higher total cost. We then perform real robot experiments on a battery-catching task using our Spring Dog mobile robot platform. The state is given by the position and size of a battery in the robot's camera view and two neck joint angles. The action is the velocities of the two wheels, while the neck joints are controlled by a visual servo controller. We test linear and bilinear dynamics models in tasks with quadratic and Gaussian state cost functions. In the quadratic cost task, the LMDP controller derived from a learned linear dynamics model performed equivalently to the optimal linear quadratic regulator (LQR). In the non-quadratic task, the LMDP controller with a linear dynamics model showed the best performance. These results demonstrate the usefulness of the LMDP framework in real robot control even when simple linear models are used for dynamics learning.
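For readers unfamiliar with the transformation mentioned above, the following is a minimal sketch of the discrete-state case, assuming the standard average-cost LMDP formulation: the desirability function z(x) = exp(-v(x)) turns the Bellman equation into the linear eigenvalue problem λz = diag(exp(-q)) P z, where P is the passive dynamics and q the state cost. The toy dynamics P, cost vector q, and the helper name solve_lmdp are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_lmdp(P, q, n_iter=1000, tol=1e-10):
    """Solve the linear Bellman equation lambda * z = diag(exp(-q)) @ P @ z
    by power iteration; z = exp(-v) is the desirability function."""
    G = np.exp(-q)[:, None] * P          # G = diag(exp(-q)) P
    z = np.ones(P.shape[0])
    for _ in range(n_iter):
        z_new = G @ z
        z_new /= np.linalg.norm(z_new)   # normalize; dominant eigenvector
        if np.linalg.norm(z_new - z) < tol:
            z = z_new
            break
        z = z_new
    v = -np.log(z)                       # state value, up to a constant
    # Optimal controlled dynamics: u*(x'|x) proportional to p(x'|x) z(x')
    U = P * z[None, :]
    U /= U.sum(axis=1, keepdims=True)
    return v, U

# Toy 3-state random walk with a low-cost goal state (illustrative only)
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
q = np.array([1.0, 1.0, 0.1])            # state costs; state 2 is cheapest
v, U = solve_lmdp(P, q)
print("values:", v)
print("controlled transitions:\n", U)
```

Because the equation for z is linear, the optimal policy falls out of a single eigenvector computation rather than iterating over actions, which is what makes the framework attractive once a (possibly crude) dynamics model has been learned.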
