Journal: IEEE Transactions on Industrial Electronics

Self-Learning Control Using Dual Heuristic Programming with Global Laplacian Eigenmaps


Abstract

To solve nonlinear optimal control problems that can be modeled as Markov decision processes (MDPs), this paper presents an online self-learning control algorithm called dual heuristic programming with global Laplacian eigenmaps (GLEM-DHP). The algorithm uses GLEM, an improved manifold learning approach that incorporates global information, to learn the features used for value function approximation of the MDP. Unlike traditional feature representations based on neural networks, the manifold-based features can be learned before the online learning process, from samples collected from the MDP. More importantly, in addition to local features, global information is exploited through the geodesic minimum spanning tree (GMST) approach. Based on the theoretical properties of the GMST, it is shown that the GLEM-based features can represent the intrinsic geometric properties of the MDP states, which improves value function approximation and hence leads to better learning control performance. To compare the proposed method with previous learning control algorithms, GLEM-DHP is evaluated on two nonlinear control problems: the cart-pole problem and the ball-plate control problem. Simulation and experimental results show that GLEM-DHP achieves better learning control performance than previous learning control algorithms that use manually designed features or manifold features built from local information only.
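As a rough illustration of the feature-learning step described in the abstract, the sketch below computes standard (local) Laplacian eigenmap features from sampled states. It is not the paper's GLEM method: the global GMST component is not reproduced, and the function name `laplacian_eigenmap_features`, the parameters `n_neighbors`, `n_features`, and `sigma`, and the random 4-dimensional samples are all assumptions made for this example.

```python
import numpy as np


def laplacian_eigenmap_features(states, n_neighbors=5, n_features=3, sigma=1.0):
    """Standard Laplacian eigenmaps on a k-nearest-neighbor graph (hypothetical helper)."""
    n = states.shape[0]
    # Pairwise squared Euclidean distances between sampled states.
    diff = states[:, None, :] - states[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # k-nearest-neighbor graph with heat-kernel weights (column 0 is the point itself).
    neighbors = np.argsort(sq_dist, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = neighbors.ravel()
    weights = np.zeros((n, n))
    weights[rows, cols] = np.exp(-sq_dist[rows, cols] / (2.0 * sigma ** 2))
    weights = np.maximum(weights, weights.T)  # symmetrize the graph

    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    degree = weights.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    lap = np.eye(n) - d_inv_sqrt[:, None] * weights * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order; skip the first (trivial) one.
    eigvals, eigvecs = np.linalg.eigh(lap)
    # Rescale to obtain solutions of the generalized problem L y = lambda D y.
    features = d_inv_sqrt[:, None] * eigvecs[:, 1:n_features + 1]
    return eigvals[1:n_features + 1], features


# Assumed stand-in for states sampled from an MDP (e.g. a 4-dimensional state).
rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(200, 4))
eigvals, phi = laplacian_eigenmap_features(states)
print(phi.shape)
```

Each row of `phi` is a feature vector for one sampled state; in a setting like the one the abstract describes, such features would be computed offline and then used as the basis for a value function approximator.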
