Source: IEEE Transactions on Neural Networks and Learning Systems

Off-Policy Interleaved $Q$ -Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems


Abstract

In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving the optimal control problem of affine nonlinear discrete-time (DT) systems, using only measured data along the system trajectories. The affine nonlinear structure of the systems, their unknown dynamics, and the off-policy learning approach pose significant challenges in approximating optimal controllers. To this end, the on-policy Q-learning method for optimal control of affine nonlinear DT systems is reviewed first, and its convergence is rigorously proven. The bias in the solution of the Q-function-based Bellman equation, caused by adding probing noise to the systems to satisfy persistent excitation, is also analyzed for the on-policy Q-learning approach. Then, a behavior control policy is introduced, followed by an off-policy Q-learning algorithm. The convergence of the algorithm, and the absence of bias in the solution of the optimal control problem when probing noise is added, are both established. Third, three neural networks are trained by the interleaved Q-learning approach within the actor-critic framework. Thus, a novel off-policy interleaved Q-learning algorithm is derived, and its convergence is proven. Simulation results are given to verify the effectiveness of the proposed method.
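The off-policy scheme sketched in the abstract (a behavior policy with probing noise generates data, while the Bellman equation is evaluated under the target policy) can be illustrated on a toy problem. The paper treats affine nonlinear DT systems with three neural networks; the sketch below instead uses the linear special case x_{k+1} = A x_k + B u_k, where the Q-function is exactly quadratic, purely to show the data-driven loop. All matrices (A, B, Qc, R) and the quadratic basis are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

# Hedged sketch of off-policy Q-learning with a behavior policy.
# Linear special case of the affine DT system, so Q(x,u) = [x;u]' H [x;u].
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])        # assumed dynamics (unknown to the learner)
B = np.array([[0.0],
              [1.0]])
Qc, R = np.eye(2), np.eye(1)      # stage cost x'Qc x + u'R u
n, m = 2, 1

def phi(x, u):
    """Quadratic basis so that Q(x,u) = theta . phi(x,u) with z = [x; u]."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(n + m)]

K = np.zeros((m, n))              # initial admissible target policy u = -K x
for _ in range(15):
    # Policy evaluation from data generated by a *behavior* policy.
    rows, rhs = [], []
    x = rng.standard_normal(n)
    for _ in range(300):
        u = -K @ x + 0.5 * rng.standard_normal(m)  # behavior = target + probing noise
        cost = x @ Qc @ x + u @ R @ u
        x_next = A @ x + B @ u                     # measured transition
        u_next = -K @ x_next                       # *target* policy action (off-policy)
        rows.append(phi(x, u) - phi(x_next, u_next))
        rhs.append(cost)
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = (H + H.T) / 2                              # symmetric kernel of Q
    # Policy improvement: u* = argmin_u [x;u]' H [x;u]  =>  K = Huu^{-1} Hux.
    Hux, Huu = H[n:, :n], H[n:, n:]
    K = np.linalg.solve(Huu, Hux)

# Model-based check: LQR gain from iterating the Riccati recursion.
P = np.eye(n)
for _ in range(500):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B,
                                                         B.T @ P @ A)
K_star = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("learned gain:", K, "model-based gain:", K_star)
```

Because the data are reused under the target policy's next action, the probing noise excites the regression without biasing the Bellman equation, which is the point the abstract makes in contrasting on-policy and off-policy learning.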

Bibliographic information

  • Source
  • Author affiliations

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Liaoning Shihua University, School of Information & Control Engineering, Fushun 113001, People's Republic of China;

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Northeastern University, International Joint Research Laboratory of Integrated Automation, Shenyang 110819, Liaoning, People's Republic of China;

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Northeastern University, International Joint Research Laboratory of Integrated Automation, Shenyang 110819, Liaoning, People's Republic of China | University of Texas at Arlington, UTA Research Institute, Arlington, TX 76118, USA;

    University of Manchester, School of Electrical & Electronic Engineering, Manchester M13 9PL, England;

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Northeastern University, International Joint Research Laboratory of Integrated Automation, Shenyang 110819, Liaoning, People's Republic of China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords

    Affine nonlinear systems; interleaved learning; off-policy learning; optimal control; Q-learning;


