Source: IEEE Transactions on Neural Networks and Learning Systems

Off-Policy Interleaved $Q$ -Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems


Abstract

In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving the optimal control problem of affine nonlinear discrete-time (DT) systems, using only measured data along the system trajectories. The affine nonlinear structure of the systems, their unknown dynamics, and the off-policy learning approach pose significant challenges in approximating optimal controllers. To this end, the on-policy Q-learning method for optimal control of affine nonlinear DT systems is reviewed first, and its convergence is rigorously proven. The bias in the solution of the Q-function-based Bellman equation, caused by adding probing noise to the systems to satisfy persistent excitation, is also analyzed for the on-policy Q-learning approach. Then, a behavior control policy is introduced, followed by an off-policy Q-learning algorithm. The convergence of the algorithm, and the absence of bias in the solution of the optimal control problem when probing noise is added, are both established. Third, three neural networks are trained by the interleaved Q-learning approach within the actor-critic framework. Thus, a novel off-policy interleaved Q-learning algorithm is derived, and its convergence is proven. Simulation results are given to verify the effectiveness of the proposed method.
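The off-policy scheme sketched in the abstract (a behavior policy with probing noise generates data, while the Bellman equation is evaluated under the target policy) can be illustrated on a toy problem. The paper treats affine nonlinear DT systems with three neural networks; the sketch below instead uses the linear special case x_{k+1} = A x_k + B u_k, where the Q-function is exactly quadratic, purely to show the data-driven loop. All matrices (A, B, Qc, R) and the quadratic basis are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

# Hedged sketch of off-policy Q-learning with a behavior policy.
# Linear special case of the affine DT system, so Q(x,u) = [x;u]' H [x;u].
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])        # assumed dynamics (unknown to the learner)
B = np.array([[0.0],
              [1.0]])
Qc, R = np.eye(2), np.eye(1)      # stage cost x'Qc x + u'R u
n, m = 2, 1

def phi(x, u):
    """Quadratic basis so that Q(x,u) = theta . phi(x,u) with z = [x; u]."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(n + m)]

K = np.zeros((m, n))              # initial admissible target policy u = -K x
for _ in range(15):
    # Policy evaluation from data generated by a *behavior* policy.
    rows, rhs = [], []
    x = rng.standard_normal(n)
    for _ in range(300):
        u = -K @ x + 0.5 * rng.standard_normal(m)  # behavior = target + probing noise
        cost = x @ Qc @ x + u @ R @ u
        x_next = A @ x + B @ u                     # measured transition
        u_next = -K @ x_next                       # *target* policy action (off-policy)
        rows.append(phi(x, u) - phi(x_next, u_next))
        rhs.append(cost)
        x = x_next
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = (H + H.T) / 2                              # symmetric kernel of Q
    # Policy improvement: u* = argmin_u [x;u]' H [x;u]  =>  K = Huu^{-1} Hux.
    Hux, Huu = H[n:, :n], H[n:, n:]
    K = np.linalg.solve(Huu, Hux)

# Model-based check: LQR gain from iterating the Riccati recursion.
P = np.eye(n)
for _ in range(500):
    P = Qc + A.T @ P @ A - A.T @ P @ B @ np.linalg.solve(R + B.T @ P @ B,
                                                         B.T @ P @ A)
K_star = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
print("learned gain:", K, "model-based gain:", K_star)
```

Because the data are reused under the target policy's next action, the probing noise excites the regression without biasing the Bellman equation, which is the point the abstract makes in contrasting on-policy and off-policy learning.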

Bibliographic information

  • Source
  • Author affiliations

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Liaoning Shihua University, School of Information & Control Engineering, Fushun 113001, People's Republic of China;

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Northeastern University, International Joint Research Laboratory of Integrated Automation, Shenyang 110819, Liaoning, People's Republic of China;

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Northeastern University, International Joint Research Laboratory of Integrated Automation, Shenyang 110819, Liaoning, People's Republic of China | University of Texas at Arlington, UTA Research Institute, Arlington, TX 76118, USA;

    University of Manchester, School of Electrical & Electronic Engineering, Manchester M13 9PL, England;

    Northeastern University, State Key Laboratory of Synthetical Automation for Process Industries, Shenyang 110819, Liaoning, People's Republic of China | Northeastern University, International Joint Research Laboratory of Integrated Automation, Shenyang 110819, Liaoning, People's Republic of China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • Chinese Library Classification
  • Keywords

    Affine nonlinear systems; interleaved learning; off-policy learning; optimal control; Q-learning;


