Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles

Abstract

This paper investigates the trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Unlike existing policy gradient methods, which employ a single actor and critic and cannot achieve satisfactory tracking accuracy or stable learning, the proposed algorithm attains high tracking control accuracy and stable learning by applying a hybrid actors-critics architecture, in which multiple actors and multiple critics are trained to learn a deterministic policy and an action-value function, respectively. Specifically, for the critics, an updating rule based on the expected absolute Bellman error is used to choose the worst critic to be updated at each time step. Then, to compute a loss function with a more accurate target value for the chosen critic, Pseudo Q-learning, which replaces the greedy policy in Q-learning with a subgreedy policy, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of the action-value function and to stabilize learning. For the actors, the deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but poor updates. Moreover, a qualitative stability analysis of the learning is given. The effectiveness and generality of the proposed MPQ-based deterministic policy gradient (MPQ-DPG) algorithm are verified by application to an AUV with two different reference trajectories. The results demonstrate the high tracking control accuracy and stable learning of MPQ-DPG, and further validate that increasing the number of actors and critics improves performance.
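The abstract only sketches the algorithm, so the following is a minimal, illustrative PyTorch sketch of an MPQ-DPG-style update, not the authors' implementation. The network sizes, the noise-perturbed candidate search standing in for the subgreedy policy, the mean-over-critics target, and all names (mlp, q_value, subgreedy_action, mpq_dpg_update, act) are assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 8, 2     # placeholder sizes; the paper's AUV model differs
NUM_ACTORS = NUM_CRITICS = 3
GAMMA = 0.99

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actors  = [mlp(STATE_DIM, ACTION_DIM) for _ in range(NUM_ACTORS)]
critics = [mlp(STATE_DIM + ACTION_DIM, 1) for _ in range(NUM_CRITICS)]
actor_opts  = [torch.optim.Adam(a.parameters(), lr=1e-4) for a in actors]
critic_opts = [torch.optim.Adam(c.parameters(), lr=1e-3) for c in critics]

def q_value(critic, s, a):
    return critic(torch.cat([s, a], dim=-1))

def subgreedy_action(s_next, noise_std=0.1):
    # Stand-in for the paper's subgreedy policy: instead of an exact greedy
    # max over a continuous action space, score each actor's proposal (plus
    # a noise-perturbed copy) with the averaged critics and keep the best.
    with torch.no_grad():
        cands = []
        for actor in actors:
            base = actor(s_next)
            cands.extend([base, base + noise_std * torch.randn_like(base)])
        cands = torch.stack(cands)                               # (C, B, act)
        s_rep = s_next.unsqueeze(0).expand(cands.shape[0], -1, -1)
        q = torch.stack([q_value(c, s_rep, cands)
                         for c in critics]).mean(0).squeeze(-1)  # (C, B)
        best = q.argmax(dim=0)
        return cands[best, torch.arange(s_next.shape[0])]

def mpq_dpg_update(s, a, r, s_next):
    # Critic step: compute each critic's expected absolute Bellman error on
    # the batch and update only the worst critic, as the abstract describes.
    a_next = subgreedy_action(s_next)
    with torch.no_grad():
        target = r + GAMMA * torch.stack(
            [q_value(c, s_next, a_next) for c in critics]).mean(0)
        errors = [torch.mean(torch.abs(target - q_value(c, s, a))).item()
                  for c in critics]
    worst = max(range(NUM_CRITICS), key=lambda i: errors[i])
    loss = nn.functional.mse_loss(q_value(critics[worst], s, a), target)
    critic_opts[worst].zero_grad(); loss.backward(); critic_opts[worst].step()

    # Actor step: deterministic policy gradient, i.e. ascend the averaged
    # critics' value of each actor's own action.
    for actor, opt in zip(actors, actor_opts):
        actor_loss = -torch.stack(
            [q_value(c, s, actor(s)) for c in critics]).mean()
        opt.zero_grad(); actor_loss.backward(); opt.step()

def act(s):
    # Final learned policy: the average of all actors (per the abstract).
    with torch.no_grad():
        return torch.stack([a(s) for a in actors]).mean(0)

# Toy usage with random transitions, just to show the shapes involved.
B = 32
s, a = torch.randn(B, STATE_DIM), torch.randn(B, ACTION_DIM)
r, s_next = torch.randn(B, 1), torch.randn(B, STATE_DIM)
mpq_dpg_update(s, a, r, s_next)
print(act(s).shape)  # torch.Size([32, 2])
```

Updating only the critic with the largest expected absolute Bellman error concentrates learning where the value estimate is worst, and averaging the actors for the deployed policy smooths out any single actor's large but poor update, matching the motivation given in the abstract.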
