International Conference on Systems, Man, and Cybernetics

High-Level Tracking of Autonomous Underwater Vehicles Based on Pseudo Averaged Q-Learning



Abstract

In this paper, we investigate the trajectory tracking problem of underactuated autonomous underwater vehicles (AUVs) with input saturation. The proposed model-free algorithm achieves high-level tracking control and stable learning by employing a novel actors-critic architecture, in which a single critic and multiple actors are learned to estimate the action-value function and the deterministic policy, respectively. For the critic, Pseudo Averaged Q-learning, a simple extension of Q-learning, is proposed to compute the target value: the action value of the next state is obtained by maximizing, over all actors, the average of the last several previously learned action-value estimates. For the actors, the deterministic policy gradient is applied to update the weights. The effectiveness and performance of the proposed Pseudo Averaged Q-learning based deterministic policy gradient (PAQ-DPG) algorithm are verified on an underactuated AUV. The results demonstrate the high tracking accuracy and stable learning of the PAQ-DPG algorithm. Moreover, under the proposed actors-critic framework, increasing the number of actors further improves performance.
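
As a concrete illustration of the target computation and actor update described above, the following is a minimal sketch in PyTorch. The network architectures, hyperparameter values, and names (Critic, Actor, paq_target, dpg_actor_update, gamma) are assumptions for illustration only and are not taken from the paper; the schedule for storing the previously learned critic estimates is likewise assumed.

```python
# Minimal PAQ-DPG sketch (illustrative only; not the authors' code).
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Action-value network Q(s, a)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Actor(nn.Module):
    """Deterministic policy mu(s); the tanh output bound models input saturation."""
    def __init__(self, state_dim, action_dim, max_action=1.0, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh())
        self.max_action = max_action

    def forward(self, state):
        return self.max_action * self.net(state)

def paq_target(reward, next_state, done, actors, critic_snapshots, gamma=0.99):
    """Pseudo Averaged Q-learning target: for each actor's action at the next
    state, average the estimates of the last K critic snapshots, then take
    the maximum of these averages over all actors."""
    with torch.no_grad():
        q_per_actor = []
        for actor in actors:
            next_action = actor(next_state)
            q_avg = torch.stack(
                [q(next_state, next_action) for q in critic_snapshots]
            ).mean(dim=0)
            q_per_actor.append(q_avg)
        q_next = torch.stack(q_per_actor).max(dim=0).values
        return reward + gamma * (1.0 - done) * q_next

def dpg_actor_update(actor, critic, states, actor_optim):
    """Deterministic policy gradient step: ascend Q(s, mu(s)) in the actor weights."""
    actor_loss = -critic(states, actor(states)).mean()
    actor_optim.zero_grad()
    actor_loss.backward()
    actor_optim.step()
    return actor_loss.item()
```

In this sketch, critic_snapshots stands for the last several stored copies of the critic (for example, maintained with copy.deepcopy at a fixed update period), playing the role of the previously learned action-value estimates; the maximization over actors corresponds to the "maximizing the average ... among all actors" step described in the abstract.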
