Published in: IEEE Transactions on Systems, Man, and Cybernetics

Parameterized Batch Reinforcement Learning for Longitudinal Control of Autonomous Land Vehicles

Abstract

This paper presents a parameterized batch reinforcement learning algorithm for near-optimal longitudinal control of autonomous land vehicles (ALVs). The approach uses an actor-critic architecture in which kernel-based parameterized feature vectors are learned from collected samples to approximate the value functions and policies. One difference between the parameterized batch actor-critic (PBAC) algorithm and previous actor-critic learning approaches is that the critic and the actor in PBAC share the same linear features, a property that has been proved theoretically to benefit the convergence of actor-critic learning. To improve learning efficiency, least-squares batch update rules are designed for the critic and the actor, respectively. Based on the PBAC learning algorithm, a data-driven longitudinal control method is presented for ALVs to obtain near-optimal control policies that adaptively tune the fuel/brake control signals to track different speeds. A multiobjective reward function is designed to account for both tracking precision and driving smoothness. Extensive experiments were conducted on a real ALV platform driving on flat, slippery, sloping, and bumpy roads. The experimental results show that the PBAC-based self-learning controller outperforms conventional longitudinal control methods such as proportional-integral (PI) control and learning-based PI control.
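The abstract only outlines PBAC, so what follows is a minimal, dependency-free Python sketch of the general idea under stated assumptions, not the authors' implementation. It shows a critic and an actor built on one shared kernel feature vector phi(s), a batch least-squares critic, and a reward that trades tracking precision against smoothness. Everything specific here is hypothetical: the Gaussian kernel centres and width, the reward weights `w_track` and `w_smooth`, the use of LSTD(0) for a state-value critic, and the actor rule, which is a swapped-in CACLA-style weighted least-squares fit toward actions with positive TD error rather than the paper's update. Mapping the clipped output to throttle (> 0) or brake (< 0) is also an assumption.

```python
import math
from typing import Callable, List, Sequence, Tuple

# One batch sample: (state, action, reward, next_state). Shapes are illustrative only.
Sample = Tuple[Sequence[float], float, float, Sequence[float]]


def rbf_features(state: Sequence[float], centres: Sequence[Sequence[float]],
                 width: float = 0.5) -> List[float]:
    """Bias term plus one Gaussian kernel per (hypothetical) centre."""
    feats = [1.0]
    for c in centres:
        d2 = sum((si - ci) ** 2 for si, ci in zip(state, c))
        feats.append(math.exp(-d2 / (2.0 * width ** 2)))
    return feats


def reward(speed_err: float, accel: float,
           w_track: float = 1.0, w_smooth: float = 0.1) -> float:
    """Multiobjective reward: penalise speed error (precision) and acceleration (smoothness)."""
    return -(w_track * speed_err ** 2 + w_smooth * accel ** 2)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for k in range(col, n + 1):
                    M[r][k] -= f * M[col][k]
    return [M[i][n] / M[i][i] for i in range(n)]


def _regularised_system(k: int, reg: float):
    """Start the normal equations with reg * I so the system stays solvable."""
    A = [[reg if i == j else 0.0 for j in range(k)] for i in range(k)]
    return A, [0.0] * k


def lstd_critic(samples: Sequence[Sample], phi: Callable, gamma: float = 0.95,
                reg: float = 1e-3) -> List[float]:
    """Batch LSTD(0): weights w with V(s) ~= w . phi(s), from one pass over the batch."""
    k = len(phi(samples[0][0]))
    A, b = _regularised_system(k, reg)
    for s, _a, r, s_next in samples:  # the state-value critic ignores the action
        f, f_next = phi(s), phi(s_next)
        for i in range(k):
            b[i] += f[i] * r
            for j in range(k):
                A[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve(A, b)


def ls_actor(samples: Sequence[Sample], phi: Callable, w: Sequence[float],
             gamma: float = 0.95, reg: float = 1e-3) -> List[float]:
    """Actor on the SAME features: weighted least squares toward actions with positive TD error."""
    k = len(w)
    A, b = _regularised_system(k, reg)
    for s, a, r, s_next in samples:
        f = phi(s)
        delta = r + gamma * dot(w, phi(s_next)) - dot(w, f)
        if delta > 0.0:  # keep only actions that did better than the critic expected
            for i in range(k):
                b[i] += delta * f[i] * a
                for j in range(k):
                    A[i][j] += delta * f[i] * f[j]
    return solve(A, b)


def policy(theta: Sequence[float], f: Sequence[float]) -> float:
    """Control output clipped to [-1, 1]; > 0 throttle, < 0 brake (hypothetical mapping)."""
    return max(-1.0, min(1.0, dot(theta, f)))
```

In use, one would collect a batch of (state, action, reward, next_state) tuples from driving logs, call `lstd_critic` and then `ls_actor` with the same `phi`, and repeat with a fresh batch. Because both weight vectors multiply the same features, the critic and the actor share one linear representation, which is the property the abstract highlights. A state-value critic fitted this way evaluates whichever policy generated the batch, a simplification relative to a full actor-critic scheme.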