IEEE Transactions on Control Systems Technology: A Publication of the IEEE Control Systems Society

Reinforcement Learning Versus PDE Backstepping and PI Control for Congested Freeway Traffic



Abstract

We develop reinforcement learning (RL) boundary controllers to mitigate stop-and-go traffic congestion on a freeway segment. The traffic dynamics of the freeway segment are governed by a macroscopic Aw–Rascle–Zhang (ARZ) model, consisting of $2 \times 2$ quasi-linear partial differential equations (PDEs) for traffic density and velocity. The boundary stabilization of the linearized ARZ PDE model has been solved by PDE backstepping, guaranteeing spatial $L^{2}$ norm regulation of the traffic state to uniform density and velocity and ensuring that traffic oscillations are suppressed. Collocated proportional (P) and proportional–integral (PI) controllers also provide stability guarantees for allowable control gains and are always applicable as model-free control options, with gains tuned by trial and error or by model-free optimization. Although these approaches are mathematically elegant, the stabilization result holds only locally and is sensitive to changes in the model parameters. Therefore, we reformulate the PDE boundary control problem as an RL problem that pursues stabilization without knowledge of the system dynamics, simply by observing the state values. Proximal policy optimization (PPO), a neural network-based policy gradient algorithm, is employed to obtain RL controllers by interacting with a numerical simulator of the ARZ PDE. Being stabilization-inspired, the RL state-feedback boundary controllers are compared and evaluated against the rigorously stabilizing controllers in two cases: 1) a system with perfect knowledge of the traffic flow dynamics and 2) one with only partial knowledge. We obtain RL controllers that nearly recover the performance of the backstepping, P, and PI controllers with perfect knowledge and outperform them in some cases with partial knowledge. It must be noted, however, that the RL controllers are obtained through roughly one thousand episodes of iterative training on a simulation model. This training cannot be performed in a collision-free fashion in real traffic, nor is its convergence guaranteed. Thus, we demonstrate that the RL approach has learning (i.e., adaptation) potential for traffic PDE systems under uncertain and changing conditions, but RL is neither a simple nor a fully safe substitute for model-based control in real traffic systems.
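The following is a minimal, illustrative sketch (not the authors' implementation) of how the boundary-control problem described in the abstract can be cast as an RL problem: a crude first-order discretization of the ARZ equations with an actuated outlet flow is wrapped as a gymnasium environment, the reward is the negative spatial $L^{2}$ deviation from the uniform steady state, and a PPO agent is trained against the simulator. The discretization, all parameter values, and the `ARZBoundaryControlEnv` class are assumptions introduced here for illustration only.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class ARZBoundaryControlEnv(gym.Env):
    """Freeway segment with stop-and-go oscillations and an actuated outlet boundary (sketch)."""

    def __init__(self, nx=100, length=500.0, dt=0.1, horizon=600,
                 v_free=30.0, rho_max=0.12, tau=60.0, rho_star=0.09):
        super().__init__()
        self.nx, self.dx, self.dt, self.horizon = nx, length / nx, dt, horizon
        self.v_free, self.rho_max, self.tau, self.rho_star = v_free, rho_max, tau, rho_star
        self.v_star = self._V_eq(rho_star)          # uniform steady state to regulate to
        self.q_star = rho_star * self.v_star
        # Action: outlet flow as a fraction of the steady flow q*.
        self.action_space = spaces.Box(low=0.5, high=1.5, shape=(1,), dtype=np.float32)
        # Observation: full (density, velocity) profile, i.e., full-state feedback.
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(2 * nx,), dtype=np.float32)

    def _V_eq(self, rho):
        return self.v_free * (1.0 - rho / self.rho_max)   # Greenshields speed-density law

    def _p(self, rho):
        return self.v_free * rho / self.rho_max           # traffic "pressure" term

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Sinusoidal perturbation around the uniform steady state seeds the oscillations.
        x = np.linspace(0.0, 1.0, self.nx)
        self.rho = self.rho_star * (1.0 + 0.1 * np.sin(2 * np.pi * x))
        self.v = self._V_eq(self.rho)
        self.t = 0
        return self._obs(), {}

    def _obs(self):
        return np.concatenate([self.rho, self.v]).astype(np.float32)

    def step(self, action):
        q_out = float(action[0]) * self.q_star
        rho, v = self.rho, self.v
        w = v + self._p(rho)
        q = rho * v
        # Crude first-order upwind update of the ARZ equations
        #   rho_t + (rho v)_x = 0,   w_t + v w_x = (V(rho) - v) / tau.
        rho_new, w_new = rho.copy(), w.copy()
        rho_new[1:] -= self.dt / self.dx * (q[1:] - q[:-1])
        w_new[1:] -= self.dt / self.dx * v[1:] * (w[1:] - w[:-1])
        w_new += self.dt / self.tau * (self._V_eq(rho) - v)
        rho_new[0] = self.rho_star                         # fixed upstream demand
        rho_new[-1] = q_out / max(v[-1], 1e-3)             # actuated outlet flow (boundary control)
        self.rho = np.clip(rho_new, 1e-4, self.rho_max)
        self.v = np.clip(w_new - self._p(self.rho), 0.1, self.v_free)
        self.t += 1
        # Reward: negative spatial L2 deviation from the uniform steady state.
        err = self.dx * np.sum((self.rho - self.rho_star) ** 2 + (self.v - self.v_star) ** 2)
        return self._obs(), -float(err), False, self.t >= self.horizon, {}


if __name__ == "__main__":
    # PPO training against the simulator (the paper reports on the order of 10^3 episodes).
    from stable_baselines3 import PPO
    env = ARZBoundaryControlEnv()
    agent = PPO("MlpPolicy", env, verbose=1)
    agent.learn(total_timesteps=500_000)
```

The environment deliberately exposes the full discretized state to the policy, mirroring the state-feedback setting in which the RL controllers are compared against the backstepping, P, and PI designs; a partial-knowledge variant would restrict the observation to boundary measurements or perturb the model parameters between episodes.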
