Conference: International Conference on Intelligent Transportation Systems
Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems

Abstract

The dependence of autonomous vehicles (AVs) on sensors and communication links exposes them to cyber-physical (CP) attacks by adversaries that seek to take control of the AVs by manipulating their data. In this paper, the state estimation process for monitoring AV dynamics in the presence of CP attacks is analyzed, and a novel adversarial deep reinforcement learning (RL) algorithm is proposed to maximize the robustness of AV dynamics control to CP attacks. The attacker's action and the AV's reaction to CP attacks are studied in a game-theoretic framework. In the formulated game, the attacker seeks to inject faulty data into AV sensor readings so as to manipulate the optimal safe inter-vehicle spacing and potentially increase the risk of AV accidents or reduce the vehicle flow on the roads. Meanwhile, the AV, acting as a defender, seeks to minimize the spacing deviations so as to ensure robustness to the attacker's actions. Because the AV has no information about the attacker's action, and because the data values can be manipulated in infinitely many ways, each player uses long short-term memory (LSTM) blocks to learn the expected spacing deviation resulting from its own action and feeds this deviation to an RL algorithm. The attacker's RL algorithm then chooses the action that maximizes the spacing deviation, while the AV's RL algorithm seeks the optimal action that minimizes this deviation. Simulation results show that the proposed adversarial deep RL algorithm can improve the robustness of AV dynamics control because it minimizes the inter-vehicle spacing deviation.
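The attacker-versus-defender structure described in the abstract can be illustrated with a deliberately small sketch. It is not the paper's algorithm: the LSTM estimators and the RL updates are replaced by exhaustive enumeration over a small discrete action grid, and the payoff function `spacing_deviation`, the action sets, and all function names are hypothetical choices made only for illustration. The sketch shows the two quantities that a zero-sum game of this kind revolves around: the deviation the defender can guarantee not to exceed (minimax) and the deviation the attacker can guarantee to cause (maximin).

```python
from typing import Callable, Sequence, Tuple

Payoff = Callable[[float, float], float]


def spacing_deviation(attack: float, defense: float) -> float:
    """Hypothetical toy payoff: the spacing deviation that remains after the
    defender's correction partly cancels the attacker's injected bias."""
    return abs(attack - defense)


def defender_minimax(
    attacks: Sequence[float],
    defenses: Sequence[float],
    payoff: Payoff = spacing_deviation,
) -> Tuple[float, float]:
    """Return the defense with the smallest worst-case deviation, and that deviation."""

    def worst_case(defense: float) -> float:
        # The defender assumes the attacker answers with the most harmful action.
        return max(payoff(a, defense) for a in attacks)

    best = min(defenses, key=worst_case)
    return best, worst_case(best)


def attacker_maximin(
    attacks: Sequence[float],
    defenses: Sequence[float],
    payoff: Payoff = spacing_deviation,
) -> Tuple[float, float]:
    """Return the attack with the largest guaranteed deviation, and that deviation."""

    def guaranteed(attack: float) -> float:
        # The attacker assumes the defender answers with the best correction.
        return min(payoff(attack, d) for d in defenses)

    best = max(attacks, key=guaranteed)
    return best, guaranteed(best)


if __name__ == "__main__":
    # Assumed action grids: injected sensor bias and defender correction.
    attacks = [-2.0, -1.0, 0.0, 1.0, 2.0]
    defenses = [-1.0, 0.0, 1.0]
    print("defender:", defender_minimax(attacks, defenses))
    print("attacker:", attacker_maximin(attacks, defenses))
```

In this toy setting the attacker's guaranteed value never exceeds the defender's worst-case value, which is the usual maximin/minimax ordering. The abstract's approach differs in that each player learns the expected deviation from data rather than enumerating a known payoff.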
