Knowledge-Based Systems

Improving adversarial robustness of deep neural networks by using semantic information


Abstract

Deep neural networks (DNNs) are vulnerable to adversarial attacks, which deliberately perturb the original inputs so that a state-of-the-art classifier makes an incorrect prediction with high confidence; this vulnerability raises concerns about the robustness of DNNs to such attacks. Adversarial training, the main heuristic method for improving adversarial robustness and the first line of defense against adversarial attacks, requires many per-sample computations to enlarge the training set and is usually not strong enough for the entire network. This paper offers a new perspective on adversarial robustness, shifting the focus from the network as a whole to the critical part of the region close to the decision boundary of a given class. From this perspective, we propose a method that generates a single, image-agnostic adversarial perturbation; this perturbation carries semantic information indicating the directions to the fragile parts of the decision boundary and causes inputs to be misclassified as a specified target. We call adversarial training based on such perturbations "region adversarial training" (RAT); it resembles classical adversarial training but is distinguished by reinforcing the semantic information missing in the relevant regions. Experimental results on the MNIST and CIFAR-10 datasets show that this approach greatly improves adversarial robustness even when only a very small subset of the training data is used; moreover, it can defend against the fast gradient sign method (FGSM), universal perturbation, projected gradient descent (PGD), and Carlini and Wagner (C&W) attacks, whose patterns differ completely from those encountered by the model during retraining. (C) 2021 Elsevier B.V. All rights reserved.
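For context, FGSM, the simplest of the attacks named in the abstract, perturbs an input by one step along the sign of the input gradient of the loss. Below is a minimal PyTorch sketch of that attack, assuming a classifier `model`, inputs `x` in [0, 1], integer labels `y`, and a budget `epsilon`; these names and values are illustrative placeholders, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """One-step FGSM: move x along the sign of the input gradient of the
    loss, which increases the loss within an L-infinity budget epsilon."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Clamp back to the valid image range.
    return x_adv.clamp(0.0, 1.0).detach()
```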
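The abstract does not spell out the RAT algorithm itself, but its description (a single image-agnostic perturbation that pushes inputs toward a specified target class, followed by retraining that restores the missing semantic information) suggests a procedure along the following lines. This is a speculative sketch under those assumptions: `targeted_universal_perturbation`, `region_adversarial_step`, and all hyperparameters are hypothetical stand-ins, not the authors' actual method.

```python
import torch
import torch.nn.functional as F

def targeted_universal_perturbation(model, loader, target, epsilon=0.1,
                                    step=0.01, epochs=5):
    """Hypothetical sketch: learn one image-agnostic perturbation `delta`
    that steers any input toward class `target`, kept inside an
    L-infinity ball of radius epsilon."""
    delta = None
    for _ in range(epochs):
        for x, _ in loader:
            if delta is None:
                delta = torch.zeros_like(x[0])  # shared across all images
            d = delta.clone().detach().requires_grad_(True)
            y_t = torch.full((x.size(0),), target, dtype=torch.long)
            # Targeted loss: descending it pushes predictions to `target`.
            loss = F.cross_entropy(model((x + d).clamp(0, 1)), y_t)
            loss.backward()
            delta = (d - step * d.grad.sign()).clamp(-epsilon, epsilon).detach()
    return delta

def region_adversarial_step(model, optimizer, x, y, delta):
    """One retraining step on both clean and perturbed inputs, keeping the
    correct labels on x + delta (a guess at the RAT objective)."""
    optimizer.zero_grad()
    loss = (F.cross_entropy(model(x), y)
            + F.cross_entropy(model((x + delta).clamp(0, 1)), y))
    loss.backward()
    optimizer.step()
```

Because `delta` is computed once and reused for every input, retraining on it is far cheaper than classical per-sample adversarial training, which matches the abstract's claim that only a very small subset of the training data is needed.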
