International Joint Conference on Neural Networks

On Adversarial Examples and Stealth Attacks in Artificial Intelligence Systems



Abstract

In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI’s decision-making space is a major contributor to the AI’s susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI’s feature space to have concentrated probability density functions or (b) the dimensionality of the AI’s decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.
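To make the first attack class concrete, the following Python sketch is our own illustration (not a construction taken from the paper) of an adversarial example against a toy linear classifier: a uniform per-coordinate perturbation pushed against the weight vector flips the decision, and the size of that perturbation shrinks as the input dimension grows, consistent with the abstract's point that high dimensionality increases susceptibility. The model, dimensions, and the 1.01 safety factor are all hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def flip_decision(d):
    """Return the per-coordinate (L-infinity) perturbation that flips one decision."""
    w = rng.normal(size=d)
    w /= np.linalg.norm(w)            # unit-norm weights of a toy linear model
    x = rng.normal(size=d)
    if w @ x < 0:                     # put the clean input on the class-1 side
        x = -x
    # Smallest uniform step against sign(w) that crosses the boundary w.x = 0
    # (a "fast gradient sign"-style perturbation).
    eps = 1.01 * (w @ x) / np.sum(np.abs(w))
    x_adv = x - eps * np.sign(w)
    assert w @ x_adv < 0              # the decision has flipped
    return eps

for d in (10, 100, 1000, 10000):
    print(f"dimension {d:6d}: per-coordinate perturbation ~ {flip_decision(d):.4f}")
```

Running the loop shows the required per-coordinate change decaying roughly like 1/sqrt(d), i.e. in high dimension an imperceptibly small input perturbation already suffices.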
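The second attack class can be illustrated in the same toy spirit. The sketch below is a simplified illustration under our own assumptions, not the paper's construction: the attacker perturbs the model itself by attaching one extra ReLU unit that activates only when the input is almost perfectly aligned with a chosen trigger, forcing a target class on that trigger while leaving predictions on a random validation set untouched. Because a random high-dimensional validation point aligns with the trigger direction only with exponentially small probability, the modification goes unnoticed, which echoes the abstract's remark that detection would require an exponentially large validation set. The dimension, threshold delta, and output weight are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_classes = 200, 5

# Original toy model: linear scores followed by argmax.
W = rng.normal(size=(n_classes, d)) * 0.1

def predict(x, W, trigger_w=None, trigger_out=None, delta=0.9):
    scores = W @ x
    if trigger_w is not None:
        # Added ReLU unit: fires only when x is very well aligned with the trigger.
        act = max(0.0, trigger_w @ (x / np.linalg.norm(x)) - delta)
        scores = scores + trigger_out * act
    return scores.argmax()

# Attacker picks one trigger input and a target class.
trigger = rng.normal(size=d)
target = 3
trigger_w = trigger / np.linalg.norm(trigger)   # unit vector towards the trigger
trigger_out = np.zeros(n_classes)
trigger_out[target] = 100.0                     # large push towards the target class

# On the trigger, the modified model outputs the attacker's chosen class ...
print("trigger, original model:", predict(trigger, W))
print("trigger, attacked model:", predict(trigger, W, trigger_w, trigger_out))

# ... but on generic validation points the extra unit (almost) never fires,
# because random directions in high dimension are nearly orthogonal.
val = rng.normal(size=(1000, d))
changed = sum(predict(v, W) != predict(v, W, trigger_w, trigger_out) for v in val)
print("validation predictions changed:", changed, "/ 1000")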

