
White-to-Black: Efficient Distillation of Black-Box Adversarial Attacks


Abstract

Adversarial examples are important for understanding the behavior of neural models, and can improve their robustness through adversarial training. Recent work in natural language processing generated adversarial examples by assuming white-box access to the attacked model and optimizing the input directly against it (Ebrahimi et al., 2018). In this work, we show that the knowledge implicit in the optimization procedure can be distilled into another, more efficient neural network. We train a model to emulate the behavior of a white-box attack and show that it generalizes well across examples. Moreover, it reduces adversarial example generation time by 19x-39x. We also show that our approach transfers to a black-box setting, by attacking the Google Perspective API and exposing its vulnerability. Our attack flips the API-predicted label in 42% of the generated examples, while humans maintain high accuracy in predicting the gold label.
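The white-box attack referenced above (Ebrahimi et al., 2018, HotFlip) scores candidate token substitutions with a first-order Taylor approximation of the loss: the gain from replacing token x_i by word v is approximated as (E[v] - E[x_i]) · ∂L/∂E[x_i]. Below is a minimal sketch of that scoring step on a toy bag-of-embeddings logistic classifier; the model, names, and parameters are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 20, 8                      # toy vocabulary size and embedding dim
E = rng.normal(size=(V, d))       # embedding table (hypothetical)
w = rng.normal(size=d)            # weights of a toy binary classifier

def loss_grad_wrt_embedding(x_ids, y):
    """Gradient of the logistic loss w.r.t. any one input embedding.

    With a mean-of-embeddings encoder, every position shares the same
    gradient: (p - y) * w / n, where p is the predicted probability.
    """
    h = E[x_ids].mean(axis=0)            # bag-of-embeddings encoding
    p = 1.0 / (1.0 + np.exp(-(h @ w)))   # predicted P(y = 1 | x)
    return (p - y) * w / len(x_ids)

def hotflip_scores(x_ids, y, i):
    """First-order estimate of the loss increase from swapping token i
    to each vocabulary word v: (E[v] - E[x_i]) . dL/dE[x_i]."""
    g = loss_grad_wrt_embedding(x_ids, y)
    return (E - E[x_ids[i]]) @ g         # one score per vocabulary word

x = np.array([3, 7, 1])                  # toy input sentence (token ids)
scores = hotflip_scores(x, y=1, i=0)     # score all flips of position 0
best = int(np.argmax(scores))            # flip that most increases the loss
```

Note that the score of keeping the original token is exactly zero, so any positive score indicates a flip the approximation expects to hurt the model; the paper's contribution is training a network to propose such flips directly, so this per-example gradient computation is no longer needed at attack time.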
