Engineering Applications of Artificial Intelligence

TextTricker: Loss-based and gradient-based adversarial attacks on text classification models



Abstract

Adversarial examples are generated by adding infinitesimal perturbations to legitimate inputs so that deep learning models are induced to make incorrect predictions. They have received increasing attention recently because of their value in evaluating and improving the robustness of neural networks. While adversarial attack algorithms have achieved notable advances on the continuous data of images, they cannot be directly applied to discrete symbols such as text, where all the semantic and syntactic constraints of the language are expected to be satisfied. In this paper, we propose a white-box adversarial attack algorithm, TextTricker, which supports both targeted and non-targeted attacks on text classification models. Our algorithm can be implemented either in a loss-based way, where word perturbations are performed according to the change in loss, or in a gradient-based way, where the expected gradients are computed in the continuous embedding space to restrict the perturbations towards a certain direction. We perform extensive experiments on two publicly available datasets and three state-of-the-art text classification models to evaluate our algorithm. The empirical results demonstrate that TextTricker notably outperforms the baselines in attack success rate. Moreover, we discuss various aspects of TextTricker in detail to provide a deeper investigation and offer suggestions for its practical use.

