Neurocomputing

TextGuise: Adaptive adversarial example attacks on text classification model



Abstract

Adversarial examples greatly compromise the security of deep learning models. The key to improving the robustness of a natural language processing (NLP) model is to study attacks and defenses involving adversarial text. However, current adversarial attack methods still face problems, such as low attack success rates on some datasets, and existing defense methods can already successfully defend against some attack methods. As a result, such attacks are unable to dig deeper into the flaws of NLP models to inform further defense improvements. Hence, it is necessary to design an adversarial attack method with a wider attack range and stronger performance. Weighing the advantages and disadvantages of existing methods, this paper proposes a new adaptive black-box text adversarial example generation scheme, TextGuise. First, we design a keyword selection method in which word scores are calculated by combining context semantics to select the appropriate keywords to modify. Second, to preserve semantics, new keyword substitution rules are designed based on the characteristics of text and popular text expressions. Finally, the best modification strategy is adaptively selected by querying the model to reduce the magnitude of the perturbation. TextGuise automatically selects replacement keywords and replacement strategies that efficiently generate adversarial examples with good readability for various text classification tasks. Attack experiments conducted with TextGuise on 5 datasets yield high attack success rates that can surpass 80% when the perturbation ratio does not exceed 0.2. In addition, we present and discuss experiments on defense, text similarity, query counts, time consumption, etc., to test the attack performance of TextGuise. The results show that our attack method achieves a good balance among the various metrics. (c) 2023 Elsevier B.V. All rights reserved.
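The abstract describes a three-step black-box loop: score words by their importance to the prediction, substitute the top-scoring keywords using readability-preserving rules, and query the victim model to stop as soon as the label flips within a perturbation budget. The sketch below illustrates that general idea only; it is not the authors' TextGuise implementation. The toy classifier, the deletion-based scoring, and the vowel-stretching substitution (a stand-in for the paper's "popular text expression" rules) are all hypothetical illustrations.

```python
# Hedged sketch of a deletion-scoring black-box text attack, in the spirit of
# the scheme described above (NOT the actual TextGuise algorithm).

def toy_classifier(text):
    """Stand-in victim model: P(positive) from naive keyword counts."""
    positive = {"good", "great", "excellent"}
    negative = {"bad", "awful", "terrible"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return min(1.0, max(0.0, 0.5 + 0.1 * score))

def keyword_scores(text, model):
    """Score each word by the confidence drop caused by deleting it."""
    words = text.split()
    base = model(text)
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((base - model(reduced), i, words[i]))
    return sorted(scores, reverse=True)  # most influential words first

def stretch(word):
    """Hypothetical substitution rule: stretch one vowel so a human still
    reads the word but an exact keyword match breaks."""
    for v in "aeiou":
        if v in word:
            return word.replace(v, v * 3, 1)
    return word

def attack(text, model, perturb=stretch, budget=0.2):
    """Greedily perturb top-scoring words until the label flips (confidence
    drops to 0.5 or below) or the perturbation ratio exceeds `budget`
    (0.2, matching the ratio reported in the abstract)."""
    words = text.split()
    max_edits = max(1, int(budget * len(words)))
    edits = 0
    for _, i, _ in keyword_scores(text, model):
        if model(" ".join(words)) <= 0.5:  # label no longer positive
            break
        if edits >= max_edits:
            break
        words[i] = perturb(words[i])
        edits += 1
    return " ".join(words)

adv = attack("good movie", toy_classifier)
```

Here the attack needs only forward queries to the model, matching the black-box setting; the paper's actual contribution lies in smarter keyword scoring (context semantics) and substitution rules than this toy version.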

