Recent studies have highlighted the vulnerability and low robustness of deep learning models against adversarial examples. This issue limits their deployability in ubiquitous applications requiring a high level of security, such as driverless systems, unmanned aerial vehicles, and intrusion detection. In this paper, we propose the latent encodings transferring attack (LET-attack) to generate targeted natural adversarial examples that fool well-trained classifiers. To perturb in latent space, we train WGAN variants on various datasets to achieve feature extraction, image reconstruction, and discrimination of counterfeit images with good performance. Thanks to our two-stage mapping-transformation procedure, the adversary performs precise, semantic perturbations on source data with reference to target data in latent space. By exploiting the critic of the WGAN variant together with the well-trained classifier, the adversary crafts more realistic and effective adversarial examples. Experimental results on MNIST, Fashion-MNIST, CIFAR-10, and LSUN show that LET-attack yields a distinct set of adversarial examples with partly targeted transfer along the data manifold, and attains attack performance comparable to state-of-the-art methods under different attack scenarios. Moreover, we evaluate the transferability of LET-attack across different classifiers on MNIST and CIFAR-10, and find that the adversarial examples transfer easily with high confidence.
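The core idea of perturbing in latent space can be sketched with a toy linear encoder/decoder standing in for the trained WGAN variant. Everything here is an illustrative assumption: the names (`encode`, `decode`, `latent_transfer_attack`), the linear maps, and the simple interpolation toward the target latent code are not the paper's exact two-stage mapping transformation, only a minimal picture of transferring source encodings toward target encodings before decoding back to image space.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the WGAN components: a linear "encoder" E
# (feature extraction) and its pseudoinverse as "decoder" G (image
# reconstruction). A real attack would use the trained networks instead.
d_img, d_lat = 16, 4
E = rng.standard_normal((d_lat, d_img))
G = np.linalg.pinv(E)  # E @ G = I, so encode(decode(z)) == z

def encode(x: np.ndarray) -> np.ndarray:
    return E @ x

def decode(z: np.ndarray) -> np.ndarray:
    return G @ z

def latent_transfer_attack(x_src: np.ndarray,
                           x_tgt: np.ndarray,
                           alpha: float = 0.3) -> np.ndarray:
    """Move the source latent code a fraction alpha toward the target's
    latent code, then decode back to image space. This yields a semantic
    (manifold-aligned) perturbation rather than raw pixel noise."""
    z_src, z_tgt = encode(x_src), encode(x_tgt)
    z_adv = (1.0 - alpha) * z_src + alpha * z_tgt
    return decode(z_adv)

x_src = rng.standard_normal(d_img)  # source image (toy vector)
x_tgt = rng.standard_normal(d_img)  # target image (toy vector)
x_adv = latent_transfer_attack(x_src, x_tgt)

# The adversarial example's latent code lies between the source and
# target codes, as intended.
z_adv = encode(x_adv)
```

In the full method, the critic of the WGAN variant and the victim classifier would additionally score `x_adv` for realism and misclassification, guiding how far the latent code is moved.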