Deep neural networks (DNNs) play a key role in many applications. Unsurprisingly, they have also become a potential attack target for adversaries. Some studies have demonstrated that DNN classifiers can be fooled by adversarial examples, which are crafted by introducing small perturbations into an original sample. Accordingly, several powerful defense techniques have been proposed against adversarial examples. However, existing defense techniques require modifying the target model or depend, to varying degrees, on prior knowledge of the attack techniques. In this paper, we propose a straightforward method for detecting adversarial image examples. It requires no prior knowledge of attack techniques and can be directly deployed on unmodified, off-the-shelf DNN models. Specifically, we treat the perturbation added to an image as a kind of noise and introduce two classical image processing techniques, scalar quantization and a smoothing spatial filter, to reduce its effect. The two-dimensional entropy of the image is employed as a metric to implement adaptive noise reduction for different kinds of images. As a result, an adversarial example can be effectively detected by comparing the classification results of a given sample and its denoised version. Thousands of adversarial examples against several state-of-the-art DNN models, crafted with different attack techniques, are used to evaluate the proposed method. The experiments show that our detection method achieves an overall recall of 93.73% and an overall precision of 95.47% without any prior knowledge of attack techniques.
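The abstract outlines a pipeline of entropy-guided denoising followed by a prediction-consistency check. The sketch below illustrates that idea under stated assumptions; the entropy threshold, quantization interval, kernel size, and the `classify` callable are illustrative placeholders, not the authors' exact configuration or parameter-selection rule.

```python
# Hypothetical sketch of the detection pipeline described in the abstract.
# All thresholds and parameter values are illustrative assumptions.
import numpy as np
from scipy.ndimage import uniform_filter


def two_dimensional_entropy(gray):
    """2-D image entropy over (pixel value, 3x3 neighborhood mean) pairs."""
    gray = gray.astype(np.uint8)
    neigh = uniform_filter(gray.astype(np.float64), size=3).astype(np.uint8)
    hist = np.zeros((256, 256), dtype=np.float64)
    np.add.at(hist, (gray.ravel(), neigh.ravel()), 1.0)
    p = hist / hist.sum()
    nz = p[p > 0]
    return float(-(nz * np.log2(nz)).sum())


def adaptive_denoise(gray, entropy_threshold=4.0):
    """Scalar quantization plus a smoothing spatial filter, with the
    strength chosen from the image's 2-D entropy (values are assumptions)."""
    h = two_dimensional_entropy(gray)
    # Busier images (higher entropy) tolerate stronger noise reduction.
    interval, kernel = (16, 3) if h > entropy_threshold else (8, 2)
    quantized = (gray // interval) * interval + interval // 2   # scalar quantization
    return uniform_filter(quantized.astype(np.float64), size=kernel)  # smoothing filter


def is_adversarial(classify, image_gray):
    """Flag a sample as adversarial if its label changes after denoising.
    `classify` is any callable that returns a class label for an image."""
    return classify(image_gray) != classify(adaptive_denoise(image_gray))
```

In this reading, the DNN itself is left untouched: detection only wraps two forward passes of an existing model, which is why the method can sit in front of unmodified off-the-shelf classifiers.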