Carrying out an adversarial attack on a neural network classifier is described. A data set of input-output pairs is constructed, with each input element of the input-output pairs selected at random from a search space and each output element indicating a predictive output of the neural network classifier for the corresponding input element. A Gaussian process is applied to the data set of input-output pairs to optimize a detection function and find a best disturbance input element from the data set. The best disturbance input element is upsampled to generate an upsampled best input element. The upsampled best input element is added to an original input to generate a candidate input. The neural network classifier is queried to determine a classifier prediction for the candidate input, and a score for the classifier prediction is calculated. The candidate input is accepted as a successful adversarial attack in response to the classifier prediction being incorrect.
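One plausible reading of the steps above can be sketched as follows. Everything here is a hypothetical stand-in, not the patented implementation: the classifier, the margin-based score, the low-dimensional search space, the nearest-neighbour upsampling, and the upper-confidence-bound selection rule (the abstract does not specify which "detection function" the Gaussian process optimizes) are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the neural-network classifier: returns two class
# logits for a flat input vector; the prediction is the argmax. THRESH and the
# mean-based decision rule are illustrative only.
THRESH = 0.1

def classifier_logits(x):
    s = float(x.mean())
    return np.array([THRESH - s, s - THRESH])

def classifier_predict(x):
    return int(np.argmax(classifier_logits(x)))

def upsample(z, out_len):
    # Nearest-neighbour upsampling of a coarse disturbance z to input size.
    idx = np.arange(out_len) * len(z) // out_len
    return z[idx]

def rbf(A, B, ls=0.2):
    # Squared-exponential kernel between rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * ls * ls))

def gp_posterior(X, y, Xs, noise=1e-6):
    # Exact Gaussian-process regression posterior mean/std at test points Xs.
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0), 0.0, None)
    return mu, np.sqrt(var)

def attack(x0, true_label, d_low=4, eps=0.2, n_init=20, n_iter=15, pool=500):
    D = len(x0)

    def score(z):
        # Adversarial margin: positive once the classifier is fooled.
        lo = classifier_logits(x0 + upsample(z, D))
        return float(np.delete(lo, true_label).max() - lo[true_label])

    # Data set of (random disturbance, score) input-output pairs drawn from
    # the low-dimensional search space.
    Z = rng.uniform(-eps, eps, size=(n_init, d_low))
    y = np.array([score(z) for z in Z])

    for _ in range(n_iter):
        # Fit the GP and pick the best disturbance by an upper-confidence-
        # bound criterion over freshly sampled candidates (an assumption).
        cand = rng.uniform(-eps, eps, size=(pool, d_low))
        mu, sd = gp_posterior(Z, y, cand)
        z_best = cand[np.argmax(mu + 2.0 * sd)]
        s = score(z_best)
        Z, y = np.vstack([Z, z_best]), np.append(y, s)

        # Upsample the best disturbance, add it to the original input, and
        # query the classifier; accept on an incorrect prediction.
        x_cand = x0 + upsample(z_best, D)
        if classifier_predict(x_cand) != true_label:
            return {"success": True, "candidate": x_cand, "score": s}

    return {"success": bool(y.max() > 0), "score": float(y.max())}

x0 = np.zeros(32)            # original input, correctly classified as 0
result = attack(x0, true_label=0)
```

The attack is black-box in the sense that only `classifier_logits` queries are used; the Gaussian process serves as a cheap surrogate that steers which disturbance to query next.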