WhiteBox Adversarial Algorithms
Adversarial.FGSM — Function

    FGSM(model, loss, x, y; ϵ = 0.1, clamp_range = (0, 1))

The Fast Gradient Sign Method (FGSM) creates adversarial examples by pushing the input in the direction of the sign of the gradient, with the perturbation bounded by the ϵ parameter.
This method was proposed by Goodfellow et al., 2014 (https://arxiv.org/abs/1412.6572).
Arguments:
- model: The model to base the attack upon.
- loss: The loss function to use. This assumes that the loss function includes the prediction, i.e. loss(x, y) = crossentropy(model(x), y).
- x: The input to be perturbed by the FGSM algorithm.
- y: The 'true' label of the input.
- ϵ: The amount of perturbation to apply.
- clamp_range: Tuple consisting of the lower and upper values to clamp the input to.
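The FGSM update itself is a single signed-gradient step. Below is a minimal sketch in plain Julia; the quadratic toy loss and its target vector are stand-ins (not part of the package) chosen so the input gradient is analytic and no Flux/Zygote dependency is needed:

```julia
# Hedged sketch of the FGSM update. The toy loss 0.5 * sum((x .- target).^2)
# stands in for loss(x, y) = crossentropy(model(x), y); its gradient w.r.t.
# the input is simply x .- target, so no autodiff library is required.
target = [0.3, 0.7]          # hypothetical direction the toy loss pulls away from
x      = [0.5, 0.5]          # input to perturb
ϵ      = 0.1                 # perturbation budget

grad  = x .- target                            # analytic ∇ₓ loss for the toy loss
x_adv = clamp.(x .+ ϵ .* sign.(grad), 0, 1)    # FGSM: step along sign of gradient, then clamp
# x_adv ≈ [0.6, 0.4]
```

With a real Flux model the only change is obtaining `grad` from automatic differentiation instead of the analytic formula.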
Adversarial.PGD — Function

    PGD(model, loss, x, y; ϵ = 10, step_size = 0.1, iters = 100, clamp_range = (0, 1))

Projected Gradient Descent (PGD) is an iterative variant of FGSM that starts from a random point near the input. At every step the FGSM update moves the input in the direction of the gradient, bounded in the l∞ norm. (https://arxiv.org/pdf/1706.06083.pdf)
Arguments:
- model: The model to base the attack upon.
- loss: The loss function to use, assuming that it includes the prediction function, i.e. loss(x, y) = crossentropy(m(x), y).
- x: The input to be perturbed.
- y: The ground truth for x.
- ϵ: The bound around x.
- step_size: The ϵ value used in each FGSM step.
- iters: The maximum number of iterations to run the algorithm for.
- clamp_range: The lower and upper values to clamp the input to.
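PGD repeats the FGSM step while projecting the accumulated perturbation back onto the l∞ ball of radius ϵ. A hedged sketch with the same analytic toy loss as above; the random initialisation inside the ball is omitted here to keep the example deterministic:

```julia
# Hedged PGD sketch: iterate sign-of-gradient steps on the toy loss
# 0.5 * sum((x .- target).^2) (a stand-in for the real model loss) and keep
# the total perturbation δ inside the l∞ ball of radius ϵ.
target    = [0.0, 1.0]
x         = [0.5, 0.5]
ϵ         = 0.2              # l∞ bound around x
step_size = 0.05             # FGSM step size per iteration
iters     = 10

δ = zeros(length(x))         # a real PGD run would start at a random point in the ball
for _ in 1:iters
    g = (x .+ δ) .- target                          # analytic input gradient at the iterate
    δ .= clamp.(δ .+ step_size .* sign.(g), -ϵ, ϵ)  # step, then project onto the ϵ-ball
end
x_adv = clamp.(x .+ δ, 0, 1)                        # finally clamp to the input range
```

Note that the projection is what distinguishes PGD from simply repeating FGSM: the perturbation can never leave the ϵ-ball, regardless of how many iterations run.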
Adversarial.JSMA — Function

    JSMA(model, x, t; Υ, θ)

The Jacobian Saliency Map Algorithm (JSMA) crafts adversarial examples by modifying a small number of pixels. These pixels are selected via the Jacobian matrix of the network's output w.r.t. its input. (https://arxiv.org/pdf/1511.07528.pdf)
Arguments:
- model: The model to create adversarial examples for.
- x: The original input data.
- t: Index corresponding to the target class (this is a targeted attack).
- Υ: The maximum amount of distortion.
- θ: The amount by which each feature is perturbed.
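The heart of JSMA is the saliency map built from that Jacobian: a feature scores highly when increasing it raises the target class while lowering all the others. A hedged sketch of one saliency step over a hand-written Jacobian (all values here are illustrative; a real implementation recomputes the Jacobian from the network after every perturbation):

```julia
# Hedged JSMA sketch: one saliency-map step. J is a hand-written 3×4
# Jacobian (classes × features) standing in for ∂model(x)/∂x.
J = [ 0.2  -0.1   0.4   0.0;    # class 1
      0.1   0.3  -0.2   0.5;    # class 2 (target)
     -0.3   0.2  -0.1  -0.4]    # class 3
t = 2                            # target class index
θ = 0.1                          # perturbation applied to the chosen feature
x = [0.5, 0.5, 0.5, 0.5]

# Saliency of feature i: zero unless it increases the target class while
# decreasing the sum of the others; otherwise J[t, i] * |sum of other rows|.
saliency = map(1:size(J, 2)) do i
    other = sum(J[j, i] for j in axes(J, 1) if j != t)
    (J[t, i] < 0 || other > 0) ? 0.0 : J[t, i] * abs(other)
end
best = argmax(saliency)          # most salient feature
x[best] = clamp(x[best] + θ, 0, 1)   # perturb it by θ
```

The full attack loops this step, stopping once the model predicts t or the total distortion exceeds Υ.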
Adversarial.CW — Function

    CW(model, x, t; dist = euclidean, c = 0.1)

Carlini & Wagner's (CW) method generates adversarial examples by optimising a loss function against a target class. Here we consider the f6 variant of the loss function. (https://arxiv.org/pdf/1608.04644.pdf)
Arguments:
- model: The model to attack.
- x: The original input data.
- t: Index label corresponding to the target class.
- dist: The distance measure to use (L0, L2, or L∞). This is assumed to come from the Distances.jl library or to be some other callable function.
- c: Value weighting the contribution of the misclassification term in the error function.
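The f6 term scores how far the target class's logit trails the best competing logit, going to zero once the target wins. A hedged sketch of f6 evaluated on fixed logits (the vector Z and its values are illustrative stand-ins for the model's pre-softmax output; the real attack minimises dist(x, x′) + c · f6 over the perturbed input x′ with an optimiser):

```julia
# Hedged sketch of the CW f6 loss term on fixed logits. Z stands in for
# the model's pre-softmax output at the perturbed input; κ is the paper's
# confidence margin (0 by default). The attack objective would then be
#     dist(x, x′) + c * f6(Z, t)
f6(Z, t; κ = 0.0) = max(maximum(Z[i] for i in eachindex(Z) if i != t) - Z[t], -κ)

Z = [2.0, 0.5, 1.2]      # hypothetical logits
t = 2                    # target class

f6(Z, t)                 # target trails the best other class by 2.0 - 0.5 = 1.5
f6([0.1, 3.0, 0.4], t)   # already classified as target, so the term is 0.0
```

Because f6 vanishes as soon as the target class dominates, the optimiser is then free to spend the remaining iterations shrinking the distance term.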
Adversarial.DeepFool — Function

    DeepFool(model, x, overshoot = 0.02, max_iter = 50)

Moosavi-Dezfooli et al.'s DeepFool method (https://arxiv.org/pdf/1511.04599.pdf).
An algorithm to determine the minimum perturbation needed to change the class assignment of an image. This makes it useful for computing a robustness metric for classifiers, whereas other algorithms (such as FGSM) may return sub-optimal solutions when generating an adversarial example.
The algorithm operates greedily, so it is not guaranteed to converge to the smallest possible perturbation that results in an adversarial example. Despite this shortcoming, it often yields a close approximation.
The python/matlab implementations mentioned in the paper can be found at: https://github.com/LTS4/DeepFool/
Arguments:
- model: The Flux model to attack, taken before the softmax function.
- x: An array of input images to create adversarial examples for (in WHC layout).
- overshoot: The termination criterion used to prevent vanishing gradients.
- max_iter: The maximum number of iterations for the algorithm.
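For an affine classifier the DeepFool perturbation has a closed form: project the input onto the nearest decision boundary, then overshoot slightly to cross it. A hedged sketch for a binary linear classifier f(x) = w⋅x + b (the weights and input here are illustrative; the iterative multi-class version re-linearises the network around the current iterate at each step):

```julia
# Hedged DeepFool sketch for a binary affine classifier f(x) = w⋅x + b.
# The minimal l2 perturbation reaching the boundary f = 0 is -f(x)/‖w‖² * w;
# the overshoot factor pushes slightly past the boundary to flip the sign.
using LinearAlgebra

w = [3.0, 4.0]
b = -1.0
x = [1.0, 1.0]
overshoot = 0.02

f = dot(w, x) + b                        # f(x) = 6.0, so the current class is sign(f) = +1
r = -(f / sum(abs2, w)) .* w             # closed-form minimal perturbation to the boundary
x_adv = x .+ (1 + overshoot) .* r        # overshoot past the boundary to change the class
```

In the multi-class case the same formula is applied to the closest linearised boundary among all competing classes, and the loop repeats until the predicted label changes or max_iter is reached.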