WhiteBox Adversarial Algorithms
Adversarial.FGSM — Function

FGSM(model, loss, x, y; ϵ = 0.1, clamp_range = (0, 1))
The Fast Gradient Sign Method (FGSM) creates adversarial examples by pushing the input in the direction of the gradient, with the perturbation bounded by the ϵ parameter.
This method was proposed by Goodfellow et al., 2014 (https://arxiv.org/abs/1412.6572).
Arguments:
model: The model to base the attack upon.
loss: The loss function to use. This assumes that the loss function includes the predict function, i.e. loss(x, y) = crossentropy(model(x), y).
x: The input to be perturbed by the FGSM algorithm.
y: The 'true' label of the input.
ϵ: The amount of perturbation to apply.
clamp_range: Tuple consisting of the lower and upper values to clamp the input to.
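The update itself is simple enough to sketch in a few lines of Flux. The snippet below is a minimal illustration of the step described above, not the package's implementation; the model, loss, and data are placeholders.

using Flux  # provides gradient (via Zygote) and crossentropy

# Placeholder model and loss; any differentiable pair with
# loss(x, y) = crossentropy(model(x), y) would do.
model = Chain(Dense(784 => 32, relu), Dense(32 => 10), softmax)
loss(x, y) = Flux.crossentropy(model(x), y)

# One FGSM step: move x by ϵ in the sign of the input gradient, then clamp.
function fgsm_step(loss, x, y; ϵ = 0.1, clamp_range = (0, 1))
    grad = gradient(x_ -> loss(x_, y), x)[1]   # gradient of the loss w.r.t. the input
    return clamp.(x .+ ϵ .* sign.(grad), clamp_range...)
end

x = rand(Float32, 784)
y = Flux.onehot(3, 1:10)
x_adv = fgsm_step(loss, x, y)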
Adversarial.PGD — Function

PGD(model, loss, x, y; ϵ = 10, step_size = 0.1, iters = 100, clamp_range = (0, 1))
Projected Gradient Descent (PGD) is an iterative variant of FGSM with a random starting point. At each step, an FGSM update moves the input in the direction of the gradient, bounded in the ℓ∞ norm. (https://arxiv.org/pdf/1706.06083.pdf)
Arguments:
model: The model to base the attack upon.
loss: The loss function to use, assuming that it includes the prediction function, i.e. loss(x, y) = crossentropy(m(x), y).
x: The input to be perturbed.
y: The ground truth label for x.
ϵ: The bound around x.
step_size: The ϵ value used in each FGSM step.
iters: The maximum number of iterations to run the algorithm for.
clamp_range: The lower and upper values to clamp the input to.
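A minimal PGD sketch, reusing the placeholder model and loss from the FGSM example above: start from a random point in the ℓ∞ ball of radius ϵ around x, repeat FGSM steps of size step_size, and project back into the ball after each step. The default values here are illustrative and mirror the description rather than the package's internals.

function pgd_sketch(loss, x, y; ϵ = 0.3, step_size = 0.01, iters = 40, clamp_range = (0, 1))
    # Random starting point inside the ℓ∞ ball around x.
    x_adv = clamp.(x .+ ϵ .* (2 .* rand(Float32, size(x)...) .- 1), clamp_range...)
    for _ in 1:iters
        grad = gradient(x_ -> loss(x_, y), x_adv)[1]
        x_adv = x_adv .+ step_size .* sign.(grad)   # FGSM step of size step_size
        x_adv = clamp.(x_adv, x .- ϵ, x .+ ϵ)       # project back into the ℓ∞ ball
        x_adv = clamp.(x_adv, clamp_range...)       # keep inside the valid input range
    end
    return x_adv
end

x_adv = pgd_sketch(loss, x, y)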
Adversarial.JSMA — Function

JSMA(model, x, t; Υ, θ)
The Jacobian Saliency Map Algorithm (JSMA) crafts adversarial examples by modifying a very small number of pixels. These pixels are selected via the Jacobian matrix of the network's output w.r.t. its input. (https://arxiv.org/pdf/1511.07528.pdf)
Arguments:
model: The model to create adversarial examples for.
x: The original input data.
t: Index corresponding to the target class (this is a targeted attack).
Υ: The maximum amount of distortion.
θ: The amount by which each feature is perturbed.
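A hypothetical call, using only the signature documented above; the model, input, target class, and keyword values are illustrative, not package defaults (Υ and θ have no defaults in the signature).

using Flux, Adversarial

model = Chain(Dense(784 => 64, relu), Dense(64 => 10), softmax)
x = rand(Float32, 784)

# Targeted attack towards class 7; Υ caps the total distortion and θ sets the
# per-feature perturbation (both values are made up for illustration).
x_adv = JSMA(model, x, 7; Υ = 50, θ = 0.1)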
Adversarial.CW — Function

CW(model, x, t; dist = euclidean, c = 0.1)
Carlini & Wagner's (CW) method generates adversarial examples through the optimisation of a loss function against a target class. Here we consider the F6 variant of the loss function. (https://arxiv.org/pdf/1608.04644.pdf)
Arguments:
model: The model to attack.
x: The original input data.
t: Index label corresponding to the target class.
dist: The distance measure to use (L0, L2, L∞). This is assumed to come from the Distances.jl library or to be some other callable function.
c: Value for the contribution of the misclassification term in the error function.
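A hypothetical call based on the signature above; the model, input, and target class are placeholders, and euclidean is imported from Distances.jl as the argument description suggests.

using Flux, Adversarial
using Distances: euclidean   # or any other callable distance measure

model = Chain(Dense(784 => 64, relu), Dense(64 => 10), softmax)
x = rand(Float32, 784)

# Optimise towards target class 3; dist penalises the distance from the original
# input and c weights the misclassification term.
x_adv = CW(model, x, 3; dist = euclidean, c = 0.1)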
Adversarial.DeepFool — Function

DeepFool(model, x, overshoot = 0.02, max_iter = 50)
Moosavi-Dezfooli et al.'s DeepFool method (https://arxiv.org/pdf/1511.04599.pdf).
An algorithm to determine the minimum perturbation needed to change the class assignment of the image. This makes it useful for computing a robustness metric of classifiers, whereas other algorithms (such as FGSM) may return sub-optimal solutions for generating an adversarial example.
The algorithm operates greedily, so it is not guaranteed to converge to the smallest possible perturbation that results in an adversarial example. Despite this shortcoming, it often yields a close approximation.
The Python/MATLAB implementations mentioned in the paper can be found at https://github.com/LTS4/DeepFool/.
Arguments:
model: The Flux model to attack, given without its final softmax function.
x: An array of input images to create adversarial examples for (size WHC).
overshoot: The halting criterion used to prevent vanishing gradients.
max_iter: The maximum number of iterations for the algorithm.
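A hypothetical call following the signature above. The model is passed without a final softmax, as the argument description notes; the architecture, image shape, and the presence of a batch dimension are assumptions.

using Flux, Adversarial

# Placeholder convolutional model, given without a final softmax.
model = Chain(Conv((3, 3), 1 => 8, relu), Flux.flatten, Dense(8 * 26 * 26 => 10))
x = rand(Float32, 28, 28, 1, 1)   # 28×28 grayscale input (WHC plus an assumed batch dimension)

x_adv = DeepFool(model, x, 0.02, 50)   # overshoot and max_iter are positional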