WhiteBox Adversarial Algorithms
Adversarial.FGSM — Function

    FGSM(model, loss, x, y; ϵ = 0.1, clamp_range = (0, 1))

The Fast Gradient Sign Method (FGSM) creates adversarial examples by pushing the input in the direction of the sign of the gradient, with the perturbation bounded by the ϵ parameter.
This method was proposed by Goodfellow et al., 2014 (https://arxiv.org/abs/1412.6572).
Arguments:
- model: The model to base the attack upon.
- loss: The loss function to use. This assumes that the loss function includes the prediction, i.e. loss(x, y) = crossentropy(model(x), y).
- x: The input to be perturbed by the FGSM algorithm.
- y: The 'true' label of the input.
- ϵ: The amount of perturbation to apply.
- clamp_range: Tuple consisting of the lower and upper values to clamp the input to.
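The FGSM update itself is a single signed-gradient step. Below is a minimal sketch in plain Julia; the quadratic toy loss and its target vector are stand-ins (not part of the package) chosen so the input gradient is analytic and no Flux/Zygote dependency is needed:

```julia
# Hedged sketch of the FGSM update. The toy loss 0.5 * sum((x .- target).^2)
# stands in for loss(x, y) = crossentropy(model(x), y); its gradient w.r.t.
# the input is simply x .- target, so no autodiff library is required.
target = [0.3, 0.7]          # hypothetical direction the toy loss pulls away from
x      = [0.5, 0.5]          # input to perturb
ϵ      = 0.1                 # perturbation budget

grad  = x .- target                            # analytic ∇ₓ loss for the toy loss
x_adv = clamp.(x .+ ϵ .* sign.(grad), 0, 1)    # FGSM: step along sign of gradient, then clamp
# x_adv ≈ [0.6, 0.4]
```

With a real Flux model the only change is obtaining `grad` from automatic differentiation instead of the analytic formula.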
Adversarial.PGD — Function

    PGD(model, loss, x, y; ϵ = 10, step_size = 0.1, iters = 100, clamp_range = (0, 1))

Projected Gradient Descent (PGD) is an iterative variant of FGSM that starts from a random point near the input. At every step the FGSM update moves the input in the direction of the gradient, bounded in the l∞ norm. (https://arxiv.org/pdf/1706.06083.pdf)
Arguments:
- model: The model to base the attack upon.
- loss: The loss function to use, assuming that it includes the prediction function, i.e. loss(x, y) = crossentropy(m(x), y).
- x: The input to be perturbed.
- y: The ground truth for x.
- ϵ: The bound around x.
- step_size: The ϵ value used in each FGSM step.
- iters: The maximum number of iterations to run the algorithm for.
- clamp_range: The lower and upper values to clamp the input to.
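PGD repeats the FGSM step while projecting the accumulated perturbation back onto the l∞ ball of radius ϵ. A hedged sketch with the same analytic toy loss as above; the random initialisation inside the ball is omitted here to keep the example deterministic:

```julia
# Hedged PGD sketch: iterate sign-of-gradient steps on the toy loss
# 0.5 * sum((x .- target).^2) (a stand-in for the real model loss) and keep
# the total perturbation δ inside the l∞ ball of radius ϵ.
target    = [0.0, 1.0]
x         = [0.5, 0.5]
ϵ         = 0.2              # l∞ bound around x
step_size = 0.05             # FGSM step size per iteration
iters     = 10

δ = zeros(length(x))         # a real PGD run would start at a random point in the ball
for _ in 1:iters
    g = (x .+ δ) .- target                          # analytic input gradient at the iterate
    δ .= clamp.(δ .+ step_size .* sign.(g), -ϵ, ϵ)  # step, then project onto the ϵ-ball
end
x_adv = clamp.(x .+ δ, 0, 1)                        # finally clamp to the input range
```

Note that the projection is what distinguishes PGD from simply repeating FGSM: the perturbation can never leave the ϵ-ball, regardless of how many iterations run.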
Adversarial.JSMA — Function

    JSMA(model, x, t; Υ, θ)

The Jacobian Saliency Map Algorithm (JSMA) crafts adversarial examples by modifying a small number of pixels. These pixels are selected via the Jacobian matrix of the network's output w.r.t. its input. (https://arxiv.org/pdf/1511.07528.pdf)
Arguments:
- model: The model to create adversarial examples for.
- x: The original input data.
- t: Index corresponding to the target class (this is a targeted attack).
- Υ: The maximum amount of distortion.
- θ: The amount by which each feature is perturbed.
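The heart of JSMA is the saliency map built from that Jacobian: a feature scores highly when increasing it raises the target class while lowering all the others. A hedged sketch of one saliency step over a hand-written Jacobian (all values here are illustrative; a real implementation recomputes the Jacobian from the network after every perturbation):

```julia
# Hedged JSMA sketch: one saliency-map step. J is a hand-written 3×4
# Jacobian (classes × features) standing in for ∂model(x)/∂x.
J = [ 0.2  -0.1   0.4   0.0;    # class 1
      0.1   0.3  -0.2   0.5;    # class 2 (target)
     -0.3   0.2  -0.1  -0.4]    # class 3
t = 2                            # target class index
θ = 0.1                          # perturbation applied to the chosen feature
x = [0.5, 0.5, 0.5, 0.5]

# Saliency of feature i: zero unless it increases the target class while
# decreasing the sum of the others; otherwise J[t, i] * |sum of other rows|.
saliency = map(1:size(J, 2)) do i
    other = sum(J[j, i] for j in axes(J, 1) if j != t)
    (J[t, i] < 0 || other > 0) ? 0.0 : J[t, i] * abs(other)
end
best = argmax(saliency)          # most salient feature
x[best] = clamp(x[best] + θ, 0, 1)   # perturb it by θ
```

The full attack loops this step, stopping once the model predicts t or the total distortion exceeds Υ.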
Adversarial.CW — Function

    CW(model, x, t; dist = euclidean, c = 0.1)

Carlini & Wagner's (CW) method generates adversarial examples by optimising a loss function against a target class. Here we consider the f6 variant of the loss function. (https://arxiv.org/pdf/1608.04644.pdf)
Arguments:
- model: The model to attack.
- x: The original input data.
- t: Index label corresponding to the target class.
- dist: The distance measure to use (L0, L2, or L∞). This is assumed to come from the Distances.jl library or to be some other callable function.
- c: Value weighting the contribution of the misclassification term in the error function.
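The f6 term scores how far the target class's logit trails the best competing logit, going to zero once the target wins. A hedged sketch of f6 evaluated on fixed logits (the vector Z and its values are illustrative stand-ins for the model's pre-softmax output; the real attack minimises dist(x, x′) + c · f6 over the perturbed input x′ with an optimiser):

```julia
# Hedged sketch of the CW f6 loss term on fixed logits. Z stands in for
# the model's pre-softmax output at the perturbed input; κ is the paper's
# confidence margin (0 by default). The attack objective would then be
#     dist(x, x′) + c * f6(Z, t)
f6(Z, t; κ = 0.0) = max(maximum(Z[i] for i in eachindex(Z) if i != t) - Z[t], -κ)

Z = [2.0, 0.5, 1.2]      # hypothetical logits
t = 2                    # target class

f6(Z, t)                 # target trails the best other class by 2.0 - 0.5 = 1.5
f6([0.1, 3.0, 0.4], t)   # already classified as target, so the term is 0.0
```

Because f6 vanishes as soon as the target class dominates, the optimiser is then free to spend the remaining iterations shrinking the distance term.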
Adversarial.DeepFool — Function

    DeepFool(model, x, overshoot = 0.02, max_iter = 50)

Moosavi-Dezfooli et al.'s DeepFool method (https://arxiv.org/pdf/1511.04599.pdf).
An algorithm to determine the minimum perturbation needed to change the class assignment of an image. This makes it useful for computing a robustness metric for classifiers, whereas other algorithms (such as FGSM) may return sub-optimal solutions when generating an adversarial example.
The algorithm operates greedily, so it is not guaranteed to converge to the smallest possible perturbation that results in an adversarial example. Despite this shortcoming, it often yields a close approximation.
The python/matlab implementations mentioned in the paper can be found at: https://github.com/LTS4/DeepFool/
Arguments:
- model: The Flux model to attack, taken before the softmax function.
- x: An array of input images to create adversarial examples for (in WHC layout).
- overshoot: The termination criterion used to prevent vanishing gradients.
- max_iter: The maximum number of iterations for the algorithm.
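For an affine classifier the DeepFool perturbation has a closed form: project the input onto the nearest decision boundary, then overshoot slightly to cross it. A hedged sketch for a binary linear classifier f(x) = w⋅x + b (the weights and input here are illustrative; the iterative multi-class version re-linearises the network around the current iterate at each step):

```julia
# Hedged DeepFool sketch for a binary affine classifier f(x) = w⋅x + b.
# The minimal l2 perturbation reaching the boundary f = 0 is -f(x)/‖w‖² * w;
# the overshoot factor pushes slightly past the boundary to flip the sign.
using LinearAlgebra

w = [3.0, 4.0]
b = -1.0
x = [1.0, 1.0]
overshoot = 0.02

f = dot(w, x) + b                        # f(x) = 6.0, so the current class is sign(f) = +1
r = -(f / sum(abs2, w)) .* w             # closed-form minimal perturbation to the boundary
x_adv = x .+ (1 + overshoot) .* r        # overshoot past the boundary to change the class
```

In the multi-class case the same formula is applied to the closest linearised boundary among all competing classes, and the loop repeats until the predicted label changes or max_iter is reached.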