WhiteBox Adversarial Algorithms

Adversarial.FGSM - Function
FGSM(model, loss, x, y; ϵ = 0.1, clamp_range = (0, 1))

Fast Gradient Sign Method (FGSM) creates adversarial examples by perturbing the input in the direction of the sign of the loss gradient, with the size of the perturbation bounded by the ϵ parameter.

This method was proposed by Goodfellow et al., 2014 (https://arxiv.org/abs/1412.6572).

Arguments:

  • model: The model to base the attack upon.
  • loss: The loss function to use. This assumes the loss includes the model's forward pass, i.e. loss(x, y) = crossentropy(model(x), y).
  • x: The input to be perturbed by the FGSM algorithm.
  • y: The 'true' label of the input.
  • ϵ: The amount of perturbation to apply.
  • clamp_range: Tuple consisting of the lower and upper values to clamp the input.
source
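
A minimal usage sketch, assuming Flux and a toy softmax classifier; the 784-feature input, layer sizes, and label encoding are illustrative only:

using Flux, Adversarial

# Toy classifier ending in softmax so that crossentropy(model(x), y) is well-defined.
model = Chain(Dense(784, 32, relu), Dense(32, 10), softmax)
loss(x, y) = Flux.crossentropy(model(x), y)   # loss wraps the forward pass, as assumed above

x = rand(Float32, 784, 1)                     # one flattened 28×28 input (illustrative)
y = Flux.onehotbatch([3], 0:9)                # the 'true' label as a one-hot column

x_adv = FGSM(model, loss, x, y; ϵ = 0.1, clamp_range = (0, 1))
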
Adversarial.PGD - Function
PGD(model, loss, x, y; ϵ = 10, step_size = 0.1, iters = 100, clamp_range = (0, 1))

Projected Gradient Descent (PGD) is an iterative variant of FGSM that starts from a random point within the ϵ-ball around the input. At each step it applies an FGSM update, moving the input in the direction of the gradient while keeping the result within the ℓ∞ bound. (https://arxiv.org/pdf/1706.06083.pdf)

Arguments:

  • model: The model to base the attack upon.
  • loss: The loss function to use, assuming it includes the model's forward pass, i.e. loss(x, y) = crossentropy(model(x), y).
  • x: The input to be perturbed.
  • y: The ground truth label for x.
  • ϵ: The radius of the ℓ∞ ball around x to which the perturbation is constrained.
  • step_size: The ϵ value used for each inner FGSM step.
  • iters: The maximum number of iterations to run the algorithm for.
  • clamp_range: The lower and upper values to clamp the input to.
source
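
A usage sketch reusing the FGSM setup above; the specific ϵ, step size, and iteration count are illustrative:

# PGD starts from a random point in the ϵ-ball and applies up to `iters` FGSM steps,
# projecting back into the ball after each one.
x_adv = PGD(model, loss, x, y; ϵ = 0.3, step_size = 0.01, iters = 40, clamp_range = (0, 1))
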
Adversarial.JSMA - Function
JSMA(model, x, t; Υ, θ)

Jacobian Saliency Map Algorithm (JSMA) crafts adversarial examples by modifying a very small number of pixels. These pixels are selected via the Jacobian matrix of the network's output with respect to its input. (https://arxiv.org/pdf/1511.07528.pdf)

Arguments:

  • model: The model to create adversarial examples for.
  • x: The original input data.
  • t: Index corresponding to the target class (this is a targeted attack).
  • Υ: The maximum amount of distortion allowed.
  • θ: The amount by which each selected feature is perturbed.
source
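
A usage sketch with the FGSM setup above; the target index and the Υ/θ values are illustrative, and note that t is a class index rather than a one-hot label:

t = 1                                  # target class index (illustrative)
x_adv = JSMA(model, x, t; Υ = 0.1, θ = 1.0)
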
Adversarial.CW - Function
CW(model, x, t; dist = euclidean, c = 0.1)

Carlini & Wagner's (CW) method for generating adversarial examples by optimising a loss function against a target class. Here we use the f6 variant of the loss function. (https://arxiv.org/pdf/1608.04644.pdf)

Arguments:

  • model: The model to attack.
  • x: The original input data.
  • t: Index label corresponding to the target class.
  • dist: The distance measure to use (e.g. L0, L2, L∞). Assumed to be a metric from the Distances.jl library or some other callable function.
  • c: The weight given to the misclassification term in the objective function.
source
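
A usage sketch; euclidean comes from Distances.jl (the documented default), and the target index and c value are illustrative:

using Distances

t = 1                                        # target class index (illustrative)
x_adv = CW(model, x, t; dist = euclidean, c = 0.1)
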
Adversarial.DeepFool - Function
DeepFool(model, x, overshoot = 0.02, max_iter = 50)

Moosavi-Dezfooli et al.'s (https://arxiv.org/pdf/1511.04599.pdf) DeepFool method.

An algorithm to determine the minimum perturbation needed to change the class assignment of an image. This makes it useful for computing a robustness metric for classifiers, whereas other algorithms (such as FGSM) may return sub-optimal perturbations when generating an adversarial example.

The algorithm operates greedily, so it is not guaranteed to converge to the smallest possible perturbation that results in an adversarial example. Despite this shortcoming, it often yields a close approximation.

The Python/MATLAB implementations mentioned in the paper can be found at: https://github.com/LTS4/DeepFool/

Arguments:

  • model: The Flux model to attack, taken before the softmax function (i.e. outputting logits).
  • x: An array of input images to create adversarial examples for, with dimensions in WHC order.
  • overshoot: Used as a termination criterion to prevent vanishing updates.
  • max_iter: The maximum number of iterations for the algorithm.
source
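
A usage sketch; DeepFool expects the pre-softmax model, so the chain below outputs raw logits, and the 28×28×1 input shape, layer sizes, and overshoot/iteration values are illustrative:

using Flux, Adversarial

logits_model = Chain(Flux.flatten, Dense(784, 32, relu), Dense(32, 10))  # no softmax, logits only
img = rand(Float32, 28, 28, 1)                                           # one grayscale image in WHC layout
x_adv = DeepFool(logits_model, img, 0.02, 50)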