Step 3: Defending against adversarial examples

Even though there are a variety of ways to defend against adversarial examples (e.g. Gradient Masking, Adversarial Example Detection), we will focus on adversarial training here, which is generally accepted as the most common and effective method to increase adversarial robustness.

The idea of adversarial training is intuitive: we include adversarial examples in the training data, so that models trained this way become more robust to adversarial examples. We have seen that, mathematically, adversarial training can be framed as a min-max problem: an inner maximization searches for worst-case (adversarial) examples, while an outer minimization trains the model on them to improve accuracy and robustness. In this framework, adversarial attack and defense are two parts of the same unified process.
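Written out explicitly, a standard way to express this min-max objective (using the same notation Loss, F, x, y and θ as in the formulas below, with δ the perturbation, ε the perturbation budget and D the data distribution; the ℓ∞ constraint matches the FGSM and PGD attacks we consider) is:

\min_{\theta}\; \mathbb{E}_{(x,y)\sim \mathcal{D}}\left[\max_{\|\delta\|_{\infty}\le \epsilon} Loss(F(x+\delta;\theta),y)\right]

The inner maximization is what attacks such as FGSM and PGD approximate; the outer minimization is ordinary training on the resulting worst-case examples.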

There are different types of adversarial training; in this guide, we will only cover adversarial training for the two attacks we are analysing: the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD).

  • FGSM: It is sufficient to train the model using the adversarial loss as a regularizer in the objective function (a code sketch follows after this list). The modified loss is therefore:

\tilde{Loss}(F(x;\theta),y)=\alpha \cdot Loss(F(x;\theta),y)+(1-\alpha)\cdot Loss(F(x+\epsilon \cdot sign(\nabla_x Loss(F(x;\theta),y));\theta),y)
  • PGD: At every step of training, the current batch is perturbed according to the following iterative update and then fed to the network:

xt+1=clip(xtϵ,xt+ϵ)(xt+αsign(xLoss(F(x;θ),y)))x_{t+1}=clip_{(x_t-\epsilon, x_t+\epsilon)} (x_t+\alpha \cdot sign(\nabla_x Loss(F(x;\theta),y)))

PGD is a critical benchmark and is regarded as the standard way to do adversarial training in practice.
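To make this concrete, below is a minimal PyTorch sketch of PGD adversarial training: each batch is replaced by its PGD-perturbed version, built with the iterative update above and clipped into the ε-ball around the original inputs, and the model is then updated on that perturbed batch. The hyperparameters (ε = 0.3, step size 0.01, 40 iterations) and the toy model and data are illustrative assumptions:

```python
# Sketch of PGD adversarial training: every batch is perturbed with the iterative
# update above before the usual gradient step on the model parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_attack(model, x, y, epsilon=0.3, step_size=0.01, steps=40):
    """Iterative signed-gradient steps, clipped back into the epsilon-ball around x."""
    x_orig = x.detach()
    x_adv = x_orig.clone()
    # (A common refinement is to start from a random point inside the epsilon-ball.)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step in the direction of the sign of the input gradient.
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Project back into the epsilon-ball around the original input,
        # then into the valid pixel range [0, 1].
        x_adv = torch.clamp(x_adv, x_orig - epsilon, x_orig + epsilon)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()


def train_epoch_pgd(model, loader, optimizer):
    """One epoch of adversarial training: train only on the PGD-perturbed batches."""
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()


if __name__ == "__main__":
    # Toy setup to show the calls; replace with a real model and DataLoader.
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loader = [(torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,)))]
    train_epoch_pgd(model, loader, optimizer)
```

In practice you would run this loop for many epochs on a real DataLoader; the notebook referenced below shows the same idea using the ART library.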

You can find a practical example of how to apply PGD with the ART library in our notebook, which can be viewed here or downloaded as the following file:
