Step 2: Finding adversarial examples
We will now define adversarial examples more rigorously. Consider a model F with parameters θ that has been trained correctly, so that it assigns the sample x to its true label y:

$$F_\theta(x) = y$$

The aim of the attacker is to find a small noise vector δ, with ‖δ‖ ≤ ε, to add to the input x so that the model is fooled:

$$F_\theta(x + \delta) \neq y$$
Good adversarial examples are essential to increasing the robustness of the system, as we will explore further in the following session. To find them, we have to solve a maximisation problem over the noise, which is still an open research question and actively under study. We will focus here on two standard algorithms from the literature: the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Both frame the search for adversarial examples as a min-max game, where:
The goal of the training is to find the parameters θ that minimise the loss $\mathcal{L}(F_\theta(x + \delta), y)$.
The goal of the attacker is to find the noise δ, within the budget ‖δ‖ ≤ ε, that maximises the loss $\mathcal{L}(F_\theta(x + \delta), y)$.
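Putting the two objectives together, the standard min-max formulation from the adversarial-training literature (stated here for context, using the symbols defined above) reads:

$$\min_\theta \; \mathbb{E}_{(x, y)} \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}\big(F_\theta(x + \delta), y\big) \Big]$$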
Different methods have been proposed to maximise this loss:
FGSM perturbs each input by a small amount ε in the direction of the sign of the gradient of the loss:

$$x_{adv} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(F_\theta(x), y)\big)$$
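As a rough sketch of how FGSM can be implemented (an illustrative example, not the code from our notebook; `model`, `x`, `y` and `epsilon` are assumed to be a PyTorch classifier, an input batch, its labels and the noise budget):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """One-step attack: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Take a single step of size epsilon in the direction of the gradient sign.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Keep the adversarial example in a valid pixel range (assuming inputs in [0, 1]).
    return x_adv.clamp(0, 1).detach()
```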
PGD takes a similar approach, but applies iterative steps in the direction of the gradient, projecting the noise back into the allowed ε-ball after each step. It finds the noise by repeating:

$$\delta_{t+1} = \Pi_{\|\delta\| \le \epsilon}\Big(\delta_t + \alpha \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(F_\theta(x + \delta_t), y)\big)\Big)$$
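A minimal sketch of PGD under the same assumptions (illustrative only; `alpha` is the step size and `num_steps` the number of iterations, both chosen by the attacker):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha, num_steps):
    """Iteratively maximise the loss, projecting the noise back into the epsilon-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Gradient-sign step followed by projection onto the L-infinity epsilon-ball.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    # Return the perturbed input, kept in a valid pixel range (assuming inputs in [0, 1]).
    return (x + delta).clamp(0, 1).detach()
```

In practice PGD is usually a stronger attack than FGSM, since the repeated small steps explore the ε-ball more thoroughly than a single large step.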
You can find a practical example of how to apply PGD in our notebook, which can be visualized here, or downloaded as the following file: