Step 2: Finding adversarial examples
We will now define adversarial examples more rigorously. Consider a model F with parameters θ that has been trained correctly, so that it assigns the sample x to its true label y:

$$F_\theta(x) = y$$

The aim of the attacker is to find a small noise vector δ, with ‖δ‖ ≤ ε, to add to the input x so that the model is fooled:

$$F_\theta(x + \delta) \neq y$$
Good adversarial examples are essential to increasing the robustness of the system, as we will explore further in the following session. To find them, we have to solve a maximisation problem over the noise, which is still an open research question and actively under study. We will focus here on two standard algorithms from the literature: the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). Both frame the search for adversarial examples as a min-max game, where:
The goal of the training is to find the parameters θ that minimise the loss $\mathcal{L}(F_\theta(x + \delta), y)$.
The goal of the attacker is to find the noise δ, within the budget ‖δ‖ ≤ ε, that maximises the loss $\mathcal{L}(F_\theta(x + \delta), y)$.
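Putting the two objectives together, the standard min-max formulation from the adversarial-training literature (stated here for context, using the symbols defined above) reads:

$$\min_\theta \; \mathbb{E}_{(x, y)} \Big[ \max_{\|\delta\| \le \epsilon} \mathcal{L}\big(F_\theta(x + \delta), y\big) \Big]$$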
Different methods have been proposed to maximise this loss:
FGSM perturbs each input by a small amount ε in the direction of the sign of the gradient of the loss:

$$x_{adv} = x + \epsilon \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(F_\theta(x), y)\big)$$
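As a rough sketch of how FGSM can be implemented (an illustrative example, not the code from our notebook; `model`, `x`, `y` and `epsilon` are assumed to be a PyTorch classifier, an input batch, its labels and the noise budget):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    """One-step attack: x_adv = x + epsilon * sign(grad_x loss)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # Take a single step of size epsilon in the direction of the gradient sign.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    # Keep the adversarial example in a valid pixel range (assuming inputs in [0, 1]).
    return x_adv.clamp(0, 1).detach()
```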
PGD takes a similar approach, but applies iterative steps in the direction of the gradient, projecting the noise back into the allowed ε-ball after each step. It finds the noise by repeating:

$$\delta_{t+1} = \Pi_{\|\delta\| \le \epsilon}\Big(\delta_t + \alpha \cdot \operatorname{sign}\big(\nabla_x \mathcal{L}(F_\theta(x + \delta_t), y)\big)\Big)$$
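A minimal sketch of PGD under the same assumptions (illustrative only; `alpha` is the step size and `num_steps` the number of iterations, both chosen by the attacker):

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, epsilon, alpha, num_steps):
    """Iteratively maximise the loss, projecting the noise back into the epsilon-ball."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(num_steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            # Gradient-sign step followed by projection onto the L-infinity epsilon-ball.
            delta += alpha * delta.grad.sign()
            delta.clamp_(-epsilon, epsilon)
        delta.grad.zero_()
    # Return the perturbed input, kept in a valid pixel range (assuming inputs in [0, 1]).
    return (x + delta).clamp(0, 1).detach()
```

In practice PGD is usually a stronger attack than FGSM, since the repeated small steps explore the ε-ball more thoroughly than a single large step.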
You can find a practical example of how to apply PGD in our notebook, which can be visualized here, or downloaded as the following file: