
Step 2: Finding adversarial examples


We will now define adversarial examples more rigorously. Let's take a model $F$ with parameters $\theta$, which has been trained correctly. The model can thus correctly assign the sample $x$ to its label $y$:

$$F_{\theta}(x) = y.$$

The aim of the attacker will be to find a small noise vector $\delta$ to add to the input $x$ in order to fool the model:

$$\delta \text{ s.t. } F_{\theta}(x + \delta) \neq y, \text{ where } \|\delta\|_{\infty} \leq \epsilon.$$
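As a concrete illustration, the minimal sketch below checks whether a candidate perturbation satisfies this definition. It assumes a PyTorch classifier `model` and tensors `x`, `y`, `delta`; these names are our own illustration and do not come from the accompanying notebook.

```python
import torch

def is_adversarial(model, x, y, delta, epsilon):
    """Check whether delta turns x into an adversarial example for model."""
    # The perturbation must respect the L-infinity budget: ||delta||_inf <= epsilon.
    within_budget = delta.abs().max().item() <= epsilon
    # The model must be fooled: F_theta(x + delta) != y.
    with torch.no_grad():
        prediction = model(x + delta).argmax(dim=-1)
    return within_budget and bool((prediction != y).all())
```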

Good adversarial examples are essential to increasing the robustness of the system, as we will further explore in the following section. In order to find adversarial examples, we will have to solve a maximisation problem, which is still an open issue in research and actively under study. We will focus here on two standard algorithms from the literature: the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD). These methods frame the issue of finding adversarial examples as a min-max game, where:

  • The goal of the training will be to find the parameters $\theta$ that minimize the loss: $\min_{\theta} \text{Loss}(F(x;\theta), y)$.

  • The goal of the attacker will be to find the noise $\delta$ that maximises the loss: $\max_{\delta} \text{Loss}(F(x+\delta;\theta), y)$, with $\|\delta\|_{\infty} \leq \epsilon$.
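Taken together, these two objectives are commonly summarised in the adversarial training literature as a single saddle-point (min-max) problem:

$$\min_{\theta} \, \max_{\|\delta\|_{\infty} \leq \epsilon} \text{Loss}(F(x+\delta;\theta), y).$$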

Different methods have been proposed to maximise this loss:

  • FGSM perturbs each input by a small amount in the direction of the gradient: $\delta = \epsilon \cdot \text{sign}(\nabla_x \text{Loss}(F(x;\theta), y))$.

  • PGD has a similar approach, but it takes iterative steps in the direction of the gradient. It finds the noise $\delta$ by repeating: $\delta_{t+1} = \text{clip}_{(-\epsilon, \epsilon)}\bigl(\delta_t + \alpha \cdot \text{sign}(\nabla_x \text{Loss}(F(x+\delta_t;\theta), y))\bigr)$. A code sketch of both updates follows this list.
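The sketch below shows one way these two updates could be implemented. It assumes a differentiable PyTorch classifier `model` and a cross-entropy loss; the function names `fgsm_attack` and `pgd_attack`, and the choice of PyTorch, are our own assumptions rather than the notebook's exact code.

```python
import torch
from torch.nn.functional import cross_entropy

def fgsm_attack(model, x, y, epsilon):
    """Single-step FGSM: delta = epsilon * sign(grad_x Loss(F(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    cross_entropy(model(x), y).backward()
    return (epsilon * x.grad.sign()).detach()

def pgd_attack(model, x, y, epsilon, alpha, steps):
    """Iterative PGD: gradient-sign steps on x + delta, clipped to the epsilon ball."""
    delta = torch.zeros_like(x)
    for _ in range(steps):
        delta.requires_grad_(True)
        cross_entropy(model(x + delta), y).backward()
        with torch.no_grad():
            delta = delta + alpha * delta.grad.sign()
            delta = delta.clamp(-epsilon, epsilon)  # project back onto the L-inf ball
    return delta.detach()
```

For images scaled to [0, 1], values such as `epsilon=8/255`, `alpha=2/255` and `steps=10` are commonly used; the resulting `x + delta` can then be passed back to the model to verify that the prediction has changed.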

You can find a practical example of how to apply PGD in our notebook, which can be visualized here, or downloaded as the following file: adversarial-robustness-notebook.ipynb (70KB).
