Step 3: Defending against adversarial examples

Even though there are a variety of ways to defend against adversarial examples (e.g. gradient masking, adversarial example detection), here we focus on adversarial training, which is generally regarded as the most common and effective method for increasing adversarial robustness.

The idea of adversarial training is intuitive: we include adversarial examples in the training data, and models trained this way become more robust to adversarial examples. We have seen that, mathematically, adversarial training can be framed as a min-max problem: an inner maximisation searches for worst-case (adversarial) examples, while an outer minimisation trains the model on them to increase accuracy and robustness. In this framework, adversarial attack and defense are actually part of the same unified process.
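
One common way of writing this min-max objective, using the same notation as the formulas below, is:

$$\min_{\theta}\; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|\le\epsilon} Loss(F(x+\delta;\theta),y)\Big]$$

where the inner maximisation is (approximately) solved by the attack, e.g. FGSM or PGD, and the outer minimisation by the usual training procedure.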

There are different types of adversarial training; however, in this guide we only cover adversarial training based on the two attack algorithms we are analysing: the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD).

  • FGSM: it is sufficient to train the algorithm using the adversarial loss as a regulariser in the objective function (see the sketch after this list). The new, modified loss is therefore:

$$\tilde{Loss}(F(x;\theta),y)=\alpha \cdot Loss(F(x;\theta),y)+(1-\alpha)\cdot Loss\big(F(x+\epsilon\,\mathrm{sign}(\nabla_x Loss(F(x;\theta),y));\theta),y\big)$$
  • PGD: at every step of training, the current batch is perturbed according to the iterative formula and then fed to the network:

$$x_{t+1}=\mathrm{clip}_{(x-\epsilon,\;x+\epsilon)}\big(x_t+\alpha \cdot \mathrm{sign}(\nabla_x Loss(F(x_t;\theta),y))\big)$$

where the clip keeps the perturbed point within the ϵ-ball around the original input x, and α here denotes the attack step size.
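
As an illustration of the FGSM variant, the regularised loss above could be implemented roughly as follows. This is a minimal sketch assuming a PyTorch classifier; the guide does not prescribe a specific framework, and `model`, `optimizer`, `epsilon` and `alpha` are illustrative names rather than objects defined elsewhere in this guide.

```python
import torch
import torch.nn.functional as F

def fgsm_training_step(model, optimizer, x, y, epsilon=0.1, alpha=0.5):
    """One training step using the FGSM-regularised loss (sketch)."""
    x = x.clone().detach().requires_grad_(True)
    clean_loss = F.cross_entropy(model(x), y)

    # FGSM perturbation: a single signed-gradient step of size epsilon.
    grad = torch.autograd.grad(clean_loss, x, retain_graph=True)[0]
    x_adv = (x + epsilon * grad.sign()).detach()

    # Regularised objective: alpha * clean loss + (1 - alpha) * adversarial loss.
    adv_loss = F.cross_entropy(model(x_adv), y)
    loss = alpha * clean_loss + (1 - alpha) * adv_loss

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```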

PGD is a critical benchmark, and PGD-based training is regarded as the standard way to perform adversarial training in practice.
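
A corresponding sketch of PGD-based adversarial training, again assuming a PyTorch setting with illustrative names (`epsilon` for the perturbation budget, `step_size` for the attack step, `num_steps` for the inner iterations), might look like this:

```python
import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, epsilon=0.03, step_size=0.01, num_steps=10):
    """Generate PGD adversarial examples by iterating the formula above (sketch)."""
    x_orig = x.detach()
    x_adv = x_orig.clone()
    for _ in range(num_steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascent step on the loss, then projection onto the epsilon-ball around x_orig.
        x_adv = x_adv.detach() + step_size * grad.sign()
        x_adv = torch.max(torch.min(x_adv, x_orig + epsilon), x_orig - epsilon)
    return x_adv.detach()

def pgd_training_step(model, optimizer, x, y):
    """One training step on a PGD-perturbed batch (sketch)."""
    x_adv = pgd_perturb(model, x, y)
    loss = F.cross_entropy(model(x_adv), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```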

You can find a practical example of how to apply PGD-based adversarial training using the ART library in our notebook, which can be visualized here or downloaded as the following file:

adversarial-robustness-notebook.ipynb (70 KB)
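
As a rough indication of what the notebook does, PGD adversarial training with ART can be set up along the following lines; the estimator wrapper, attack parameters and array names below are placeholders rather than the notebook's exact code.

```python
from art.attacks.evasion import ProjectedGradientDescent
from art.defences.trainer import AdversarialTrainer

# `classifier` is assumed to be an ART estimator (e.g. a PyTorchClassifier or
# KerasClassifier wrapping your model); x_train / y_train are NumPy arrays.
pgd = ProjectedGradientDescent(classifier, eps=0.1, eps_step=0.01, max_iter=10)

# ratio=1.0 trains on adversarial examples only; lower values mix in clean batches.
trainer = AdversarialTrainer(classifier, attacks=pgd, ratio=1.0)
trainer.fit(x_train, y_train, batch_size=128, nb_epochs=10)
```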