
Step 3: Other privacy-preserving techniques


Many privacy attacks exploit the fact that models often overfit to the training data, and leverage this property to extract personal information. It is therefore important to ensure that models do not overfit. Besides many of the techniques for reducing features, you can refer to the roadmap for improving generalization.

Another technique for privacy protection is to perturb the values of the data points.

  • Noise injection. One simple way of achieving this is by injecting noise into the data. Notice that this needs to be done in a way that preserves some of the statistical properties of the data. This should ensure that, while the predictive accuracy on the individual data points is diminished, the performance on the dataset as a whole is maintained. If X is the original data point, ε is the noise and Z is the transformed data point, we can add noise using these modalities (a minimal sketch of the three modalities appears after this list):

    1. Additive noise: Z = X + ε

    2. Multiplicative noise: Z = X · ε

    3. Logarithmic multiplicative noise: Z = ln(X) + ε

  • Differential Privacy. A technique that perturbs data in such a way that it becomes impossible to tell, just by looking at the output, whether any individual's data was part of the original dataset. These techniques are founded on rigorous mathematical definitions of privacy (Dwork 2006); a sketch of the Laplace mechanism, one of the simplest differentially private mechanisms, appears below.

  • Anonymization or pseudo-anonymization. Some of the techniques listed, like dimensionality reduction or noise perturbation, may produce anonymization or pseudo-anonymization as a side effect. Please note that these are privacy-preserving techniques in their own right (Iyengar 2002, Neubauer and Heurix 2011) which can be very useful to mitigate risks concerning data privacy.
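
Below is a minimal sketch of the three noise-injection modalities listed above, assuming the data is a NumPy array of numeric values; the noise scale `sigma` is an illustrative parameter you would tune so that aggregate statistics are roughly preserved, not a prescribed value.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def additive_noise(X, sigma=0.1):
    """Z = X + eps, with zero-mean noise so column means are roughly preserved."""
    return X + rng.normal(loc=0.0, scale=sigma, size=X.shape)

def multiplicative_noise(X, sigma=0.1):
    """Z = X * eps, with noise centred at 1 so the expected value of each entry is unchanged."""
    return X * rng.normal(loc=1.0, scale=sigma, size=X.shape)

def log_multiplicative_noise(X, sigma=0.1):
    """Z = ln(X) + eps, as listed above; requires strictly positive X."""
    return np.log(X) + rng.normal(loc=0.0, scale=sigma, size=X.shape)

# Individual values change, but dataset-level statistics stay close to the original.
X = rng.uniform(low=1.0, high=10.0, size=(1000, 3))
Z = additive_noise(X)
print(X.mean(axis=0), Z.mean(axis=0))
```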

You can find more information about data perturbation in Mivule 2013, and about data obfuscation in Zhang et al. 2018.
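
As a concrete illustration of the differential privacy idea above, here is a minimal sketch of the Laplace mechanism applied to a counting query; the sensitivity of 1 holds for counts, while the toy dataset, predicate, and privacy budget `epsilon` are purely illustrative assumptions rather than a full differentially private pipeline.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def laplace_count(values, predicate, epsilon=1.0):
    """Release a counting query with epsilon-differential privacy via the Laplace mechanism.

    A counting query changes by at most 1 when any one individual's record is added or
    removed (sensitivity = 1), so noise drawn from Laplace(1 / epsilon) makes the output
    roughly indistinguishable with or without that record.
    """
    true_count = sum(1 for v in values if predicate(v))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Example: how many users in this toy dataset are older than 40?
ages = [23, 45, 31, 67, 52, 38, 41]
print(laplace_count(ages, lambda age: age > 40, epsilon=0.5))
```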

Another technique for privacy protection consists of changing where the data is held and accessed.

  • Federated learning. This type of machine learning does not rely on a centralized approach, but rather allows training to happen across multiple decentralized devices, each relying only on its own local data. For example, the autocorrect feature of a smartphone can be trained on an individual device using solely the text messages sent by that specific user. This would not only allow for a more personalized outcome, but would also preserve the privacy of the user, as those messages would not need to be shared with a central system. A minimal sketch of federated averaging appears after this list.

  • Making inferences locally. Another way to mitigate privacy risks is to intervene at inference time. If the machine learning model can be hosted on an individual device, then inference can be triggered locally, avoiding any superfluous data sharing.
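
To make the federated learning idea concrete, here is a minimal sketch of federated averaging for a linear model, assuming each client device holds its own (X, y) data and only model weights are ever shared with the server; the toy data, learning rate, and number of rounds are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def local_update(weights, X, y, lr=0.05, epochs=5):
    """Train on a single device: a few gradient steps on a linear model, using only local data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of mean squared error
        w -= lr * grad
    return w

def federated_average(weights, clients):
    """One round of federated averaging: each client trains locally; the server only averages weights."""
    local_weights = [local_update(weights, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients])
    return np.average(local_weights, axis=0, weights=sizes)

# Toy setup: three devices, each with private data that never leaves the device.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(20):
    w = federated_average(w, clients)
print(w)  # approaches true_w without any raw data being centralized
```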
