Option 1: Reducing features

We identify three families of techniques for reducing features: feature selection, feature extraction, and embeddings.

Below, we present a number of methodologies for each family.

Not all of the features collected will be useful for the predictive task at hand. Feature selection techniques help reduce the number of variables fed to the machine learning model. A very large number of feature selection techniques is available; here we cover some common ones that can be implemented with the sklearn library (short code sketches follow the list below):

  • Variance Threshold: it removes low-variance features from the dataset.

  • Univariate feature selection: it keeps the features that perform best according to a univariate statistical test.

    1. The k best features can be selected according to the score, the false positive rate, the false discovery rate, or the family-wise error rate. We could also select the features scoring above a given percentile.

    2. The univariate tests include the chi-squared test, the ANOVA F-value, and mutual information.

  • Recursive feature elimination: an external estimator is trained on the dataset and its features are ranked according to their impact on the prediction. The least important features are then removed, and this procedure is repeated until the desired number of features is reached.

  • Model-based feature selection: any model that assigns a measure of feature importance can be used to score features. Examples include Lasso for regression, and Logistic Regression or SVMs for classification. Decision trees are also a possible choice.

  • Sequential feature selection: this algorithm either adds (forward selection) or removes (backward selection) one feature at a time, iteratively picking the best candidate at each step.
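
As an illustration, here is a minimal sketch of variance thresholding with scikit-learn's VarianceThreshold; the 0.01 threshold and the toy data are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Toy data: the third column is constant, so its variance is zero.
X = np.array([[0.0, 2.1, 1.0],
              [1.0, 1.9, 1.0],
              [0.5, 2.0, 1.0],
              [0.2, 2.2, 1.0]])

# Drop every feature whose variance falls below the chosen threshold.
selector = VarianceThreshold(threshold=0.01)
X_reduced = selector.fit_transform(X)

print(X_reduced.shape)         # (4, 2): the constant column was removed
print(selector.get_support())  # boolean mask of the retained features
```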
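For univariate feature selection, the sketch below uses SelectKBest and SelectPercentile with the ANOVA F-test on the Iris dataset; the choice of k=2, the 50th percentile, and the scoring function are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, SelectPercentile, f_classif

X, y = load_iris(return_X_y=True)

# Keep the 2 features with the highest ANOVA F-score.
k_best = SelectKBest(score_func=f_classif, k=2)
X_k = k_best.fit_transform(X, y)
print(X_k.shape)  # (150, 2)

# Alternatively, keep the features scoring above the 50th percentile.
top_half = SelectPercentile(score_func=f_classif, percentile=50)
X_p = top_half.fit_transform(X, y)
print(X_p.shape)  # (150, 2)
```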
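The following is a sketch of recursive feature elimination with RFE; the logistic regression estimator, the synthetic data, and the target of 5 features are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, of which only 5 are informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

# Fit the estimator, rank features by their coefficients, drop the weakest,
# and repeat until only 5 features remain.
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=5)
X_reduced = rfe.fit_transform(X, y)

print(X_reduced.shape)  # (500, 5)
print(rfe.support_)     # boolean mask of the retained features
```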
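For model-based feature selection, here is a sketch using SelectFromModel with a Lasso regressor (one of the scorers mentioned above); the synthetic data and the alpha value are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic regression data with 15 features, 4 of them informative.
X, y = make_regression(n_samples=300, n_features=15, n_informative=4,
                       noise=5.0, random_state=0)

# Lasso drives the coefficients of uninformative features towards zero;
# SelectFromModel keeps the features whose coefficient exceeds its threshold.
selector = SelectFromModel(estimator=Lasso(alpha=1.0))
X_reduced = selector.fit_transform(X, y)

print(X_reduced.shape)         # only the features with non-negligible coefficients remain
print(selector.get_support())  # boolean mask of the retained features
```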
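Finally, a sketch of sequential (forward) feature selection with SequentialFeatureSelector; the k-nearest-neighbours estimator and the target of 2 features are assumptions made for the example.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Forward selection: start from no features and greedily add, at each step,
# the feature that most improves the cross-validated score.
sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=2,
    direction="forward",
)
X_reduced = sfs.fit_transform(X, y)

print(X_reduced.shape)    # (150, 2)
print(sfs.get_support())  # boolean mask of the selected features
```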

You can find an example of feature selection for data minimization in the accompanying notebook, which can be viewed online or downloaded.
