Machine learning relies on data, some of which may be personal or sensitive. As machine learning grows in popularity, it is critical to understand how to protect privacy and control how data is used in these contexts. Article 5(1)(c) of the General Data Protection Regulation (GDPR) states that “Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed”. We refer to this concept as the data minimization principle.
Because data is essential for training good machine learning models, it is natural to ask how we can retain only the sensitive or personal data needed for the task at hand while preserving model performance as much as possible.
This guide will help you identify strategies for data minimization. First, we cover some common attacks on ML models and explain why it is important to protect the data. Second, we suggest some data minimization techniques. Finally, we mention a few alternative techniques for preserving privacy.