Step 1: Understanding adversarial examples

Adversarial examples are specialized inputs that have been perturbed with the purpose of misleading a machine learning model. These perturbations are usually so small that they are largely imperceptible to humans. A notable example is shown in the image below, where adding a small noise vector to an image of a panda leads the network to misclassify the animal as a “gibbon”.

Adversarial examples have been studied extensively, especially for deep learning models and often in image classification tasks. However, it is important to note that they are a widespread phenomenon that can affect a variety of models, including linear models and SVMs, and any type of input data, including text and audio.
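To make the idea concrete, below is a minimal sketch of the Fast Gradient Sign Method (FGSM), the kind of attack behind the panda example: the perturbation is the sign of the loss gradient with respect to the input, scaled by a small budget `epsilon`. This is an illustrative sketch in PyTorch, not a hardened implementation; the model, label, and `epsilon` value are assumptions, and inputs are assumed to be images normalized to the [0, 1] range.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.01):
    """Craft an adversarial example with a single FGSM step.

    The perturbation moves each pixel by +/- epsilon in the direction
    that increases the classification loss, which is usually too small
    to be visible but can flip the model's prediction.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Step along the sign of the input gradient, then keep pixels valid.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```

With a trained classifier, calling `fgsm_perturb(model, image, label)` and re-running `model` on the result is typically enough to observe a changed prediction even though the two images look identical to a human.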
