Step 1: Understanding adversarial examples

Adversarial examples are specialized inputs that have been perturbed with the purpose of misleading a machine learning model. These perturbations are usually so small that they are virtually imperceptible to humans. A notable example is shown in the image below, where adding a small noise vector to an image of a panda leads the network to misclassify the animal as a “gibbon”.

Here, a neural network correctly classifies the image of a panda. When some carefully crafted noise is added to the image, the network erroneously classifies the panda as a gibbon. Note that the perturbed image appears almost unchanged to the human eye. [Image adapted from Goodfellow et al., 2015]
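To make the idea concrete, here is a minimal sketch of the kind of perturbation used in that example, the Fast Gradient Sign Method from Goodfellow et al. (2015). It is an illustrative PyTorch snippet, not the code behind the figure: the names `model`, `x` (an image batch scaled to [0, 1]), `y` (integer labels), and `epsilon` are assumptions for the sake of the example.

```python
import torch

def fgsm_perturb(model, x, y, epsilon=0.01):
    """Craft an adversarial example by taking a small step in the
    direction of the sign of the loss gradient w.r.t. the input."""
    loss_fn = torch.nn.CrossEntropyLoss()
    x = x.clone().detach().requires_grad_(True)

    # Compute the classification loss for the clean input.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Add a small, sign-based perturbation that increases the loss.
    x_adv = x + epsilon * x.grad.sign()

    # Keep pixel values in the valid image range.
    return x_adv.clamp(0.0, 1.0).detach()
```

Even with a very small `epsilon`, the resulting image can look identical to the original while being classified as a completely different label.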

Adversarial examples have been studied extensively, especially for deep learning models and often in image classification tasks. However, it is important to note that the phenomenon is widespread: it affects a variety of models, including linear models and SVMs, and any type of input data, including text and audio.
