One of the great achievements of deep learning is image classification using convolutional neural networks. In the article “The Internet of Health” we find a clear example of this technology: Google’s GoogLeNet project, originally designed to interpret images for self-driving cars, is now used in the field of medical image analysis for the detection of melanoma and skin cancer.
Just by searching the mobile app stores, we found several apps that, based on a photo of a spot or mole on your skin, predict whether it is a malignant melanoma or something completely benign. As we have seen in previous articles, these types of algorithms can be vulnerable to alterations in their behaviour. We selected some of these applications and performed a black-box attack, strategically adding noise to an image of a melanoma to see whether it is possible to invert the classification made by the applications’ internal neural networks. In other words, in this research scenario we had no access to, and no information about, those internal networks.
Given this situation, one possible path was to train our own models in the most intuitive way for this type of problem and generate attacks against them which, thanks to the property called transferability, should also work on all the applications we had selected. But we found an even simpler way: to save ourselves the step of training a neural network dedicated to melanoma detection in images, we simply looked for an open-source project on GitHub that addressed this problem and already provided a trained, ready-to-use neural network.
The transferability property was discovered by researchers who found that adversarial samples specifically designed to cause misclassification in one model can also cause misclassification in other, independently trained models, even when the two models are backed by distinctly different algorithms or architectures.
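The idea behind transferability can be illustrated with a toy sketch. Everything below is hypothetical: two logistic-regression “models” stand in for the real CNNs, each trained independently on its own sample of the same data distribution. Adversarial points crafted using only model A’s weights also degrade model B, which the attacker never saw.

```python
import numpy as np

def make_data(rng, n=200):
    # Two Gaussian blobs: class 0 around (-1,-1), class 1 around (+1,+1).
    y = rng.integers(0, 2, n)
    x = rng.normal(scale=0.4, size=(n, 2)) + np.where(y[:, None] == 1, 1.0, -1.0)
    return x, y

def train(x, y, lr=0.1, steps=500):
    # Plain gradient descent on the logistic loss.
    w, b = np.zeros(2), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
        g = p - y
        w -= lr * (x.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(w, b, x, y):
    return float(np.mean(((x @ w + b) > 0).astype(int) == y))

rng = np.random.default_rng(0)
xa, ya = make_data(rng)   # model A's training sample
xb, yb = make_data(rng)   # model B's independent sample
xt, yt = make_data(rng)   # held-out test points
wa, ba = train(xa, ya)
wb, bb = train(xb, yb)

# Adversarial step crafted ONLY from model A's weights
# (gradient of the logistic loss w.r.t. the input is (p - y) * w).
p = 1.0 / (1.0 + np.exp(-(xt @ wa + ba)))
x_adv = xt + 1.5 * np.sign((p - yt)[:, None] * wa)

# ... yet model B is fooled as well: its accuracy drops on x_adv.
clean_b = accuracy(wb, bb, xt, yt)
adv_b = accuracy(wb, bb, x_adv, yt)
```

Because both models approximate the same decision boundary, a perturbation that crosses model A’s boundary tends to cross model B’s too; this is the intuition exploited in the attack described below.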
To verify the theory, we first used one of the selected apps in the “normal” way from our device or emulator (Android), loading melanoma images randomly selected from Google in order to see the results. Indeed, we could observe that the apps classified those images as melanomas with high confidence, as we can see in the following image:
From there, we proceeded to recreate an adversarial attack. We assumed that all the victim applications used an approach similar to the one proposed in the GitHub repository. Therefore, using the neural network weights provided by the repository, we applied the Fast Gradient Sign Method (FGSM) technique, which we covered in another post, to generate the “white noise” needed to fool the neural networks. This noise, almost imperceptible to the human eye, is computed directly from the weights of the neural network so as to have the greatest possible impact on the classification probabilities assigned to the image and completely change the prediction verdict.
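A minimal sketch of FGSM, assuming a hypothetical stand-in model: a single softmax layer over 16 flattened “pixels” with two classes (benign / melanoma). The real attack used the CNN weights published in the open-source repository; only the gradient-sign step below is the technique itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the repository's trained network.
W = rng.normal(size=(2, 16))
b = np.zeros(2)

def predict(x):
    # Softmax over the two class logits.
    z = W @ x + b
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm(x, label, eps):
    """One FGSM step: move each pixel by eps in the direction of the
    sign of the loss gradient, increasing the cross-entropy loss for
    `label` as much as possible within the per-pixel budget."""
    p = predict(x)
    grad_x = W.T @ (p - np.eye(2)[label])   # dLoss/dx for softmax + CE
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

x = rng.uniform(size=16)               # toy "image" in [0, 1]
label = int(np.argmax(predict(x)))     # the model's clean verdict
x_adv = fgsm(x, label, eps=0.3)        # near-imperceptible noise budget
```

Note that the perturbation is bounded by `eps` per pixel, which is why the noise stays almost invisible while the model’s confidence in the original verdict drops.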
And indeed, the images carefully generated with FGSM from the weights of the open-source neural network had the desired impact on the target victim applications. The transferability property clearly holds: we had no idea of the internal structure or weights of the applications’ networks, yet we were able to change the prediction for images that had previously been classified as melanomas with considerable certainty, simply by adding “noise” to them.
We successfully recreated this type of attack on several apps we found in the Google and Apple stores. In some cases they behaved similarly rather than identically, but at the end of the tests we always obtained the same result: the neural network was tricked in its prediction.
The following image shows the results for the same melanoma image uploaded to the same application, with the noise progressively increased until the application’s internal network changed its prediction.
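The “increase the noise until the verdict flips” procedure amounts to sweeping the FGSM budget upward. A sketch, again on a hypothetical linear softmax stand-in for the app’s real CNN:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical toy "network": a softmax layer over 16 pixels.
W = rng.normal(size=(2, 16))
b = np.zeros(2)

def predict(x):
    z = W @ x + b
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm(x, label, eps):
    # Signed-gradient step of size eps, clipped to valid pixel range.
    grad_x = W.T @ (predict(x) - np.eye(2)[label])
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

x = rng.uniform(size=16)
label = int(np.argmax(predict(x)))

# Sweep the noise budget upward until the prediction flips.
eps_star, x_flip = None, None
for eps in np.linspace(0.05, 1.0, 20):
    x_adv = fgsm(x, label, eps)
    if int(np.argmax(predict(x_adv))) != label:
        eps_star, x_flip = float(eps), x_adv
        break
```

The smallest budget that flips the verdict (`eps_star` here) mirrors what the screenshots show: below it the app keeps its original prediction, above it the verdict inverts.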