The First Official Vulnerabilities in Machine Learning in General

Franco Piergallini Guida    23 December, 2020

Today you are nobody in the market if you do not use a Machine Learning system. Whether it is a tangle of nested “ifs” or a genuinely intelligent model with an enviable ROC curve, the technique has become part of the standard pitch in cyber security and, as such, is already incorporated into the industry as just another method. So we ask ourselves: are these systems being attacked, or do they suffer from vulnerabilities? One way to measure this is to check whether official flaws are already known and what impact they have had.

How do you attack a Machine Learning system? Well, as with almost everything in cyber security, by getting to know it in depth. One of the formulas is to “extract” its intelligence model in order to be able to evade it: if we know how a system classifies, we can send samples that it will classify to our liking while we go unnoticed. For example, in 2019 the first vulnerability associated with a Machine Learning system was registered with NIST under its own CVE. Here, a Machine Learning model for spam classification was “imitated” by collecting the scores that the Email Protection system assigned to the various headers. Once the model had been replicated, the researchers extracted the knowledge needed to generate a subsequent attack that evaded the filters.

Basically, the relevant knowledge that the researchers were able to obtain by replicating the original model was the weight the system assigned to each term appearing in the header of an email when classifying it as “not spam”. From there, they could run tests that added these terms to a spam email to “trick” the original model and achieve the objective: getting a spam email classified as non-spam.
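
As a rough illustration of that probing step (our own sketch, not the researchers’ code), the snippet below simulates the scoring feedback with a hypothetical score_email function over hidden per-term weights; an attacker only needs such an oracle to work out which terms push an email towards “not spam”.

```python
# Minimal sketch of the probing step. score_email and HIDDEN_WEIGHTS are
# hypothetical stand-ins for the real system's scoring feedback channel.

# The "model under attack": per-term weights, negative = looks legitimate.
HIDDEN_WEIGHTS = {"free": 1.2, "winner": 1.5, "invoice": -0.8, "meeting": -1.1}

def score_email(subject: str) -> float:
    """Stand-in for the target's spam score (higher = more spammy)."""
    return sum(HIDDEN_WEIGHTS.get(tok, 0.0) for tok in subject.lower().split())

# Probe the oracle with and without each candidate term and keep the ones
# that lower the score, i.e. the terms the model weights towards "not spam".
baseline = "free winner prize"
ham_terms = [t for t in ["invoice", "meeting", "newsletter", "unsubscribe"]
             if score_email(f"{baseline} {t}") < score_email(baseline)]

evasive_subject = f"{baseline} {' '.join(ham_terms)}"
print("original spam score:", score_email(baseline))
print("padded spam score:  ", score_email(evasive_subject))
```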

In our post “Thinking about attacks to WAFs based on Machine Learning” we used the same technique to trick models implemented in some WAFs that detect malicious URLs and XSS. We probed the models to understand which terms carry the most weight towards classifying a URL as non-malicious and included them in our malicious URL to produce an erroneous prediction.
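
A toy version of that shift, under our own assumptions rather than the actual WAF models: we train a small surrogate classifier over URL character n-grams and then append benign-weighted fragments to a malicious URL, watching its predicted probability of being malicious fall. The tiny training set exists purely for illustration.

```python
# Toy surrogate of a URL classifier; data and model are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

benign = ["/index.html?page=home&sort=price", "/products?id=42&page=home"]
malicious = ["/search?q=<script>alert(1)</script>", "/item?id=1' OR '1'='1"]

vec = CountVectorizer(analyzer="char_wb", ngram_range=(3, 5))
X = vec.fit_transform(benign + malicious)
clf = LogisticRegression().fit(X, [0, 0, 1, 1])

payload = "/search?q=<script>alert(1)</script>"
# Append fragments the surrogate associates with benign traffic.
padded = payload + "&page=home&sort=price&id=42"

for url in (payload, padded):
    prob = clf.predict_proba(vec.transform([url]))[0, 1]
    print(f"malicious probability {prob:.2f} for {url}")
```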

The manufacturer’s response to this vulnerability indicates that part of the solution was its model’s ability to update its scores (weights) in real time; in other words, sacrificing stationarity for adaptability and dynamically retraining the model on new user interactions. Although this is an interesting option in this case, and one that strengthens the system, it is not applicable to every model, as it can open up another attack vector: an adversary could manipulate the model’s decision boundaries by poisoning it with synthetic inputs that are actually “rubbish”. Since the attacker would have to inject a large amount of rubbish to have a significant impact on the learning outcome, services that are more popular and see a large volume of legitimate traffic are harder to poison.
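
A minimal sketch of that poisoning risk, assuming a model that is retrained incrementally on user feedback (simulated here with scikit-learn’s partial_fit); the data, labels and proportions are all invented for illustration.

```python
# Poisoning an incrementally retrained classifier with mislabelled "rubbish".
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
spam = rng.normal(loc=2.0, size=(500, 2))    # toy "spam" feature cluster
ham = rng.normal(loc=-2.0, size=(500, 2))    # toy "ham" feature cluster
X0 = np.vstack([spam, ham])
y0 = np.array([1] * 500 + [0] * 500)

model = SGDClassifier(random_state=0)
model.partial_fit(X0, y0, classes=np.array([0, 1]))
print("accuracy before poisoning:", model.score(X0, y0))

# The attacker streams crafted points that look like spam but are fed back
# labelled as ham, dragging the decision boundary into the spam cluster.
poison = rng.normal(loc=2.0, size=(2000, 2))
model.partial_fit(poison, np.zeros(2000, dtype=int))
print("accuracy after poisoning: ", model.score(X0, y0))
```

The larger the stream of legitimate traffic mixed in between those updates, the more diluted the poisoned samples become, which is exactly why high-volume services are harder to tip over.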

A New World of Opportunities

The previous vulnerability was restricted to one product, but we have also seen generic problems in the algorithms themselves. To draw an analogy, it would be like finding a flaw in one manufacturer’s implementation as opposed to a flaw in the design of the protocol (which would force all manufacturers to update). In this sense, perhaps some of the most widespread Machine Learning implementations are those based on algorithms trained through gradient descent.

Not so long ago, these were discovered to be potentially vulnerable to arbitrary misclassification attacks, as explained in this alert from the CERT Coordination Center at Carnegie Mellon University. In this case, we have already studied and shared a real-world application attacking a Deep Fakes video recognition system, and another related to attacking mobile applications for the detection of melanomas on the skin, which we will publish later on.
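
To make the weakness concrete, here is a minimal sketch in the spirit of the fast gradient sign method (our illustration, not the code from the CERT alert): for a differentiable model, nudging the input in the direction of the sign of the loss gradient is often enough to push it across the decision boundary. The toy logistic “classifier” and all of its parameters are invented.

```python
# FGSM-style perturbation against a toy logistic model (illustrative only).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
w = rng.normal(size=64)                          # hidden weights of the toy model
x = 0.05 * w + rng.normal(scale=0.1, size=64)    # input classified as class 1
y = 1.0

# For a logistic model, the gradient of the cross-entropy loss w.r.t. the
# input is (p - y) * w, so the perturbation is eps * sign((p - y) * w).
p = sigmoid(w @ x)
eps = 0.1
x_adv = x + eps * np.sign((p - y) * w)

print("original score: ", sigmoid(w @ x))
print("perturbed score:", sigmoid(w @ x_adv))
```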

In conclusion, as we incorporate this technology into the world of security, we will see more examples of implementation flaws and, more sporadically, design flaws in the algorithms themselves that will test the abilities of these supposedly intelligent systems. The consequence will generally be that attackers can trick them into misclassifying and therefore, most likely, evade certain security systems.
