Thinking About Attacks on WAFs Based on Machine Learning

Franco Piergallini Guida    15 October, 2020

One of the fundamental pieces for the correct implementation of machine and deep learning is data. These algorithms need to consume, in some cases, very large amounts of data in order to find a combination of internal “parameters” that allows them to generalise, or learn, and thus make predictions on new inputs. If you are familiar with computer security, you have probably noticed that data is something we have in abundance: security is all about data, and we find it represented in many forms: files, logs, network packets, etc.

Typically, this data is analysed manually, for example using file hashes, custom rules such as signatures, and hand-defined heuristics. These techniques require too much manual work to keep up with a cyber threat landscape that grows dramatically day after day. In 2016 there were around 597 million unique malware executables known to the security community according to AVTEST, and in 2020 we are already well over a billion.

With this volume of data, manually analysing every attack is humanly impossible. For this reason, machine and deep learning algorithms are widely used in security: anti-virus engines that detect malware, firewalls that flag suspicious activity on the network, SIEMs that identify suspicious trends in data, among others.

Just as a cybercriminal could exploit a vulnerability in a firewall to gain access to a web server, machine learning algorithms are also susceptible to attack, as we saw in the two previous instalments: Adversarial Attacks: the Enemy of Artificial Intelligence (I) and Adversarial Attacks: the Enemy of Artificial Intelligence (II). Therefore, before putting such solutions on the front line, it is crucial to consider their weaknesses and understand how malleable they are under pressure.

Examples of Attacks on WAF

Let’s have a look at a couple of example attacks on two WAFs, each of which fulfils a simple objective: detecting XSS in one case and malicious sites in the other, by analysing the text of a given URL. From large data sets in which XSS payloads and malicious sites were correctly labelled, a logistic regression algorithm was trained to predict whether an input is malicious or not.

The data sets used to train these two logistic regression algorithms, one for XSS and one for malicious sites, are basically collections of URLs classified as “good” and “bad”:

Picture 2: Malicious URLs
Picture 3: XSS

The malicious sites data set contains about 420,000 URLs, good and bad combined, while the XSS data set contains around 1,310,000.

Since this is a white-box attack, we have access to all the data processing and manipulation used to train the algorithms. We can therefore see that the first step in both scenarios is to apply a technique called TF-IDF (term frequency – inverse document frequency), which assigns an importance to each term based on its frequency of appearance across the URLs in our data sets.
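
As a rough illustration, the training pipeline could look like the minimal sketch below. It assumes scikit-learn’s TfidfVectorizer and LogisticRegression and a hypothetical urls.csv file with “url” and “label” columns; the actual projects may differ in the details of loading and tokenisation.

# Minimal sketch of the training pipeline described above (assumptions:
# scikit-learn, pandas, and a hypothetical urls.csv with "url"/"label" columns).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

data = pd.read_csv("urls.csv")                          # hypothetical file name
urls = data["url"].values
labels = (data["label"] == "bad").astype(int).values    # 1 = malicious, 0 = good

# TF-IDF turns each URL into a sparse vector in which every term is weighted
# by its frequency in the URL and its rarity across the whole data set.
vectorizer = TfidfVectorizer()       # default tokenisation; the real projects may use a custom tokenizer
X = vectorizer.fit_transform(urls)

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))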

From our TF-IDF object we can obtain the vocabulary generated in both cases and, once the algorithm is trained, easily access which of these terms were given the most weight. From those same terms, we can just as easily manipulate the output of the algorithm. Let’s have a look at the case of malicious site classification.

Malicious Site Classification

According to the algorithm, if any of these terms appears in a URL, there is a high probability that the site is not malicious:

Picture 4: weight of terms to be considered NOT malicious
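
For reference, this is roughly how such weights can be read out of a trained model. It is a sketch that reuses the hypothetical vectorizer and clf objects from the previous snippet, not the exact code of the projects.

import numpy as np

# Map each vocabulary term to its learned coefficient. With label 1 = malicious,
# strongly negative coefficients push the prediction towards "good" and strongly
# positive ones towards "bad".
terms = np.array(vectorizer.get_feature_names_out())   # get_feature_names() on older scikit-learn
weights = clf.coef_[0]

most_benign = np.argsort(weights)[:20]       # most negative weights: "trustworthy" terms
most_malicious = np.argsort(weights)[-20:]   # most positive weights: suspicious terms

for idx in most_benign:
    print(f"{terms[idx]:30s} {weights[idx]:+.4f}")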

This means that, simply by adding some of these terms to my malicious URL, I can steer the algorithm’s output as much as I like. I have a malicious URL, which is indeed a malicious site, and which the algorithm detects with a fair degree of certainty:

Picture 5: malicious URL

It classifies the URL as malicious with 90% confidence. But if we add the term ‘photobucket’ to the URL, the algorithm now classifies it as “good”:

Picture 6: Malicious URL with a trustworthy term

We could even push that probability further by simply adding another term to the URL, for example “2011”:

Picture 7: Malicious URL with two trustworthy terms
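
The manipulation itself boils down to a few lines. The sketch below reuses the hypothetical vectorizer and clf objects from the earlier snippets, and the URL is a generic placeholder rather than the exact one shown in the pictures.

# Sketch: appending benign-weighted terms to flip the classifier's output.
# "evil_url" is a placeholder for the malicious URL shown in the pictures.
evil_url = "http://malicious.example/login.php?cmd=download"

def malicious_probability(url):
    # Probability of the "malicious" class (label 1).
    return clf.predict_proba(vectorizer.transform([url]))[0, 1]

print(malicious_probability(evil_url))                        # e.g. ~0.90: classified as malicious
print(malicious_probability(evil_url + "/photobucket"))       # one benign term lowers the score below 0.5
print(malicious_probability(evil_url + "/photobucket/2011"))  # a second term pushes it even further down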

Let’s move on to the XSS scenario. We have a payload which the algorithm correctly classifies as XSS with 99% confidence (in this example, label 1 corresponds to XSS and 0 to non-XSS):

Picture 8: Payload of detectable XSS

Let’s take a look at the terms with the least weight to reverse that prediction:

Picture 9: weight of the terms to lower the prediction of XSS attack

As before, we add some of these terms to manipulate the output of the algorithm. After some testing we find a payload that inverts the prediction: we had to add the term “t/s” about 700 times to achieve the objective:

Picture 10: payload capable of reversing the XSS prediction

And, indeed, our algorithm predicts it as non-XSS:

Picture 11: No detection of XSS by the payload used
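
The same trick, as a sketch, for the XSS scenario. Here xss_vectorizer and xss_clf are placeholders for a TF-IDF object and logistic regression trained on the XSS data set (built just like the malicious-site model above), and the payload is a generic example rather than the exact one in the pictures.

# Sketch: padding an XSS payload with a low-weight term until the classifier
# flips from 1 (XSS) to 0 (non-XSS). xss_vectorizer and xss_clf are assumed to
# be built like the malicious-site model, but trained on the XSS data set.
payload = "<script>alert(document.cookie)</script>"

def xss_probability(p):
    # Probability of the "XSS" class (label 1).
    return xss_clf.predict_proba(xss_vectorizer.transform([p]))[0, 1]

print(xss_probability(payload))             # close to 0.99: detected as XSS

padded = payload + " t/s" * 700             # roughly 700 repetitions did the job in our tests
print(xss_probability(padded))              # the prediction flips to non-XSS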

In case anyone is interested in the subject, we leave some links to the WAF for malicious sites and the WAF for XSS projects. Some references were taken from the book Malware Data Science.

Having access to the data pre-processing steps and the models makes it much easier to build these kinds of attacks. Without that access, an attacker would need considerably more effort to work out the right data pre-processing and the architecture or algorithm of the predictive model. However, it is still possible to recreate these attacks through other techniques such as transferability, where adversarial samples specifically designed to cause a misclassification in one model also cause misclassifications in other, independently trained models, even when the two models are based on clearly different algorithms or infrastructures.
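
To give a rough idea of what transferability looks like in practice, the sketch below trains an independent surrogate model on data the attacker collects, crafts the evasion against that surrogate and then submits the result to the target model. Every name here is a hypothetical placeholder, not code from the projects.

# Sketch of the transferability idea: craft the evasion against a surrogate model
# we train and fully control, then try it against the target WAF (black box).
# "urls" and "labels" stand for a data set collected by the attacker (hypothetical).
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

surrogate_vec = TfidfVectorizer()
X_surrogate = surrogate_vec.fit_transform(urls)
surrogate = LinearSVC().fit(X_surrogate, labels)    # deliberately a different algorithm than the target

# Pick the term the surrogate weights most strongly towards "good"...
terms = surrogate_vec.get_feature_names_out()
benign_term = terms[np.argmin(surrogate.coef_[0])]

# ...append it to the malicious URL and submit the result to the target model,
# hoping the misclassification transfers.
adversarial_url = "http://malicious.example/login.php?cmd=download/" + benign_term
print(adversarial_url)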
