Data is one of the fundamental building blocks of any sound machine learning or deep learning implementation. These algorithms often need to consume large amounts of data in order to find a combination of internal “parameters” that lets them generalise, that is, to learn, and thereby make predictions on new inputs. If you are familiar with computer security, you have probably noticed that data is something it has in abundance: security revolves around data, and we find it represented in many forms: files, logs, network packets, etc.
Typically, this data is analysed manually, for example using file hashes, custom rules such as signatures, and hand-crafted heuristics. Such techniques require too much manual work to keep pace with the constantly changing cyber threat landscape, which grows dramatically day after day. According to AV-TEST, in 2016 there were around 597 million unique malware executables known to the security community, and in 2020 we are already past one billion.
Given this volume of data, manually analysing every attack is humanly impossible. For this reason, machine and deep learning algorithms are widely used in security, for example: anti-virus engines that detect malware, firewalls that flag suspicious network activity, and SIEMs that identify suspicious trends in data, among others.
Just as a cybercriminal can exploit a vulnerability in a firewall to gain access to a web server, machine learning algorithms are also susceptible to attack, as we saw in the two previous instalments: Adversarial Attacks: the Enemy of Artificial Intelligence (I) and Adversarial Attacks: the Enemy of Artificial Intelligence (II). Therefore, before putting such solutions on the front line, it is crucial to consider their weaknesses and understand how malleable they are under pressure.
Examples of Attacks on WAFs
Let’s have a look at a couple of examples of attacks on two WAFs, each of which fulfils a simple objective: to detect XSS and malicious sites, respectively, by analysing the text of a given URL. From large data sets in which XSS payloads and malicious sites were correctly labelled, a logistic regression algorithm was trained to predict whether a URL is malicious or not.
The data sets for XSS and for malicious sites used to train these two logistic regression algorithms are basically collections of URLs classified as “good” and “bad”:
The malicious-site data set contains about 420,000 URLs between good and bad; the XSS data set, about 1,310,000.
As this is a white-box attack, we have access to all the data processing and manipulation used to train the algorithms. We can therefore see that the first step in both scenarios is to apply a technique called TF-IDF (Term Frequency – Inverse Document Frequency), which assigns an importance to each term based on its frequency of appearance across the URLs in our data sets.
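As a rough sketch of this pipeline (using scikit-learn and a tiny, made-up URL set in place of the real corpora, so all URLs and names here are illustrative assumptions), the training step might look like:

```python
# Illustrative sketch only: a toy stand-in for the real labelled URL data sets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

urls = [
    "example.com/photos/2011/album",    # good
    "news.site.org/articles/today",     # good
    "free-prizes.win/claim.exe",        # bad
    "bad.domain.ru/login.php?steal=1",  # bad
]
labels = [0, 0, 1, 1]  # 0 = good, 1 = bad

# TF-IDF over simple alphanumeric tokens extracted from each URL
vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")
X = vectorizer.fit_transform(urls)

# Logistic regression trained on the TF-IDF features
model = LogisticRegression()
model.fit(X, labels)
```

With real data the vectorizer and model would of course be fitted on hundreds of thousands of labelled URLs rather than four.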
From our TF-IDF object we can obtain the vocabulary generated in both cases and, once the algorithm is trained, easily inspect which of these terms it gave the most weight. With those same terms, in turn, we can easily manipulate the output of the algorithm. Let’s have a look at the case of malicious site rating.
Malicious Site Rating
According to the algorithm, if any of these terms appears in a URL there is a high probability that it is a non-malicious site:
This means that, simply by adding some of these terms to my malicious URL, I can sway the algorithm as far as I like. I have a malicious URL that the algorithm detects with considerable certainty, and which is indeed a malicious site:
It classifies the URL as malicious with 90% confidence. But if we add the term ‘photobucket’ to the URL, the algorithm now classifies it as “good”:
We could even push that probability further by simply adding another term to the URL, for example “2011”:
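Sticking with the toy model rather than the article’s real one, the padding trick above can be reproduced like so: appending benign-looking terms dilutes the malicious tokens’ share of the TF-IDF vector and drags the predicted probability down.

```python
# Illustrative sketch: appending "good" terms lowers the malicious score.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

urls = [
    "photobucket.com/albums/2011",   # good
    "images.photobucket.com/user",   # good
    "blog.example.org/2011/post",    # good
    "malware.bad.ru/drop.exe",       # bad
    "steal.win/login.php",           # bad
    "evil.site/x.exe",               # bad
]
labels = [0, 0, 0, 1, 1, 1]  # 0 = good, 1 = bad

vec = TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")
clf = LogisticRegression().fit(vec.fit_transform(urls), labels)

malicious = "steal.win/login.php"
evasive = malicious + "/photobucket/2011"  # pad with benign terms

p_orig = clf.predict_proba(vec.transform([malicious]))[0, 1]
p_evade = clf.predict_proba(vec.transform([evasive]))[0, 1]
print(f"P(bad) original: {p_orig:.2f}  padded: {p_evade:.2f}")
```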
Let’s move on to the XSS scenario. We have a payload which the algorithm correctly classifies as XSS with 99% confidence (in this example, label 1 corresponds to XSS and label 0 to non-XSS):
Let’s take a look at the terms with the least weight to reverse that prediction:
As before, we add some of these terms to manipulate the output of the algorithm. After some testing we find the payload that inverts the prediction: we had to add the term “t/s” about 700 times to achieve the objective:
And, indeed, our algorithm predicts it as NO XSS:
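The repetition trick can be sketched under the same toy assumptions (made-up samples, not the article’s real data set or payload): because TF-IDF uses raw term counts by default, repeating a benign token hundreds of times lets it dominate the normalised feature vector and suppress the XSS tokens.

```python
# Illustrative sketch: repeating a low-weight token drowns out the XSS tokens.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

samples = [
    "<script>alert(1)</script>",     # XSS
    "<img src=x onerror=alert(1)>",  # XSS
    "/index.html?page=about",        # benign
    "/static/t/s/logo.png",          # benign
]
labels = [1, 1, 0, 0]  # 1 = XSS, 0 = non-XSS

vec = TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")
clf = LogisticRegression().fit(vec.fit_transform(samples), labels)

payload = "<script>alert(1)</script>"
padded = payload + "/t/s" * 700  # repeat a benign token pair many times

p_before = clf.predict_proba(vec.transform([payload]))[0, 1]
p_after = clf.predict_proba(vec.transform([padded]))[0, 1]
print(f"P(XSS) before: {p_before:.2f}  after padding: {p_after:.2f}")
```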
Having access to the pre-processing steps, data and models makes it much easier to craft these kinds of attacks. Without that access, an attacker would need considerably more effort to discover the right data pre-processing and the architecture or algorithm of the predictive model. However, such attacks are still possible through other techniques such as transferability: adversarial samples specifically designed to cause a misclassification in one model can also cause misclassifications in other, independently trained models, even when the two models are built on clearly different algorithms or architectures.