If we had to choose one particularly damaging class of vulnerability, it would most likely be arbitrary code execution, all the more so when it can be exploited remotely. The consequences can be severe, as we have seen many times (Conficker for malware analysts, MS08-067 and EternalBlue for pentesters, WannaCry for everyone, etc.).
Arbitrary code execution has been, and remains, one of the costliest programming errors in the history of silicon, both in the losses it causes and in the repairs it demands. Incidentally, it is called “arbitrary” because the CPU is, of course, already executing code. The point is that the attacker, having taken control of the process, decides which code is executed. That is what this type of exploitation is all about: diverting the normal, intended execution of a process to foreign code that an attacker introduces through an exploit.
How Exactly Does This Happen?
There are many ways to execute code (from here on, “arbitrary” is implied). Note that the definition is not limited to native executables: cross-site scripting is also an injection of foreign code that diverts the execution of a script to the injected snippet.
At the native level, one major source of code execution is memory management errors. We will review the most common types of errors, focusing on how they occur and on how operating systems and programming languages are evolving to mitigate their effects when they are maliciously exploited.
Looking back, not every language has required manual memory management. In fact, John McCarthy, one of the fathers of Artificial Intelligence and creator of LISP, introduced the concept of automatic garbage collection (memory that is no longer in use is reclaimed while the process runs) around 1960.
However, even though garbage collectors made life easier for programmers by relieving them of manual management, they imposed a resource overhead that some systems could not afford. To get an idea, it would be as if the real-time flight tracking in an airport control tower stopped for a few seconds to reclaim unused memory.
That is why languages like C and C++ still carry great weight in systems programming. They have no built-in garbage collector (although one can be added through libraries), so the programmer is fully responsible for managing memory. And of course, we all know what happens when you leave a machine's job in the hands of a human. On the other hand, doing without a collector frees the resources it would consume, which can mean a considerable gain in performance and responsiveness, and that translates into lower hardware costs.
Is It So Difficult to Manage the Memory Manually?
This is, of course, a very open question. The answer depends on how familiar we are with this type of programming and on the facilities the language provides, together with the external tools we use and the technology built into the compiler.
Let’s see an example: imagine that we want to assign a text string to a variable. This is a trivial operation in languages with automatic memory management, for example in Python (the following is only illustrative code; we are not concerned with its correctness):
mi_cadena = input()
# we process mi_cadena
In C, the same operation raises some interesting questions. First, we do not know the length of the string: it is not stored “by default” with the string, so it must be computed or passed to the function as a parameter. Second, since we do not know its length, we do not know how much memory we will need to store it. Third, who is in charge of telling us when that memory is no longer needed?
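To make the first two points concrete, here is a minimal sketch. The helper names `count_until_nul` and `bytes_needed` are hypothetical and chosen only for illustration. The first one does by hand roughly what the standard `strlen` does: it walks the bytes until it finds the terminating '\0'.

```c
#include <stddef.h>

/* Hypothetical helper: a C string carries no length field, so the only way
   to learn its length is to walk it until the terminating '\0'. */
size_t count_until_nul(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Hypothetical helper: bytes required to store a copy of s, that is, its
   characters plus one extra byte for the terminator. */
size_t bytes_needed(const char *s)
{
    return count_until_nul(s) + 1;
}
```

Note that even an empty string needs one byte, and that forgetting the extra byte for the terminator is a classic source of off-by-one errors.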
Let’s look at a code snippet (there are several safer and better ways to implement this, for example using strdup, “%ms”, etc., but this one allows us to illustrate what we mean):
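What follows is a minimal sketch of the kind of code being described, not the original listing, so line numbers mentioned in the text do not apply to it. The names `read_line` and `process_one_line` and the 256-byte limit are assumptions made for illustration. It reads a line into a stack array, finds where the string ends, reserves exactly the memory needed on the heap, and releases it once it is no longer used.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256   /* assumed limit; longer lines are cut at this size */

/* Hypothetical helper: reads one line from `in` and returns a heap copy.
   The caller owns the returned pointer and must free() it.
   Returns NULL on end of input, read error, or allocation failure. */
char *read_line(FILE *in)
{
    char buffer[MAX_LINE];                 /* temporary array on the stack */

    /* fgets writes at most sizeof buffer bytes, terminator included. */
    if (fgets(buffer, sizeof buffer, in) == NULL)
        return NULL;

    /* The length is not stored anywhere: locate the end of the line
       (the newline if one was read, otherwise the terminating '\0'). */
    size_t len = strcspn(buffer, "\n");
    buffer[len] = '\0';

    char *copy = malloc(len + 1);          /* + 1 for the terminating '\0' */
    if (copy == NULL)
        return NULL;

    memcpy(copy, buffer, len + 1);
    return copy;
}

/* Hypothetical caller: reads a line, "processes" it (here it only measures
   it) and releases it. Returns the length, or 0 if nothing could be read. */
size_t process_one_line(FILE *in)
{
    char *my_string = read_line(in);
    if (my_string == NULL)
        return 0;

    size_t length = strlen(my_string);     /* we process my_string */

    free(my_string);                       /* no longer used: give it back */
    return length;
}
```

A program would typically call `process_one_line(stdin)`. The design choice worth noticing is that the memory is reserved in one function and released in another, which is exactly where responsibility starts to blur.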
As we can see, we have not even begun to manipulate the string and we have already had to write code to detect the end of the string, reserve memory, respect the bounds of the array on the stack, and so on.
The important part, however, is line 28: the call to “free”, which tells the system to release the piece of memory we had reserved in the “read” function. Here the situation is clear: we no longer use that memory, so we return it.
In a short example it is easy to keep track of the memory, but what if we are still using that reserved memory 200 lines of code later? What if we have to pass that pointer through several functions? How do we make it clear who is responsible for the memory: the function being called or the one calling it?
In subsequent blog entries, we will review some scenarios in which this type of oversight turns into a vulnerability: double free, use of uninitialized memory, memory leaks and dangling pointers.
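C itself offers no way to express who owns a pointer, so a common approach is to state it in each function's contract. The following is a minimal sketch of that convention. The function names (`make_greeting`, `count_spaces`, `discard`) are hypothetical, and the comments are the only thing enforcing the rules.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated string: the CALLER owns it and must free() it
   (or hand it to a function that takes ownership).
   Returns NULL if the allocation fails. */
char *make_greeting(const char *name)
{
    const char *prefix = "Hello, ";
    size_t size = strlen(prefix) + strlen(name) + 1;   /* + 1 for '\0' */

    char *greeting = malloc(size);
    if (greeting == NULL)
        return NULL;

    snprintf(greeting, size, "%s%s", prefix, name);
    return greeting;
}

/* Only BORROWS `text`: it reads it, and must not free it or keep the
   pointer after returning. */
size_t count_spaces(const char *text)
{
    size_t n = 0;
    for (; *text != '\0'; text++)
        if (*text == ' ')
            n++;
    return n;
}

/* TAKES OWNERSHIP of `text`: after this call the caller must not use
   the pointer again. */
void discard(char *text)
{
    free(text);
}
```

Nothing stops a caller from using the pointer after `discard`, or from never releasing the result of `make_greeting`. The compiler accepts both, which is why these oversights are so easy to commit.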