A Simple Explanation About SAD DNS and Why It Is a Disaster (or a Blessing)

Sergio De Los Santos    23 November, 2020

In 2008, Kaminsky shook the foundations of the Internet: a design flaw in DNS made it possible to forge responses and send a victim wherever the attacker wanted. Twelve years later, a similar and very interesting formula has been found to poison the cache of DNS servers, and it is even worse than Kaminsky's, fundamentally because the attacker does not need to be on the same network as the victim, and because it was announced while many servers, operating systems and programs were still unpatched. Let's see how it works in an understandable way.

In order to fake a DNS response and return a lie to the client, the attacker must know the TxID (transaction ID) and the UDP source port of the query. That implies 32 bits of entropy (two 16-bit fields to guess). SAD DNS consists (basically, because the paper is quite complex) in inferring the UDP port through an ingenious method that abuses ICMP error messages. Once the port is inferred, only the 16-bit TxID remains, an entropy low enough to brute-force. With those two pieces of data, the attacker builds the packet and bombards the name server.
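To get an idea of the difference, here is a back-of-the-envelope calculation in Python (the packet rate is an assumption chosen purely for illustration):

```python
# Why leaking the UDP port matters: search-space arithmetic.
txid_space = 2 ** 16             # 16-bit transaction ID
port_space = 2 ** 16             # 16-bit UDP source port

blind = txid_space * port_space  # ~4.3 billion combinations to spray blindly
port_known = txid_space          # 65,536 combinations once the port leaks

rate = 50_000                    # assumed spoofed responses/second (illustrative)
print(f"guessing both fields: ~{blind / rate / 3600:.0f} hours of flooding")
print(f"port already inferred: ~{port_known / rate:.1f} seconds of flooding")
```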

How to Infer the Open UDP Port

The necessary background is that, due to how UDP works, a resolver opens ephemeral UDP ports through which it communicates with other name servers. Knowing these ephemeral ports is vital because, together with the TxID, they are everything an attacker needs to fake a response. In other words, if a server (resolver or forwarder) sends a query to another server, it expects a specific TxID and UDP port in the response, and whoever returns a packet with that data will be taken as the absolute truth: the victim can be fed a false IP-domain resolution. The attacker only needs to learn the open UDP port, brute-force the TxID and bombard the victim.
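As a rough illustration of why those two fields are all that protects the exchange, here is a minimal sketch of the check a resolver performs before accepting a UDP reply. It is not any real resolver's code; the names are invented for the example:

```python
# Sketch: the only "authentication" on a UDP DNS reply is matching the
# TxID, the destination port and (spoofable) source IP of the query.
from dataclasses import dataclass

@dataclass(frozen=True)
class PendingQuery:
    txid: int          # 16-bit transaction ID chosen at random
    local_port: int    # 16-bit ephemeral UDP source port of the query
    server_ip: str     # upstream name server we asked

def accept_reply(pending: PendingQuery, txid: int, dst_port: int, src_ip: str) -> bool:
    # An off-path attacker can spoof src_ip, so the real entropy
    # is just txid + dst_port: 32 bits, or 16 once the port leaks.
    return (txid == pending.txid
            and dst_port == pending.local_port
            and src_ip == pending.server_ip)
```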

When a UDP packet arrives at a closed port, the host replies with an ICMP "port unreachable" error. To avoid overloading systems with these replies, Linux imposes a global limit of 1,000 of them per second. A global limit means it does not matter whether 10 or 100 different hosts are probing at once: among all of them, there are only 1,000 "port closed" answers available per second. This mechanism, introduced precisely to avoid overloading the system, is what actually causes the whole problem.
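A toy model of such a shared counter helps to see the side channel. The sketch below imitates a global token bucket, loosely inspired by the Linux sysctls icmp_msgs_per_sec and icmp_msgs_burst; it is an illustration, not the kernel code:

```python
import time

class GlobalIcmpLimiter:
    """Toy model of a GLOBAL ICMP rate limiter: one token bucket shared by
    all destinations, refilled at `rate` tokens per second."""
    def __init__(self, rate: int = 1000, burst: int = 50):
        self.rate, self.burst = rate, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow_icmp(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1   # one "port unreachable" reply consumed
            return True
        return False           # budget exhausted: reply silently dropped
```

Because the bucket is shared, anyone who can later observe whether one more ICMP reply comes back learns something about how many replies everyone else consumed in that second.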

The global limit is 1,000 on Linux, 200 on Windows and FreeBSD and 250 on macOS. In reality, the whole paper is built on exploiting this fixed global limit. It is worth highlighting that the dangers of shared counters like this had been warned about before, but never with such a practical attack and application. It is also important because not only DNS is affected: QUIC and HTTP/3, which run over UDP, can be vulnerable too. The attack is complex and each step has its own subtleties, but fundamentally the basic steps are (with potential inaccuracies for the sake of simplicity) the following:

  • Send 1,000 UDP probes to the victim resolver with spoofed source IPs, testing 1,000 ports. In practice this is sent as bursts of 50 every 20 ms to stay under another per-IP response limit that Linux applies.
  • If all 1,000 ports are closed, the victim returns (to the spoofed IPs) 1,000 ICMP errors indicating that the port is not open. For each open port, nothing is sent back: the packet is simply consumed by the application listening on that port. It does not matter that the attacker never sees these ICMP responses (they go to the spoofed IPs); what matters is how much of the global budget of 1,000 responses per second that batch "uses up".
  • Before that second elapses, the attacker probes, this time from his real IP so he can see the answer, a port he knows is closed. If the server returns an ICMP "port closed" error, then the 1,000 ICMP "closed port" replies had not been used up, and therefore at least one port in the probed range was open! Bingo. Because the ICMP limit is global, a single "port closed" response proves the per-second budget was not exhausted by the batch.

Thus, in batches of 1,000 probes per second, and by checking whether or not the budget of "port closed" error packets gets used up, the attacker can deduce which ports are open. In a short time, he will have mapped the server's open ports. Naturally, the attacker combines this with binary searches to optimize, splitting the "potentially open" ranges in half on each batch so as to converge faster on the exact port.
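The following simulation sketches that search strategy. The side channel is abstracted into an oracle (batch_reveals_open_port) that answers the only question a real batch answers: "did this range contain an open port?". Real batches are capped at roughly 1,000 ports per second; the simulation glosses over that pacing:

```python
import random

# Simulation of the SAD DNS search strategy, not attack code: we model one
# probe batch (spoofed probes to a port range, then a verification probe
# from the attacker's real IP) as a yes/no oracle on that range.
SECRET_OPEN_PORT = random.randrange(32768, 61000)  # victim's ephemeral port

def batch_reveals_open_port(lo: int, hi: int) -> bool:
    """True if probing ports [lo, hi) would NOT exhaust the global ICMP
    budget, i.e. at least one probed port was open and stayed silent."""
    return lo <= SECRET_OPEN_PORT < hi

lo, hi = 32768, 61000
while hi - lo > 1:                        # classic binary search on the range
    mid = (lo + hi) // 2
    if batch_reveals_open_port(lo, mid):  # open port in the lower half?
        hi = mid
    else:
        lo = mid
print(f"inferred open port: {lo} (actual: {SECRET_OPEN_PORT})")
```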

The researchers also had to eliminate the "noise" from other open ports, or from scans being made against the system while the attack is in progress, and in the paper they explain some techniques to achieve this.

More Failures and Problems

It all comes from a perfect storm of failures: in the UDP implementation, in the implementation of the 1,000-response limit… The explanation above is simplified, because the researchers found other implementation quirks that sometimes even worked in their favour, and others that enable slight variations of the attack.

Because the failure is not only in the implementation of the global ICMP limit; the UDP implementation does not come out of this unscathed either. According to the RFC, a single UDP socket can receive datagrams from different source IPs, and verifying who is allowed to send what is left to the application handling the traffic. This, which is meant for servers (listening sockets), in practice also applies to clients: according to the experiments in the paper, it applies to the UDP client sockets that a resolver opens for its queries, which makes the attack much easier by allowing those "open" query ports to be scanned from any source IP address.
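This behaviour is easy to verify locally. The snippet below shows that an unconnected UDP socket happily accepts a datagram from a peer it never talked to, which is exactly what lets an off-path attacker "touch" a resolver's query ports:

```python
import socket

# Demo: an unconnected UDP socket accepts datagrams from ANY source,
# so anyone who can reach the port can probe it.
resolver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
resolver.bind(("127.0.0.1", 0))               # ephemeral "query" port
resolver.settimeout(1.0)
port = resolver.getsockname()[1]

stranger = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
stranger.sendto(b"probe", ("127.0.0.1", port))

data, addr = resolver.recvfrom(512)
print(f"unconnected socket accepted {data!r} from {addr}")  # delivered!

# By contrast, after connect() the kernel filters foreign sources:
# datagrams from other peers never reach this socket.
resolver.connect(("127.0.0.1", 53))
```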

And something very important: what happens if the application marks a query UDP port as "private", so that only the other end of the exchange can reach it (and everyone else cannot tell whether it is open or closed)? This would break the first step of the attack, in which spoofed source IPs are what make the scan fast. Opening "public" or "private" ports depends on the DNS software, and only BIND does this well; dnsmasq and Unbound do not. With private ports, the attacker can no longer spoof arbitrary IPs for the probe bursts (the ones used to drain the global limit, whose responses he never sees anyway) and is restricted to a single source IP, which would make the scan slower. But no problem: there is also a flaw in Linux that undoes this mitigation. The global-limit check is performed before the per-IP limit check, originally because checking the global counter is cheaper than checking the counter per IP. As a result, probes that the per-IP limit ends up suppressing still drain the global counter, the scan does not take much longer, and the technique remains valid even with private ports.
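A simplified model of that ordering flaw, reconstructed from the article's description (it is not the actual kernel source), would look like this:

```python
# Reconstruction of the check-order flaw: the global budget is checked and
# consumed BEFORE the per-IP budget, so probes that the per-IP limit ends
# up suppressing still drain the shared counter, and the shared counter is
# precisely what the attacker measures.
def should_send_icmp(global_budget: list, per_ip_budget: dict, src_ip: str) -> bool:
    if global_budget[0] <= 0:        # 1) cheap global check first...
        return False
    global_budget[0] -= 1            # ...consumed even if we bail out below
    if per_ip_budget.get(src_ip, 0) <= 0:
        return False                 # 2) per-IP limit applied too late
    per_ip_budget[src_ip] -= 1
    return True

g = [1000]                           # shared per-second budget
ip = {"192.0.2.1": 1}                # one source IP with a tight per-IP budget
for _ in range(10):
    should_send_icmp(g, ip, "192.0.2.1")
print(g[0])                          # 990: all ten probes drained the global counter
```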

The paper continues with recommendations for forwarders, resolvers… a thorough review of DNS security.

Solutions?

Linux already has a patch ready, but there is still a lot of ground to cover: from DNSSEC, which is always recommended but never quite takes off, to disabling ICMP responses, which can be complex. The kernel patch makes the limit no longer a fixed value of 1,000 responses per second, but a random one between 500 and 2,000. The attacker can therefore no longer calculate reliably whether the limit was used up within one second, and so cannot deduce the open UDP ports.
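The idea behind the fix can be sketched in a few lines: re-draw the per-second budget at random, so that a single verification probe no longer tells the attacker anything reliable about whether a batch exhausted it. This is a toy model of the concept, not the actual patch:

```python
import random
import time

class RandomizedLimiter:
    """Toy model of the mitigation: a per-second ICMP budget drawn at
    random from [500, 2000] instead of a fixed, predictable 1000."""
    def __init__(self):
        self._reset()

    def _reset(self):
        self.budget = random.randint(500, 2000)  # noisy per-second budget
        self.window_start = time.monotonic()

    def allow_icmp(self) -> bool:
        if time.monotonic() - self.window_start >= 1.0:
            self._reset()                        # new second, new random budget
        if self.budget > 0:
            self.budget -= 1
            return True
        return False
```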

It seems that the ultimate origin of the problem is implementation, not design. The RFC that describes this response rate limiting leaves the exact number open; choosing it as a fixed 1,000, as was done in the kernel in 2014, is part of the problem.

By the way, with this BlueCatLabs script scheduled to run every minute, you can mitigate the problem on a DNS server by doing by hand what the SAD DNS kernel patch will do.

So, let's wait for patches for everyone: the operating systems and the main DNS servers. Many public servers are already patched, but many more are not. This attack is particularly interesting because it is very clean for the attacker: he does not need to be in the victim's network and can do everything from the outside, confusing the servers. A disaster. Or a blessing, since thanks to it quite a few loose ends in the UDP and DNS implementations will be fixed.
