SWITCH DNS recursive name service improvements with dnsdist


SWITCH operates recursive name servers for any user within the Swiss NREN. While larger universities typically run their own recursive name servers, many smaller organisations rely on our resolvers for domain name resolution. During the consolidation of our name server nodes into two data centres, we looked for opportunities to improve our setup. dnsdist, a DNS-, DoS- and abuse-aware load balancer from the makers of PowerDNS, plays a big part in our new setup. Although the first stable release of dnsdist (version 1.0.0, released on 21 April 2016) is only a few days old, it feels like everyone is already using it. We are happy users as well and want to share some of the features we especially like about dnsdist.

Our old setup consisted of several name server nodes which all shared the same IP address provided by anycast routing. Our recursive name server of choice was and still is BIND, and we have been providing DNSSEC validation and malicious domain lookup protection through our DNSfirewall service for some time. While this setup worked very well, a few badly behaved or excessive clients could degrade the performance of a single name server node and so affect all users routed to that node. Another drawback was that each name server node received only its share of the total traffic. While this may sound good, it meant that we had several smaller caches, one on each node. My favourite quote from Bert Hubert, founder of PowerDNS, is: “A busy name server is a happy name server”. What it means is that it is actually faster to route all your queries to a single name server node, because this improves the cache-hit rate.
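To make the cache argument concrete, here is a minimal Python sketch that compares one shared cache with four separate per-node caches. Everything in it is an assumption made for illustration: the query mix (a Zipf-like popularity over 5,000 hypothetical names), the random routing of queries to nodes, and the caches themselves, which never expire or evict entries. Real resolver caches honour TTLs, so the absolute numbers mean little; only the direction of the difference matters.

```python
import random


def make_queries(count, num_names=5000, seed=0):
    """Generate a hypothetical query stream with Zipf-like name popularity."""
    rng = random.Random(seed)
    weights = [1.0 / rank for rank in range(1, num_names + 1)]
    return rng.choices(range(num_names), weights=weights, k=count)


def hit_rate(num_nodes, queries, seed=1):
    """Spread queries randomly over num_nodes caches and return the hit rate.

    Simplification: entries never expire and are never evicted.
    """
    rng = random.Random(seed)
    caches = [set() for _ in range(num_nodes)]
    hits = 0
    for name in queries:
        cache = caches[rng.randrange(num_nodes)]
        if name in cache:
            hits += 1
        else:
            cache.add(name)  # a miss costs an upstream lookup, then is cached
    return hits / len(queries)


queries = make_queries(50_000)
single = hit_rate(1, queries)
split = hit_rate(4, queries)
print(f"one shared cache: {single:.1%} hits")
print(f"four node caches: {split:.1%} hits")
```

In this model every node has to fetch each name for itself at least once, so the split setup always pays more cache misses than the single busy cache.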

Dnsdist provides a rich set of DNS-specific features
Our new setup still makes use of anycast routing. However, it is now the dnsdist load balancer nodes that announce this IP address, and they forward the queries to the back-end recursive name servers for domain name resolution.

The server nodes are located in two data centres, and both load balancers announce the same IP address to make use of anycast routing. Queries are normally sent to resolvers within the same data centre, but they are also distributed to the other site under higher load or when a server is lost.

In describing the disadvantages of our old setup, I have already hinted at two of the advantages dnsdist offers in our new setup. The first is DoS- and abuse-aware load balancing. Our query load is typically sent to a single resolver within the same data centre. Under higher query load, however, additional resolvers are used, and in some cases, at lower preference, even resolvers from the other site. We can also react more easily to abusive query load (e.g. clients participating in DDoS attacks). So far, we have not implemented rules to handle such traffic, but it is good to know that dnsdist allows for very flexible traffic shaping.
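The spill-over behaviour described above can be modelled in a few lines of Python. This is only a sketch under assumptions: the back-end names, the preference order and the per-second limits are all hypothetical, and the fallback used when every back end is full is an arbitrary choice. dnsdist ships a server policy called firstAvailable that behaves in a similar way, but this post does not say which policy or limits our production configuration uses.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    order: int      # lower value = more preferred (hypothetical values)
    qps_limit: int  # queries accepted per one-second window (hypothetical)
    used: int = 0   # queries already sent in the current window


def pick_backend(backends):
    """Return the most preferred back end that still has budget left."""
    ranked = sorted(backends, key=lambda b: b.order)
    for backend in ranked:
        if backend.used < backend.qps_limit:
            backend.used += 1
            return backend
    # Assumption: if every back end is saturated, keep using the preferred one.
    ranked[0].used += 1
    return ranked[0]


def distribute(backends, queries_in_window):
    """Simulate a one-second window and report queries per back end."""
    for backend in backends:
        backend.used = 0
    for _ in range(queries_in_window):
        pick_backend(backends)
    return {backend.name: backend.used for backend in backends}


backends = [
    Backend("local-1", order=1, qps_limit=1000),
    Backend("local-2", order=2, qps_limit=1000),
    Backend("remote-1", order=3, qps_limit=1000),
]
normal = distribute(backends, 400)
busy = distribute(backends, 2300)
print(normal)
print(busy)
```

Under normal load only the preferred local resolver receives queries, which keeps its cache busy. The second local resolver and then the remote one take traffic only once the more preferred back ends have used up their budget.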

The second disadvantage of our old setup was the distributed cache. Now, we are trying to build a large cache on a single resolver within each data centre. This results in more cache hits and better latency. We have not measured the difference very accurately, but a quick look at our statistics suggests that we have significantly reduced the number of upstream queries from the resolver to authoritative name servers, which in turn improves latency. In fact, a very high percentage of queries can be answered from the cache, which means in less than 1 millisecond as measured by dnsdist.

Response time measured from one of our dnsdist load balancers.

To test new DNS software or run experiments with live DNS traffic, we use the TeeAction, which is currently only available in the master branch (and not in release 1.0.0). The TeeAction sends a copy of a UDP query to another server. In our case, we currently use it to test the BIND 9.11.0a1 alpha release. Testing new software with live traffic is important for improving stability, but testing in the production environment is hardly ever possible or desirable. The TeeAction allows us to test new DNS features or new resolver software releases safely without affecting our production environment.
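The idea behind teeing a query can be illustrated with plain UDP sockets in Python. This is a conceptual sketch, not how dnsdist implements the TeeAction: the function names are hypothetical, the payload is an arbitrary byte string rather than a real DNS message, and the two "resolvers" are local stand-ins on the loopback interface. The copy sent to the server under test is fire-and-forget, so any reply or failure from it never reaches the client.

```python
import socket
import threading


def forward_with_tee(query, primary_addr, tee_addr, timeout=1.0):
    """Send query to the production server and return its answer.

    A copy goes to tee_addr; whatever happens to it is ignored.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tee_sock:
        try:
            tee_sock.sendto(query, tee_addr)
        except OSError:
            pass  # a broken test server must not affect production
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(query, primary_addr)
        answer, _ = sock.recvfrom(4096)
    return answer


def fake_resolver(sock, received, reply_prefix):
    """Stand-in for the production resolver: record one query and answer it."""
    sock.settimeout(2.0)
    data, client = sock.recvfrom(4096)
    received.append(data)
    sock.sendto(reply_prefix + data, client)


# Two local stand-ins: the production resolver and the release under test.
production = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
production.bind(("127.0.0.1", 0))
candidate = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
candidate.bind(("127.0.0.1", 0))

seen_by_production, seen_by_candidate = [], []
worker = threading.Thread(
    target=fake_resolver, args=(production, seen_by_production, b"answer:")
)
worker.start()
answer = forward_with_tee(
    b"example-query", production.getsockname(), candidate.getsockname()
)
worker.join()

# The candidate got the same query, but only production answered the client.
candidate.settimeout(1.0)
seen_by_candidate.append(candidate.recvfrom(4096)[0])
production.close()
candidate.close()
print(answer, seen_by_candidate)
```

The client only ever sees the production answer, while the candidate server receives identical live traffic that can be inspected separately.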

Outlook
Almost every Internet service that end users rely on starts with a DNS lookup. At SWITCH, we are committed to providing a first-class recursive DNS name service for our users. There should be no reason for any of our users to prefer public DNS resolvers such as Google DNS. In fact, we are certain that we can provide better latency and better cache results, and on top of that you get security as well. If that is not convincing enough, we are currently working on improving privacy for end users as well. In short, the privacy issue with DNS traffic is that it is still sent in clear (unencrypted) form, and that the queries sent to authoritative-only name servers typically contain the same full query name the user originally asked to resolve (see also DNS Privacy Considerations, RFC 7626). DNS privacy deserves its own topic, however, so we will leave it for a future blog post.

2 thoughts on “SWITCH DNS recursive name service improvements with dnsdist”

  1. > There should be no reason for any of our users to prefer public DNS resolvers such as Google DNS

    That’s a great goal, but one reason why people use these DNS resolvers is because they manage to remember their IP addresses.

    Maybe in IPv6 you can come up with memorizable addresses (2001:620::8888:15:dead ? :-).
