SWITCH Security-Blog

SWITCH-CERT IT-Security Blog

Optimizing Negative Caching Time in DNS


A recent presentation by SIDN (.nl) at the Spring 2016 DNS-OARC workshop reminded me of the importance of Time-To-Live (TTL) values in TLD zones. Specifically, it got me thinking about lowering the negative caching time in .ch/.li from the current 1 hour to 15 minutes.

What is negative caching?
When a resolver receives a response to a query, it caches it for the duration of the TTL specified by the record. For positive responses, the answer record carries the TTL, but a negative response (response code NXDOMAIN) contains no answer to the query question. In this case, the response contains the SOA record of the zone in the authority section. RFC 2308 specifies the negative caching time as the minimum of the SOA record’s TTL and the SOA minimum field. For example, the original SOA record of the .ch zone looked as follows:

dig +nocmd +noall +answer @a.nic.ch ch. soa
ch. 3600 IN SOA a.nic.ch. helpdesk.nic.ch. 2016041421 900 600 1123200 3600

The SOA TTL is 3600, and the SOA minimum time is also set to 3600. The minimum of these two values is of course 3600 too. That means the negative caching time for any .ch domain lookup is one hour.
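The rule can be written down in a few lines. The following is a minimal sketch, assuming a one-line SOA record in the format dig prints above; the function name negative_caching_time is made up for this illustration.

```python
def negative_caching_time(soa_line: str) -> int:
    """Return min(SOA TTL, SOA minimum) for a one-line SOA record as printed by dig."""
    fields = soa_line.split()
    # Expected layout:
    # name TTL class SOA mname rname serial refresh retry expire minimum
    if len(fields) != 11 or fields[3].upper() != "SOA":
        raise ValueError(f"not a one-line SOA record: {soa_line!r}")
    soa_ttl = int(fields[1])       # TTL of the SOA record itself
    soa_minimum = int(fields[10])  # last SOA field: minimum
    return min(soa_ttl, soa_minimum)


# The original .ch SOA record quoted above.
old_ch_soa = "ch. 3600 IN SOA a.nic.ch. helpdesk.nic.ch. 2016041421 900 600 1123200 3600"
print(negative_caching_time(old_ch_soa))  # prints 3600
```

Note that lowering only one of the two values is enough to lower the negative caching time, because the smaller one wins.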

A lower negative caching time is more user-friendly
People who are about to register a new domain name may also look up the name over DNS. However, this means that they just cached the non-existence of the name in the resolver they are using. A domain can be registered in a matter of minutes, so the cached negative answer can prevent them from using the domain name on their network for the duration of the negative caching time.

Another example is the domain abuse process for .ch/.li. We notify the domain holders of compromised websites that are abused for drive-by infections or host a phishing site. Some domain holders who do not take action to remove the malicious content within the stated deadline get their domain name temporarily suspended, as a result of which the domain name delegation is removed from the .ch/.li zone. A low negative caching time helps restore the domain more quickly when the delegation is put back in the zone.

These are just two examples showing that, put simply, a lower negative caching time is more user-friendly. On the other hand, a lower negative caching time also means a higher query load on the name server.

Negative caching time on other TLDs
I was wondering what negative caching times are chosen by other TLDs, so I tested all TLDs currently delegated by the root zone. The chart below shows the range and percentage of all the negative caching time values observed.

[Figure: Negative Caching Time in TLDs. Distribution of negative caching time among all TLDs.]
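Once a negative caching time has been collected for each TLD, a distribution like the one in the chart can be tallied as sketched below. This is only an illustration of the counting step: the function name is hypothetical, and the sample values are made up and are not the survey data behind the chart.

```python
from collections import Counter


def negative_ttl_distribution(negative_ttls):
    """Map each observed negative caching time (seconds) to its share in percent."""
    counts = Counter(negative_ttls)
    total = len(negative_ttls)
    return {ttl: 100.0 * n / total for ttl, n in sorted(counts.items())}


# Made-up sample values for illustration only, NOT the measured TLD data.
sample = [86400, 86400, 3600, 900, 3600]
for ttl, share in negative_ttl_distribution(sample).items():
    print(f"{ttl:>6} s: {share:.0f}%")
```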

It surprised me that roughly 40% of all TLDs use a high value of one day (86400 seconds). The old value for .ch/.li of one hour was actually not that bad compared with what a lot of TLDs still use. Luckily for end users, recursive DNS resolvers enforce TTL limits. For the negative caching time, the default maximum values of well-known resolvers are as follows:

  • BIND: 10800 (3 hours)
  • Unbound: 3600 (1 hour)
  • PowerDNS: 3600 (1 hour)
  • Windows DNS: 900 (15 minutes)

That means there is little advantage in setting it higher than 3 hours if resolvers are not obeying this value anyway. There was a nice presentation by Microsoft on the subject of “Caching of Negative DNS records” at the Spring 2015 DNS-OARC workshop that looked into this behaviour as well.
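The effect of these limits can be sketched as a simple cap. This is a simplified model, assuming each resolver runs with the default maximum listed above and applies no other policy; the names RESOLVER_MAX_NEGATIVE_TTL and effective_negative_ttl are made up for this illustration.

```python
# Default maximum negative caching times in seconds, as listed above.
RESOLVER_MAX_NEGATIVE_TTL = {
    "BIND": 10800,
    "Unbound": 3600,
    "PowerDNS": 3600,
    "Windows DNS": 900,
}


def effective_negative_ttl(zone_negative_ttl: int, resolver: str) -> int:
    """Negative caching time a resolver would apply: the zone's value, capped by its maximum."""
    return min(zone_negative_ttl, RESOLVER_MAX_NEGATIVE_TTL[resolver])


# A TLD publishing one day (86400 seconds) is capped by every resolver in the list.
for resolver in RESOLVER_MAX_NEGATIVE_TTL:
    print(resolver, effective_negative_ttl(86400, resolver))
```

With the new .ch/.li value of 900 seconds, none of the listed defaults shortens the caching time any further.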

Query load impact on our name servers
During the last two weeks, we lowered the negative caching time from its original value of 3600 seconds to 1800 and finally to 900 seconds. For comparison, I plotted the number of query responses with the response code NXDOMAIN. As can be seen, for the selected name server the value increased slightly during each week. There is some fluctuation in the plot; the measurement period is too short to compensate for other traffic noise.

NXDOMAIN response rate on a selected server during three time periods with different negative caching times.


Apart from slightly more requests for non-existent domain names, I also expected an increase in DS query type requests from validating resolvers. There is an interesting presentation from JPRS from the Spring 2013 DNS-OARC workshop, which studied this subject in more detail. The plot in our case suggests that it is not a big issue for .ch/.li at the moment either (the plot line for the negative caching time of 900 behaves unexpectedly; again, I attribute this to the measurement period being too short).

DS query type request rate on a selected server during three time periods with different negative caching times.


The small increase in queries per second is negligible. We observe a much larger increase in query load through the “natural” growth of the zone itself (number of delegations). Each node has spare capacity of many thousands of queries per second, and this change does not cause any impact to worry about. Having said that, it would probably make sense for many other TLDs to lower negative caching time as well.

A “thank you” goes out to the people on the DNS-OARC dns-operations mailing list for their input on my question about negative caching weirdness.

6 thoughts on “Optimizing Negative Caching Time in DNS”

  1. Pingback: DNS Zone File Time Value Recommendations | SWITCH Security-Blog

  2. Hi

    Thanks for the article.

    You mention “thinking about lowering the negative caching time” and that you tested the query load for two weeks, but I miss the conclusion: did you decide to lower it to 15min? Do you use it now in production?

    Regards
    Amon

    • Hello Amon

      Yes, we lowered it to 15min. You can test it yourself: dig ch. soa:

      ;; ANSWER SECTION:
      ch. 900 IN SOA a.nic.ch. helpdesk.nic.ch. (
      2016050610 ; serial
      900 ; refresh (15 minutes)
      600 ; retry (10 minutes)
      1123200 ; expire (1 week 6 days)
      900 ; minimum (15 minutes)
      )

      Daniel

  3. Thanks for the answer. Is this still for testing for the next few weeks or is that already your final decision for a longer period?

    • It’s the final decision. There is no point in going back to 3600sec for us 😉

  4. Thank you Daniel (and the Team at TLD), you’re always working on continuous improvement! I feel proud of our TLD in Switzerland.
    Jorge