Since the beginning of July 2018, SWITCH has been publishing the top 1000 domain names of the Swiss TLD .ch. On the one hand, we want to support open data in Switzerland; on the other hand, we are keen to know exactly how domain names are being used in order to keep the .ch zone secure. We have therefore decided to publish the top 1000 domain names based on information that can be extracted from the authoritative DNS traffic. Although other sources already publish rankings of websites, such as the Alexa top sites list, a DNS-based ranking gives a different view of the .ch zone, since it reflects not only web (www) usage but all services that make use of DNS.
To calculate this ranking, we collect all DNS queries that go to a.nic.ch and b.nic.ch using Entrada, a system developed by SIDN to conveniently store and analyze massive amounts of DNS data. In brief, Entrada converts pcap files containing DNS traffic into Parquet files, which are then stored in a Hadoop cluster. Queries on this transformed data can be run quickly using the SQL-like engine Impala. We have previously published a blog post that investigates basic DNS traffic behavior. Although the collected data is only a fraction of the authoritative .ch traffic, it represents the overall traffic fairly well, since most resolvers send queries to all authoritative name servers.
For the newly published list of the top 1000 domain names, we have chosen the following way of measuring the importance of a domain. Note that the idea is based on a tool called DNS-Magnitude, developed by Alexander Mayrhofer from nic.at. The first approach that comes to mind is to count the number of queries for each domain name and then take the 1000 domain names with the most queries. However, this count depends on the TTL of a record: a domain name with a low TTL might be less important than its ranking suggests, since it is queried over and over again as the cached record rapidly expires. We therefore need a metric that is not affected by the TTL. The solution is to count the unique source IP addresses that query a given domain. This way, the number of times a single source queries a domain name has no influence on its ranking. Additionally, to limit the influence of daily fluctuations, the numbers are measured over a period of one month and hence published monthly on our website. Note that we do not exclude NXDOMAIN responses, i.e. the list might include domains that do not exist yet get queried a lot. The table below shows the top ten domains during the month of June, including the number of distinct resolvers that queried those domains.
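To make the counting idea concrete, here is a minimal Python sketch of ranking by unique query sources. It is not the Entrada/Impala implementation we run in production. The function names, the `(source_ip, qname)` record layout, and the simplification of reducing each query name to its last two labels under .ch are all assumptions made for illustration only.

```python
from collections import defaultdict


def registered_domain(qname, tld="ch"):
    """Reduce a query name to its second-level domain under the TLD.

    Assumption for this sketch: the domain is simply the last two labels,
    e.g. "www.example.ch." -> "example.ch". Names outside the TLD yield None.
    """
    labels = qname.rstrip(".").lower().split(".")
    if len(labels) < 2 or labels[-1] != tld:
        return None
    return ".".join(labels[-2:])


def rank_by_unique_sources(queries, top_n=1000):
    """Rank domains by the number of distinct source IPs that queried them.

    `queries` is an iterable of (source_ip, qname) pairs (hypothetical layout).
    Repeated queries from the same source count only once, so a low TTL
    does not inflate a domain's rank.
    """
    sources = defaultdict(set)
    for src_ip, qname in queries:
        domain = registered_domain(qname)
        if domain is not None:
            sources[domain].add(src_ip)
    # Sort by descending source count, then by name for a stable order.
    ranked = sorted(sources.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(domain, len(ips)) for domain, ips in ranked[:top_n]]


if __name__ == "__main__":
    sample = [
        ("192.0.2.1", "www.example.ch."),   # same source queries three times
        ("192.0.2.1", "www.example.ch."),
        ("192.0.2.1", "mail.example.ch."),
        ("192.0.2.2", "other.ch."),
        ("192.0.2.3", "other.ch."),
        ("192.0.2.1", "example.com."),      # not under .ch, ignored
    ]
    print(rank_by_unique_sources(sample))
    # prints [('other.ch', 2), ('example.ch', 1)]
```

In the sample, example.ch receives more queries than other.ch, but all of them come from a single source, so other.ch ranks higher. A plain query count would have produced the opposite order.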
By publishing the list of the top 1000 domain names, we hope to contribute to the idea of open data. Naturally, the published list raises questions, e.g. why some of the domains are ranked so high. We will therefore be conducting further analysis ourselves, and we also hope to see others make use of this ranking and publicly share their insights in order to broaden our understanding of the .ch landscape. In the near future, we plan to do some additional analysis regarding the security of those top 1000 domains.