On July 10th Architelos released the first NameSentry Report, benchmarking abuse levels in the domain name industry. For some time now, a debate has raged about the potential impact of new gTLDs on Internet safety and security, namely abusive registrations used for phishing, spam, malware, and so on. However, without a benchmark of the current state, how can we realistically evaluate whether new gTLDs have made any measurable difference in the level of abuse?
The goal of the report was to establish a way to measure the level of abuse in the domain name space as a whole and across the top TLDs, in order to bring some transparency and to encourage discussion and debate on what factors, if any, result in a safer namespace. The report is self-explanatory and can be downloaded from our site. The goal of this blog post is not to reiterate the report findings, but to provide additional detail on the methodology we employed.
Before we get to the methodology, it's important to note:
- The NameSentry Report is the result of analysis of existing data from respected sources. The data comes from sources such as SURBL, Spamhaus, Internet Identity, ZeusTracker, Spyeye Tracker, Malware URL, and Malware Domain List. All of these data feeds/sources are widely available, and are trusted to measure and block abuse in thousands of enterprises worldwide.
- Our analysis measures how many of a TLD’s domains are being blocklisted as dangerous. This is an important and objective measure of trust and verifiable problems. It also reflects an important but unfortunate truth: by the time a domain gets blocklisted, usually some harm has already taken place. How well or how fast those problems are mitigated is a separate (and interesting) matter, but it requires additional data that is hard to come by, so we did not attempt to tackle it in this first report.
- “Abuse” in the context of the report is defined as phishing, malware, and domains advertised in spam.
- The data overwhelmingly contains domains registered with bad intent. A small percentage were registered to innocent registrants who had their servers hacked into. Often the same domain is listed by multiple sources, and/or is associated with multiple types of abuse. To avoid duplication, we simply counted the number of unique domains listed as abusive, be they 2nd level or delegated 3rd level registrations, that were flagged for at least one type of abuse by at least one source.
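The counting rule in the last point can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's actual pipeline: the record layout `(source, domain, abuse_type)`, the function names `unique_abusive_domains` and `count_by_tld`, and the sample domains are all hypothetical, and the sketch assumes each feed already reports the registered (2nd level or delegated 3rd level) domain rather than a full hostname.

```python
from collections import defaultdict

# Hypothetical listing records: (source feed, listed domain, abuse type).
# Feed names and domains are illustrative only; assumes feeds list the
# registered 2nd-level or delegated 3rd-level domain, not a full hostname.
listings = [
    ("feed_a", "bad-example.com", "phishing"),
    ("feed_b", "bad-example.com", "spam"),      # same domain, another source and type
    ("feed_a", "example.co.uk", "malware"),     # delegated 3rd-level registration
    ("feed_c", "other-bad.net", "spam"),
    ("feed_c", "other-bad.net", "spam"),        # exact duplicate listing
]


def unique_abusive_domains(records):
    """Domains flagged for at least one abuse type by at least one source, each counted once."""
    return {domain.lower().rstrip(".") for _source, domain, _abuse_type in records}


def count_by_tld(domains):
    """Number of unique abusive domains per TLD (the label after the last dot)."""
    counts = defaultdict(int)
    for domain in domains:
        counts[domain.rsplit(".", 1)[-1]] += 1
    return dict(counts)


if __name__ == "__main__":
    flagged = unique_abusive_domains(listings)
    print(len(flagged))           # 3 unique domains from 5 listings
    print(count_by_tld(flagged))  # one each for com, uk and net
```

Collecting domains into a set is what removes both cross-source duplicates and multiple abuse types for the same domain, so each domain contributes at most once to a TLD's count.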
Principles and Goals
The first step was to establish some principles and goals from which we could then derive a methodology for analysis. Some of our key principles and goals were:
- Fairness: Since the data was already available and created by leading authorities (see point #1 above), the key here is not the newness or availability of the data, but rather the fairness of the analysis. Therefore, all TLDs should be subject to the same measurement in evaluating the quality of their namespace vis-à-vis abuse.
- Clarity: see points 3 and 4 above (how abuse is defined and how unique abusive domains are counted).
- Transparency: The report was published and available to all TLD registries on the same date. There were no previews of the analysis or report content with any TLD registries. This ensured that no one had any advantage or disadvantage, regardless of their rating. Everyone received the same information once the report was published.
- Precedent: This kind of rating has long been used to measure abuse levels in ASNs, ISPs, and networks. It’s been used to measure the prevalence of phishing in TLDs too. Our report is a basic benchmarking along those established lines.
- Timeliness and Specificity: The report’s analysis is based on data from January 1 to May 31, 2013. This span of time is recent enough to be relevant and long enough to support analysis of trends.
- Comprehensiveness: In order to apply a measure to the quality of the Internet namespace as a whole, we needed to account for over 99% of Internet domains. There are 257 million domains registered in over 300 TLDs, and we wanted to include all TLDs with over 100,000 domains under management (DUM). In practice, this meant focusing on the largest 72 TLDs, which together account for 99% of the world’s domains.
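The inclusion rule can be illustrated with a small sketch. The TLD names and DUM counts below are placeholders rather than figures from the report, and `select_tlds` and `coverage` are hypothetical helpers; the only rule taken from the text is the "over 100,000 DUM" threshold.

```python
# Placeholder TLD names and domains-under-management (DUM) counts; not report data.
dum_by_tld = {
    "tld_a": 110_000_000,
    "tld_b": 15_000_000,
    "tld_c": 15_000_000,
    "tld_d": 250_000,
    "tld_e": 90_000,
    "tld_f": 40_000,
}

MIN_DUM = 100_000  # "over 100,000 DUM" inclusion threshold from the text


def select_tlds(dum, min_dum=MIN_DUM):
    """TLDs with more than min_dum domains under management, largest first."""
    return sorted((tld for tld, n in dum.items() if n > min_dum),
                  key=lambda tld: dum[tld], reverse=True)


def coverage(dum, selected):
    """Share of all domains (across every TLD in dum) held by the selected TLDs."""
    return sum(dum[tld] for tld in selected) / sum(dum.values())


if __name__ == "__main__":
    chosen = select_tlds(dum_by_tld)
    print(chosen)
    print(f"{coverage(dum_by_tld, chosen):.1%}")
```

With these made-up numbers, the four TLDs above the threshold hold about 99.9% of all domains, which mirrors the shape of the real distribution: a long tail of small TLDs adds many names to the list but very few domains.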
The TLDs that comprise the Internet namespace vary widely in size. One challenge was to find a way to measure and compare TLDs with many millions of domains under management against much smaller TLDs. Using actual TLD size and measuring abuse in absolute numbers would provide only part of the information and would miss the larger picture. We settled on a logarithmic scale. A logarithmic scale is helpful when the data covers a large range of values: using the logarithms of the values rather than the values themselves compresses a wide range into a more manageable one. To put TLDs of different sizes on an equal footing, abuse was measured as “abuse-per-million”, specifically the number of unique 2nd level or 3rd level domains flagged for at least one type of abuse per million domains under management. This is similar to “parts-per-million”, one of the most commonly used terms for describing very small amounts of contaminants in our environment. Both “abuse-per-million” and “parts-per-million” are measures of concentration: the amount of one material in a larger amount of another material. Using the logarithmic scale also allowed us to apply the same scale of measurement to each and every TLD, and indeed across the Internet namespace as a whole.
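A minimal sketch of the abuse-per-million normalization, plus the base-10 logarithm used to place values on a log scale, follows. The function names and sample counts are hypothetical; the report does not publish code. It shows why normalization matters: 50,000 abusive domains in a TLD of 100 million and 100 abusive domains in a TLD of 200,000 both work out to 500 abuse-per-million, even though their absolute counts differ by a factor of 500.

```python
import math


def abuse_per_million(abusive_domains: int, dum: int) -> float:
    """Unique abusive domains per million domains under management (DUM)."""
    if dum <= 0:
        raise ValueError("dum must be a positive domain count")
    # Multiply before dividing so integer inputs stay exact until the final division.
    return abusive_domains * 1_000_000 / dum


def log_apm(apm: float) -> float:
    """Base-10 logarithm of an abuse-per-million value, for placement on a log scale.

    math.log10 raises ValueError for 0, so TLDs with no flagged domains
    must be handled separately by the caller.
    """
    return math.log10(apm)


if __name__ == "__main__":
    large = abuse_per_million(50_000, 100_000_000)  # hypothetical large TLD
    small = abuse_per_million(100, 200_000)         # hypothetical small TLD
    print(large, small)  # both 500.0
    print(round(log_apm(large), 3))
```

On a base-10 scale each step of 1 corresponds to a tenfold change in concentration, so TLDs at 10, 100 and 1,000 abuse-per-million sit evenly spaced rather than being dwarfed by the largest value.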
If the Internet has indeed become a utility that we rely on, what measurements are already in use for other utilities such as air and water? The Air Quality Index and Water Quality Index are easily communicated measures that have been used successfully to evaluate relative safety and quality based on contaminants or pollutants on a parts per million basis. If abusive domains are treated as pollutants in a given namespace, a similar quality index can be used. We coined the term “Namespace Quality Index” (NQI) because the same method of analysis and measurement can be applied to any namespace: the Internet as a whole, any gTLD or ccTLD, any portfolio of domain names registered by a registrar, and so on.
We are pleased that the NameSentry Report has not only provided a means of measuring and evaluating changes to the Internet namespace going forward (i.e., the introduction of new gTLDs) but has also generated debate and discussion about what factor or combination of factors leads to a high quality or “Green” NQI. We believe that no single lever (such as price, restrictive registration policies, or aggressive takedown policies) achieves this result; rather, it takes a careful calibration of multiple levers. The next NameSentry report will focus on drawing correlations between the various levers and outcomes to identify potential best practices that can be successfully employed by existing and new gTLDs. In the meantime, we are focused on enhancing our analysis to better serve the community and are therefore very interested in constructive feedback and critique. Please contact us at [email protected] or send us your questions via our “contact us” page.
By Alexa Raad, CEO of Architelos. Architelos provides consulting and managed services for clients applying for new top-level domains, ranging from new TLD application support to launch and turnkey front-end management of a new TLD. She can be reached directly at [email protected].