engineering


8
Sep 11

Understanding DNT Adoption within Firefox

UPDATED 2011-09-08 11:55am PST: changed the description of how we store and retain IP address to be more accurate

On March 23rd, Mozilla launched its newest and most awesome browser: Firefox 4. Along with a plethora of features, including faster performance, better security and the whole nine yards, Firefox 4 included a cutting edge privacy feature called Do No Track (DNT). For the uninitiated, DNT simply tells sites “I don’t want to be tracked” via a HTTP header visible to all advertisers and publishers.
Mozilla’s new Privacy Blog  has several posts on the feature, including a new one today releasing a Do Not Track Field Guide for developers. Based on our current numbers, we’ve been seeing for several weeks now just under 5% of our users with DNT turned on within Firefox.
The Mozilla team is all about experimenting. We love innovating new technologies that do good and benefit the community as a whole. As Firefox 4 kept breaking records, the peak was 5,500 downloads/minute, (source: http://blog.mozilla.com/blog/2011/03/25/the-first-48-hours-of-mozilla-firefox-4/) we felt that it would be important to understand whether people were enabling DNT. Every Metrics guy lives and dies by the data. In late 2010, the metrics team gave a small talk on how we collect log data (click here for the video ppt). While that project has gone multiple iterations over time, the basic premise is still the same:

  • Grab logs from multiple data-centers.
  • Split out anonymized and non-anonymized data into two separate files
  • Store both sets of files in HDFS
  • Create relevant partitions inside HIVE
  • Query the data.
  • Drool over the stats.

(Non-anonymized data such as IP address has a 6-month retention policy and is deleted on expiration)

We decided to follow the same approach for calculating DNT stats. Once every day, each Firefox instance pings the AUS servers with respect to its DNT status. The ping request looks something like this:
“DNT:-” User has NOT set DNT
“DNT:1″ User HAS set DNT and does *not* wish to be tracked.

Armed with the following data points, a simple HIVE query gives us DNT stats for a given day:

   SELECT ds, dnt_type,  count(distinct ip_address)  FROM web_logs WHERE (request_url LIKE ‘%Firefox/4.0%’ OR request_url LIKE ‘%Firefox/5.0%’ OR request_url LIKE ‘%Firefox/6.0%’) AND dnt_type != ‘DNT:1, 1′ AND ds = ‘$dateTime’ GROUP BY ds, dnt_type ORDER BY ds desc;

The above script is run on a nightly basis and the result is then plotted over a time graph, as included with this post.

One BIG caveat:

The DNT numbers are being undercounted, primarily because we use hashed IP address as proxy for counting a unique user. This means, while there can be multiple users behind a given NAT with DNT set, the counter is incremented only once. This may account for why our numbers are a bit lower than those being reported by other groups, including the recent study of 100 million Firefox users conducted by Krux Digital

Possible Fix, NOT:

While it is possible to uniquely identify each instance of browser, doing so will require that we start tracking users, thereby defeating the exact purpose for why DNT was created in the first place.

 

Feel free to leave us a comment or email: (aphadke at_the_rate mozilla dot com – Anurag Phadke) for more information.