Statistics Update

Wil Clouser, our AMO dev lead, has an update on our work to fix the problems with add-on statistics.

[Reprinted from his blog]

Add-on statistics have been intermittent for a couple of months and are only now getting the attention they need.

Our current process is to count download statistics once per day and update ping statistics once per week (update pings are a sampling of the complete set). The reliability of the script generating these statistics has been falling as our data size has grown, and we’ve had several bugs filed regarding the numbers it has produced. Most of the time these needed only relatively small fixes, and the script continued to limp along.

Currently we’re facing questionable results in both sets of statistics (bug 468570 for update pings, bug 472538 for download counts). I’ve been debugging the update pings script and, despite solving some problems, we’re still seeing the script fail to run properly.

In parallel with AMO development, Daniel Einspanjer has been working on a larger statistics parser that will aggregate data from many Mozilla sites into a dashboard with easy visualizations. It turns out he’s already processing the AMO logs and pulling out more data than we are, more often and in less time.

With a system like that available, it doesn’t make sense for us to continue to develop (and, in this case, heavily modify) our local statistics scripts. With that in mind, our next steps are:

  1. Verify the results we (used to) get with the AMO scripts match those of the new system
  2. Create a transformation script to push the data from Daniel’s project to the AMO database
  3. Turn off the AMO scripts
  4. Back fill statistics through at least November 15th, 2008 to replace our flailing stats. If the comparisons in step 1 reveal miscounting from before that date, we’ll back fill as far as we need to.
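
To make step 1 a little more concrete, here is a minimal sketch of how the comparison could work. It is not the actual AMO or metrics code: the function name, the `(addon_id, day)` key layout, the 1% tolerance, and all of the sample numbers are assumptions made up for illustration.

```python
from datetime import date


def compare_counts(old, new, tolerance=0.01):
    """Return the entries where the two systems disagree.

    `old` and `new` map (addon_id, day) -> count. An entry is reported
    when it is missing from one side, or when the counts differ by more
    than `tolerance` relative to the larger of the two.
    """
    mismatches = {}
    for key in sorted(set(old) | set(new)):
        old_count = old.get(key)
        new_count = new.get(key)
        if old_count is None or new_count is None:
            # One system has no row at all for this add-on and day.
            mismatches[key] = (old_count, new_count)
            continue
        baseline = max(old_count, new_count, 1)  # avoid dividing by zero
        if abs(old_count - new_count) / baseline > tolerance:
            mismatches[key] = (old_count, new_count)
    return mismatches


# Hypothetical sample data: add-on IDs 1 and 2 and every count are invented.
old_counts = {
    (1, date(2008, 11, 14)): 1200,
    (1, date(2008, 11, 15)): 0,
    (2, date(2008, 11, 15)): 530,
}
new_counts = {
    (1, date(2008, 11, 14)): 1204,
    (1, date(2008, 11, 15)): 1187,
    (2, date(2008, 11, 15)): 530,
    (2, date(2008, 11, 16)): 512,
}

mismatches = compare_counts(old_counts, new_counts)

# The earliest day with a disagreement suggests how far back to back fill.
earliest_bad_day = min(day for _, day in mismatches) if mismatches else None

for (addon_id, day), (old_count, new_count) in mismatches.items():
    print(addon_id, day.isoformat(), old_count, new_count)
```

In a sketch like this, the earliest mismatched day is what would feed step 4: if it falls before November 15th, 2008, the back fill would need to start earlier.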

These steps will let us meet the immediate goal of making the statistics we offer now reliable and complete. In the future we can look at pulling additional data from the new metrics system. The target date to switch to the new system is the end of next week, Jan 31 2009. Once we make the switch, we can evaluate how long the parsing takes and give an estimate of how long back filling will take. As always, let me know if there are any concerns.