AutoCADT

February 25th, 2007

That’s a colloquial term for deploying the CADT development paradigm all by yourself. In other words, the sheer size of the Mozilla project is starting to get to me (after two years), and I am looking at ways of automating everything I can think of, in parallel. That exposes race conditions in my brain, leading to silent data loss. Sometimes, it results in good stuff, like tests that run after every check-in. Sometimes, it’s… less fruitful.

My latest toy is Heritrix. This is the crawler behind the Wayback Machine. It’s great for downloading the Internet. The background task it’s running seems to be crawling tons of websites and translating the links into local URLs as it archives the downloads as gzip files.

[Screenshot of the Heritrix web interface]

You give it some seed URLs and it crawls from there. Here’s what I fed it:

  • http://technorati.com/
  • http://www.csszengarden.com/
  • http://developer.yahoo.com/yui/
  • http://www.livedoor.com/
  • http://www.aljazeera.net/
  • http://www.lemonde.fr/
  • http://dojotoolkit.org/

I think some of these will be more fruitful than others, but you never know. The idea is that I’ll have them archived for serving with the open source Wayback server, and I’ll be able to record whether any URLs crash Firefox. Additionally, I’ll be able to take snapshots of each page with different Firefox versions, and use a nifty PIL script to find the snapshots that exhibit the biggest differences.

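import math

from PIL import ImageChops

def rmsdiff(im1, im2):
    "Calculate the root-mean-square difference between two images"

    # histogram of the absolute per-pixel differences, 256 bins per band
    h = ImageChops.difference(im1, im2).histogram()

    # calculate rms: weight each bin count by its squared difference value
    # (i % 256 recovers the value when the histogram covers several bands)
    sum_of_squares = sum(count * (i % 256) ** 2 for i, count in enumerate(h))

    # divide by the number of pixels, as in the original script
    return math.sqrt(sum_of_squares / (float(im1.size[0]) * im1.size[1]))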
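To make the histogram arithmetic in rmsdiff a bit more concrete, here is a small pure-Python sketch that applies the same formula to a hand-made histogram. The helper name rms_from_histogram and the four-pixel example are made up for illustration; nothing here touches PIL.

```python
import math

def rms_from_histogram(histogram, pixel_count):
    """Root-mean-square of the values described by a single-band histogram.

    histogram[v] is the number of pixels whose difference value is v.
    """
    sum_of_squares = sum(count * value ** 2
                         for value, count in enumerate(histogram))
    return math.sqrt(sum_of_squares / float(pixel_count))

# Hypothetical difference image with four pixels: 0, 0, 3 and 4.
histogram = [0] * 256
histogram[0] = 2
histogram[3] = 1
histogram[4] = 1

# (0 + 0 + 9 + 16) / 4 = 6.25, and sqrt(6.25) = 2.5
print(rms_from_histogram(histogram, 4))  # 2.5
```

Identical screenshots put every pixel in bin 0, so the result is 0.0, and larger values mean the two renderings differ more.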

I’ll probably use a cheap hash function to avoid running this on screenshots that are identical, if that turns out to be a problem.
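Something like the following might do for the hash check. It is only a sketch: the function names are invented, it hashes the raw file bytes with SHA-256 from the standard library (any cheaper hash could be swapped in), and it assumes the screenshots are saved as files on disk.

```python
import hashlib

def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file's raw bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # read in chunks so large screenshots are not loaded all at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def needs_comparison(path_a, path_b):
    """True when the two files are not byte-for-byte identical."""
    return file_digest(path_a) != file_digest(path_b)
```

This only catches byte-identical files. Two screenshots with the same pixels but different metadata or compression would still go through rmsdiff, which is harmless but slower.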

One Response to “AutoCADT”

  1. Observer Says:

    Or, shorter:

    from PIL import ImageChops, ImageStat

    def rmsdiff(im1, im2):
        # .rms is a list with one value per band
        return ImageStat.Stat(ImageChops.difference(im1, im2)).rms