Monthly Archives: January 2010

My Contribution to Firefox 3.6

Firefox 3.6 is the first Firefox release that has my code in it.  Much of it was written over six months ago, so it’s nice that it’s finally seeing the light of day.  It’s also fun to know that my code is running on the machines of my friends and family;  that’s new for me.

One friend asked what my specific contribution to the release was.  I said “in some cases it might be a little faster and a little less likely to crash”.  At the December all-hands, when people asked me what I’ve been working on, I said “filing the sharp edges off Nanojit“.  Nanojit is the JIT compiler back-end used by TraceMonkey;  it translates a low-level intermediate representation called LIR into native code.  It has some clever features and does a pretty good job in many respects but a year ago a lot of its code was, well, awful.  It’s now much better:  simpler, faster, with fewer ways to go wrong, more internal sanity checking, and better testing (including some fuzzing).  And many of these improvements didn’t make it into 3.6.

I have a few more fixes to make, but Nanojit improvements are winding down, and my attention will soon be focused elsewhere within TraceMonkey.  I thought that I would be spending some time this year improving the quality of Nanojit’s code generation, but I’ve found that the quality is already pretty good.  This perhaps isn’t so surprising because LIR is so low-level that it’s pretty close to machine code.  However, while determining that I found that the LIR generated by the TraceMonkey front-end is often awful.  It looks like there is some fat to trim there!

Safer refactoring in Nanojit

Recently I’ve been making a lot of refactoring changes to Nanojit that I think of as “code hygiene” — changes that clean up the code, but don’t change its behaviour at all.  (Example 1, example 2.)  When making changes like these I like to do the changes carefully and gradually to make sure that I don’t affect behaviour unintentionally.  In Nanojit’s case, this means that the generated code shouldn’t change.  But it’s always possible to make mistakes, and sometimes a particular change is disruptive enough that you can’t evolve the code from one state to another without breaking it temporarily.

In such cases, if the regression tests pass it tells me that the new code works, but it doesn’t tell me if I’ve managed to avoid changing the behaviour as I intended.  So I’ve started doing comparisons of the native code generated by Nanojit for parts of SunSpider using TraceMonkey’s verbose output, which is controlled with the TMFLAGS environment variable.  For such changes, the generated code should be identical to an unchanged version.  The way I organise my workspaces is such that I always have an unchanged repository available, which makes such comparisons easy.

Except, the generated code is never quite identical because it contains addresses and the slightest change in the executable can affect these.  So I run the verbose output through this script:

perl -p -e 's/[0-9a-fA-F]{4,}/....../g'

It just replaces numbers with four or more digits with “……”, which is enough to filter out these address differences.

I tried this technique on all of SunSpider, but it turns out it doesn’t work reliably, because crypto-aes.js is non-deterministic — it contains a call to Date.getTime() which introduces enough non-determinism that TraceMonkey sometimes traces different code.  A couple of the V8 benchmarks have similar non-deterministic behaviours.  Non-determinism is a rather undesirable feature of a benchmark suite so I filed a SunSpider bug which is yet to be acted on.

Fortunately, experience has shown me that I don’t need to do code diffs on all of SunSpider to be confident that I haven’t changed Nanojit’s behaviour — doing code diffs on two or three of the bigger benchmarks suffices, as enough code is generated that even small behavioural differences are quickly found.

In fact, this code diff technique is useful enough that I’ve started using it sometimes even when I do want to change behaviour — I split such changes into two parts, one which doesn’t change the behaviour and one which does.  For example, when I added an opcode to distinguish between 64-bit integer loads and 64-bit integer floats I landed a non-behaviour-changing patch first, then did a follow-up change that improved x86-64 code generation.  This turned out to be a good idea because I introduced a regression with the follow-up change, and the fact that it was separated from the non-behavioural changes made it easy to determine what had gone wrong, because the regressing patch was much smaller than a combined patch would have been.

I also recently wrote a script to automate the diff process, which makes it that much easier to perform, which in turn makes it that much more likely that I’ll actually do it.

In summary, it’s always worth thinking about new ways to test your code, and when you find a good one, it’s worth making it as easy as possible to run those tests.