For a long time, when our unit tests or Talos performance tests hit a crash, the result was nothing but frustration. If you were lucky, you could tell that something had crashed, but you had no idea where. Poor Blake spent weeks tracking down a crash from his speculative-parsing patch that only seemed to occur on Talos. Until recently I figured that getting crash stacks out of the test machines was going to involve a fair amount of work that only I would be able to do. A few weeks ago it became clear that this was seriously hurting development: patches would get checked in, cause a crash, and be backed out, leaving the developer with nothing to go on.

Benjamin Smedberg has been hard at work making it possible to get stacks in this situation, using the same Breakpad utilities we use on our Socorro server, but locally on the machine running the tests. Practically all of the pieces were in place this afternoon when #developers cornered Alice and closed the tree while she landed the final patch to make Talos produce stack traces. Boris then committed a test crash, and as a result we were able to see crash stacks in Mochitest (OS X, Linux) as well as Talos (OS X, Linux).
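For anyone curious what "running Breakpad locally" might look like, here is a minimal sketch, not the actual harness code. It assumes the Breakpad utility in question is `minidump_stackwalk` (invoked as `minidump_stackwalk <minidump> <symbols dir>`), that it is on the PATH, and that minidumps end in `.dmp`. The function names and directory arguments are hypothetical.

```python
import subprocess
from pathlib import Path


def stackwalk_command(minidump, symbols_dir, stackwalk="minidump_stackwalk"):
    """Build the argv for one stackwalk run (assumed CLI: tool, dump, symbols)."""
    return [stackwalk, str(minidump), str(symbols_dir)]


def find_minidumps(dump_dir):
    """Return the .dmp files left behind in dump_dir, in sorted order."""
    return sorted(Path(dump_dir).glob("*.dmp"))


def print_crash_stacks(dump_dir, symbols_dir):
    """Run the stackwalker over every minidump and print what it reports."""
    for dump in find_minidumps(dump_dir):
        result = subprocess.run(
            stackwalk_command(dump, symbols_dir),
            capture_output=True,
            text=True,
        )
        print(f"Crash dump: {dump.name}")
        print(result.stdout)
```

A test harness would call something like `print_crash_stacks(profile_minidump_dir, symbols_dir)` after the browser exits, so the stack lands directly in the test log.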

Thanks to Benjamin for doing most of the heavy lifting here, and to Alice for taking the Talos part across the finish line. The Talos work was mostly in bug 480577, and the unit test work was bug 481732. Note that currently this only works in Mochitest (all 4 varieties). It will work in Reftest/Crashtest after bug 479225 is fixed (which should be soon).

(Cross posted in dev.tree-management, but posting here for a wider audience.)

Unit tests: now with less suck!

November 21st, 2007

Thanks to the combined efforts of a few people, the Tinderbox build logs for our unit test machines now suck much less.  You can now click on “View Brief Log” and get a summary of test failures right at the top, instead of searching through the full log for various failure strings.  In addition, if you click down to the errors in the body of the log, the test files are linkified to bonsai for you.  Awesome!

RLk:0B (and staying that way)

October 24th, 2007

So, some time ago dbaron got RLk down to 0 bytes on our leak test box.  Sometime after that, we deployed the new Linux reference platform, only to have RLk go back up to 8 bytes.  Turns out it was my fault: a string wasn’t being freed in the crash reporter code.  The crash reporter must not have been enabled on the previous reference platform.  I’ve made amends and fixed this, and I also checked in rhelmer’s patch to make the leak test boxes turn orange if RLk goes above zero, so we should be able to hold the line on this per our test failure policy.  For comparison, on the 1.8 branch we leak up to 45KB(!) per test run.
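The "turn orange if RLk goes above zero" rule is simple enough to sketch. This is not rhelmer's patch, just an illustration of the idea. It assumes the leak figure shows up in the log as text like `RLk:0B` or `RLk:45KB`; the function names, the unit table, and the status strings are all hypothetical.

```python
import re

# Assumed unit suffixes; only "B" and "KB" appear in the post itself.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 * 1024}

_RLK_PATTERN = re.compile(r"RLk:\s*([0-9]+(?:\.[0-9]+)?)\s*(B|KB|MB)\b")


def parse_rlk(text):
    """Return the leaked byte count found in text, or None if there is none."""
    match = _RLK_PATTERN.search(text)
    if match is None:
        return None
    return int(float(match.group(1)) * _UNITS[match.group(2)])


def leak_status(text):
    """Map a log snippet to a tree color: any leak at all means orange."""
    leaked = parse_rlk(text)
    if leaked is None:
        return "unknown"
    return "orange" if leaked > 0 else "green"
```

With a zero threshold, even the 8-byte regression described above would have turned the box orange right away.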