l10n merge
July 3rd, 2008
I’ve just pushed an implementation of l10n-merge to my tooling repository. It’s now actually just an option to compare-locales, and will do the weakest heuristic for now.
Whenever compare-locales finds missing entries in an existing file, it will create a copy of that file in a staging directory for the merge, and append the missing entries. For the bulk of our files, that should work fine. Bookmarks.html is an exception, as are the netError files; I think the order of entities and DTD inclusions matters there. In the end, you get a staging directory with files that got fixed up, a localization directory with both good files and files with missing entries, and the original en-US source. Making jar.mn actually pick localized files up from three different places in a particular order is in my build-patches repository.
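To make the "copy and append" heuristic concrete, here is a minimal sketch for .properties-style files. It is not the compare-locales code: the function names `property_keys` and `merge_missing`, the simplified parser, and the paths are all assumptions made up for illustration.

```python
import os


def property_keys(text):
    """Map each key to its full line (hypothetical, very simplified parser)."""
    keys = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not stripped or stripped.startswith(("#", "!")) or "=" not in stripped:
            continue
        keys[stripped.split("=", 1)[0].strip()] = line
    return keys


def merge_missing(reference_path, l10n_path, staging_path):
    """Write a fixed-up copy of the localized file into the staging dir.

    Entries present in the reference (en-US) but missing in the
    localization are appended in reference order. Nothing is written
    when the localization is complete. Returns the appended keys.
    """
    with open(reference_path, encoding="utf-8") as f:
        reference = property_keys(f.read())
    with open(l10n_path, encoding="utf-8") as f:
        l10n_text = f.read()
    localized = property_keys(l10n_text)

    missing = [key for key in reference if key not in localized]
    if not missing:
        return []  # best case: no copy at all

    os.makedirs(os.path.dirname(staging_path) or ".", exist_ok=True)
    with open(staging_path, "w", encoding="utf-8") as out:
        out.write(l10n_text)
        if l10n_text and not l10n_text.endswith("\n"):
            out.write("\n")
        for key in missing:
            out.write(reference[key] + "\n")
    return missing
```

Appending at the end is exactly why this is the weakest heuristic: it only works for files where the order of entries doesn't matter.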
Here’s why, in case you wonder: First and foremost, it leaves the original source alone. I like it like that. Secondly, it does as few file manipulations as possible; in the best case, a complete localization doesn’t do a single copy. It’s not all that invasive into the build system as one might think, either. At least once you look at the code that decides where to look for files, checking a bunch of source base dirs is rather trivial.
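Checking a bunch of source base dirs could look roughly like the sketch below. The function name `find_localized_file` and the concrete order (staging first, then the localization, then en-US) are my assumptions for illustration, not what the jar.mn patches actually do.

```python
import os


def find_localized_file(relative_path, base_dirs):
    """Return the first existing candidate for relative_path, or None.

    base_dirs is searched in order, so earlier directories win.
    """
    for base in base_dirs:
        candidate = os.path.join(base, relative_path)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A call might pass something like `["merge", "l10n/de", "en-US"]` as `base_dirs` (hypothetical directory names), so that a fixed-up file shadows the incomplete localized one, and en-US is only the last resort.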
I’m old
June 12th, 2008
While reverse-engineering make-jars.pl for bug 361583, I wondered why we did that weird locking in there.
Turned out that that was a fix I reviewed, for a bug I filed, 55174. A 5-digit bug number, dating back to 2000. Those were the days of
make -j32 install
I bet picard got its plug pulled by now.
EUROPEADA 2008
June 2nd, 2008
Just wanted to point all of you who are into soccer and languages to the EUROPEADA 2008, which is currently on. “The soccer tournament for the autochthonous, national minorities in Europe”, as they say themselves. There’s a video featuring some of the languages spoken on tagesschau.de (German).
L10n buildbots builds, the first quarter
April 14th, 2008
The current setup of my l10n buildbot has been running for 3 months now, or a quarter. I figured that’d be a good time to do some stats.
In these three months, the l10n server ran about 10600 builds, 10416 of those on trunk. Of the latter, 2330 builds succeeded, 8086 failed (fun number pun). The total amount of build time during these three months was merely 3 days, the rest of the day it was just pounding bonsai-l10n and slacking.
The mean response time, that is, the time between the change that bonsai shows and the end of the build is 3-4 minutes, with the following distribution
The lags go as high up as almost 2 hours. The really big jumps seem to be problems on bonsai-l10n, and there are a few builds taking 20-30 minutes due to slowness in cvs check-outs. I’m not showing some 36 builds in the diagram above. But the histogram shows nicely that you should in general be done within 10 minutes, and even if the build didn’t do langpacks on linux but full repacks on windows, I would expect a similar responsiveness on hardware comparable to the l10n server.
The bad news is, the code is mostly non-reviewed, and should use a pending feature for buildbot, custom build properties. Otherwise, reconfig will never work.
# Localization note (netError.dtd): overrides
April 8th, 2008
I thought I knew what we did for netError.xhtml and netError.dtd, but I didn’t, so here’s a little thing I learned today:
netError.xhtml loads 3 DTDs: one for xhtml, one is netError.dtd (global), and one is global.dtd (for RTLishness). Two boring ones, one interesting. Now, what we used to do is that browser would override the global netError.dtd (which is in the dom module) with its own version. Not anymore: global netError.dtd now imports global’s netErrorApp.dtd, and that is then overridden by, you guessed right, netError.dtd in browser.
Lesson learned: if you’re hunting parsing errors in Firefox net error pages, you have to check both netError.dtd files for syntax errors, but not netErrorApp.dtd.
Wisdom of machine translation
April 1st, 2008
At times machine translations achieve results that no human could word that well. This is what google language tools have to say about fixing stuff on short timelines:
We are using everybody’s frustrated, Sorry.
Found here.
In case you didn’t notice…
March 28th, 2008
I’ve been on vacation, I’m back. What’s the point? If you’re into South Africa, Cape Town, Garden Route, and Fynbos, there are some 250 photos to go through.
Users and testers
March 27th, 2008
Trevor swiftly fixed bug 424993, so now I can talk about users and testers of Firefox 2 and 3, per locale, per platform. Thanks. This is data over the last 7 days, so it’s not dead-on representative, and it won’t cover new locales in B5, but only the end-game in B4. As we’re constantly ramping up in testers, that’s a good time area, and it will remove those users that just shortly checked B4 due to the media rush. Thus, in particular for localizations, it should be much more representative of “satisfied testers”.
As always, I’m giving you an exhibit of the data, not included here, as I had no idea how to make wp do that. Check the output on active testers, March 19th-26th.
There are some interesting artifacts. Like, I doubt we really have that much of an overrepresentation of linux testers, it’s just that most of the fx2s out there are distro fx2s that we don’t see. Fx3s are of course different. On average, we had a 14 per mille ratio between fx3 b4 users and fx2 users. There are only a few localizations “overtested”, most prominently Gujarati, which is just a bad signal to noise ratio. We do have a few undertested ones, starting with Greek and Macedonian (coincidence?), both at 3 per mille. Low-noise high testing ratios come in for Korean (fair noise?), traditional Chinese, and Russian. I’m making the cut there somewhat arbitrarily.
What does that give us? It gives us focus for our testing, and it’s a nice occasion to say:
Beta 5 will come out, with yet more localizations, so whoever you know that could enjoy a Firefox 3 Beta in localized version, send them over to the download page and ask them to send in their feedback.
Builds, shuttle busses and cabs
February 7th, 2008
There have been a few blog posts recently about when to do builds. I’d like to add a few thoughts of mine.
I’ll follow mostly two trains of thought
* each check-in raises a bunch of questions, to which we want answers — quick
* the road to these answers has a limited traffic capacity
The current proposals map to two images in my head: the current “build continuously” model is basically a shuttle bus, and “build on check-in” is more like a cab. Now, the interesting artifact in our picture is that both the shuttle bus and the cab have an unlimited capacity for passengers, or check-ins. Do check-ins blend? Yes, they do.
Now, the blending of patches has an upside and a downside. On the upside, it enables us to get around traffic jams. We can just transport as many check-ins as we get. The downside is, the answers that the builds and test runs give can’t be associated with individual check-ins anymore. Well, “passes” can, “failures” cannot. I’ll postpone perf testing here, 10% win, 9% loss, end up where you were.
There have been previous posts on whether shuttle buses or cabs are the way to go, and my answer is “neither”. I guess there is an easy answer if you assume that you have no limits on machines: in that case, just let them run on check-in. That’s great, at least as long as you can actually relate the resulting build, and the tests run on that build, to a source tree. Once we’re running out of machines, the story is a little different. Every machine should be continuously building then, and the trick is, ‘by then’. That is, each build should adjust the time it’s waiting for more check-ins such that, by the time the last idle machine kicks off, all available machines are fairly distributed across the ETA of the next machine. Let’s pick some arbitrary number for an initial stabilization time, Tmin. Waiting time for machine n of N could then be
Tmin * (N-n)/(N-1) + ETA/2 * (n-1)/(N-1)
if we choose to weight linearly. I did a little scatterplot game for you to pick different amounts of slaves, mintimes, etas and such. I bet there is a way to pick better values for Tmin and the power based on bonsai and tinderbox statistics.
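For those who would rather read code than a formula, the linear weighting can be written down as a small function. This is only a sketch of the expression above; the name `waiting_time` and the handling of a single machine (where the formula would divide by zero) are my assumptions.

```python
def waiting_time(n, total, t_min, eta):
    """Waiting time for idle machine n of total (1-based), linear weighting.

    t_min is the initial stabilization time, eta the expected time until
    the next machine becomes available; both in the same unit.
    """
    if not 1 <= n <= total:
        raise ValueError("n must be between 1 and total")
    if total == 1:
        # Assumption: the formula is undefined for N = 1, so just wait t_min.
        return t_min
    span = total - 1
    return t_min * (total - n) / span + eta / 2 * (n - 1) / span
```

With the first machine (n = 1), the wait is just Tmin; with the last idle machine (n = N), it is ETA/2, and the machines in between are spread linearly between those two values.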
Sadly, neither tinderbox nor buildbot offers this, but I could imagine that this would be of more general use to buildbot clients, and would be something to get upstream.
The other part of the picture is “did this particular change impact ???”. Now, for questions like “does this compile?”, the answer is fairly trivial. I think the same goes for things like unit tests or ref tests. As long as the current state of the tree passes, we’re fine. When it fails, that’s a more interesting question. Like, it might make sense to then actually refine the built source stamps.
But for questions regarding performance, there are worst case scenarios. Like, you see a 1% regression. Is it a 1% regression from patch 2, or is patch 2 actually improving performance, but patch 1 just totally borked it? On top of that, performance data is noisy data. There’s likely a good heuristic algorithm to distribute the sampling of builds for performance measurements, based on the total count of tests run per build, the age of the build, and the current noise on the performance data for that build. So in particular for performance testing, it would be interesting to not just build the latest well-defined source state (thanks cvs), but also to be able to build previously not built source states, and in the performance architecture, to spend the available cycles to further refine the statistics on a range of recent builds, instead of just the latest. And to, of course, relate those data points with the source stamp that correlates to the build that was tested right now.
How is ab-CD doing?
January 28th, 2008
Just wanted to plot a brief update. I’m currently mostly working on improving our understanding of how localizations are doing for Firefox 3. The plan is to come up with a dashboard, gathering at least the quantifiable machine-readable information. The work there is improving nicely, you can see it at http://l10n.mozilla.org/dashboard/. There is still stuff to come, like, I’m likely going to make the Builds tab the default one, and I’m going to move away from red and green.
Now, moving away from red and green made a good jump right now, as I have made live an output that actually visualizes the status of the product localization over time. Right now, it’s aiming to do 30 days, but it doesn’t have enough stats for that just yet. I’m using Simile’s timeplot here, and exhibit for the index. To explain the graph (I suck at markup, so it’s a crufty legend right now), the red lines are missing entities, black is obsolete. Those two have their y-axis on the left. The grey beast is the unchanged entries, with the y-axis on the right. The blue ticks are actually changes by the localizer (not in en-US, the graph updates though), click on those to get the check-in messages, the committer is a link to a bonsai query for +/- half an hour. You can browse through the active l10n stats on http://l10n.mozilla.org/buildbot/statistics, which points to, for example, the Polish stats.
