Compiling Localizable Objects into Native JavaScript

August 25th, 2009 by Seth Bindernagel

As promised, here is the second post from Jeremy Hiatt’s work on our l20n project. This is a word-for-word reposting of his essay about compiling localizable objects into native JS.

====================================

One of the goals for my summer internship is to improve the performance of l20n. The initial implementation was a parser written entirely in JavaScript that operated on .lol files. For more details about our choices for file formats, see my previous post. After some failed attempts to rework the parser’s use of regular expressions, which only regressed performance, I experimented with JSON as an alternative file format. The hope was that we could leverage the performance of Gecko’s built-in JSON parser to speed up l20n. We did see some tremendous improvements: on a large testcase constructed from browser.dtd, JSON cut our parsing time from ~140 ms down to just a few ms.

Unfortunately, we were still slow when it came to evaluating and displaying all those entities. We still had a big chunk of parsing left that we couldn’t outsource to JSON. Each string value in l20n may contain variable placeholders. Here’s an example (in JSON):

"droponbookmarksbutton" : {
    "value" : "Drop a link to bookmark it"},

"popupWarning" : {
    "value" : "${brandShortName}s prevented this site
              from opening a pop-up window."}

(Line breaks inserted for clarity.) The first string doesn’t use any variables, but the second does. In order to catch all these placeholders, we scanned each string with a regular expression to match the ${…}s syntax, even though many strings don’t use any variables. That translated to a linear traversal of every single string before it could be returned, costing us a lot of time. In tests conducted in the xpcshell, rendering all the elements from browser.properties took roughly 40ms. In comparison, the current framework for properties files can parse and display all the elements in under 20ms. Since we can’t afford to regress overall performance, that meant we still had work to do to get faster.
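To make the cost concrete, here is a minimal sketch of that kind of runtime expansion. It reuses the expander pattern quoted in the comments below; the `expand` function, the flat `context` object, and the rule that an unknown name is left untouched are assumptions for illustration, not the actual l20n code.

```javascript
// Expander syntax: ${name} followed by a one-letter suffix (i or s).
const PLACEHOLDER = /\$\{([^}]+)\}(i|s)/g;

// Hypothetical helper: expands placeholders by looking names up in `context`.
// Note that the regex scans every string, even ones with no placeholder.
function expand(value, context) {
  return value.replace(PLACEHOLDER, function (match, name) {
    // Assumption: a placeholder names a plain property of the context.
    if (Object.prototype.hasOwnProperty.call(context, name)) {
      return String(context[name]);
    }
    return match; // unknown names are left as written
  });
}

const context = { brandShortName: "Firefox" };

console.log(expand("Drop a link to bookmark it", context));
// "Drop a link to bookmark it" (scanned, nothing replaced)

console.log(expand(
  "${brandShortName}s prevented this site from opening a pop-up window.",
  context));
// "Firefox prevented this site from opening a pop-up window."
```

The first call does a full scan and returns the string unchanged, which is exactly the work we would like to avoid.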

One way to eliminate checking every single string is to add extra information to the encoding for strings. Many languages define different behavior for single- vs. double-quoted strings, performing replacements in one but not the other. We could also have added a special flag to indicate simple (no replacements) vs. complex strings. Either of these approaches would have added further complexity to the localization process, so we did not seriously consider them.
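For illustration only, the flag variant might have looked roughly like this. The `complex` property name and the `getValue` helper are hypothetical; nothing like them was adopted in l20n.

```javascript
// Hypothetical encoding: strings with placeholders carry an explicit flag.
const flaggedEntities = {
  droponbookmarksbutton: { value: "Drop a link to bookmark it" },
  popupWarning: {
    value: "${brandShortName}s prevented this site from opening a pop-up window.",
    complex: true // assumed flag name
  }
};

// Simple strings are returned as-is; only flagged strings pay for the regex.
function getValue(entity, context) {
  if (!entity.complex) {
    return entity.value;
  }
  return entity.value.replace(/\$\{([^}]+)\}(i|s)/g, function (match, name) {
    return Object.prototype.hasOwnProperty.call(context, name)
      ? String(context[name])
      : match;
  });
}

console.log(getValue(flaggedEntities.droponbookmarksbutton, {}));
console.log(getValue(flaggedEntities.popupWarning, { brandShortName: "Firefox" }));
```

The lookup gets cheaper, but someone (the localizer or a tool) now has to keep the flag in sync with the string, which is the added complexity mentioned above.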

Instead, on the advice of the brilliant Staś Małolepszy, we embarked on an experiment to compile our l20n objects into native JavaScript. As a result, we saw another impressive performance jump. In an xpcshell test, we can load and display all of browser.properties in roughly 4ms (an order of magnitude improvement!). Here’s what our previous example looks like as compiled JavaScript:

this.droponbookmarksbutton="Drop a link to bookmark it";
this.__defineGetter__("popupWarning",
  function() { return "" + (brandShortName) +
    " prevented this site from opening a pop-up window.";});

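A compile step producing output of that shape could be sketched as follows. This is not the actual l20n compiler: `compileEntity` is a hypothetical name, it assumes every placeholder is a plain identifier, it emits `Object.defineProperty` rather than `__defineGetter__`, and it resolves names through `this` so the sketch does not depend on how the real code scopes `brandShortName`.

```javascript
// Hypothetical compiler: turns one entity into a line of JavaScript source.
function compileEntity(name, value) {
  const pattern = /\$\{([^}]+)\}(i|s)/g;
  const parts = [];
  let last = 0;
  let found = false;
  let m;
  while ((m = pattern.exec(value)) !== null) {
    found = true;
    // Literal text before the placeholder becomes a quoted string.
    if (m.index > last) {
      parts.push(JSON.stringify(value.slice(last, m.index)));
    }
    // Assumption: the placeholder is an identifier looked up on the context.
    parts.push("(this." + m[1] + ")");
    last = pattern.lastIndex;
  }
  if (!found) {
    // Simple string: a plain assignment, no work left for runtime.
    return "this." + name + " = " + JSON.stringify(value) + ";";
  }
  if (last < value.length) {
    parts.push(JSON.stringify(value.slice(last)));
  }
  // Complex string: a getter that concatenates at access time.
  return "Object.defineProperty(this, " + JSON.stringify(name) +
    ", { get: function () { return \"\" + " + parts.join(" + ") + "; } });";
}

console.log(compileEntity("droponbookmarksbutton", "Drop a link to bookmark it"));
console.log(compileEntity("popupWarning",
  "${brandShortName}s prevented this site from opening a pop-up window."));
```

All regex work happens once, at compile time; what is left at runtime is a property read or a string concatenation.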
Another great thing about compilation is that our runtime performance doesn’t depend on our choice of source file format. Here’s a diagram showing the different ways an l20n file can get inflated into a localization context:

[Figure: l20n compilation scheme. Caption: Inflating l20n source into a context]
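The compiled path can be pictured roughly as below. This is a sketch under assumptions: `compiledSource` is a hand-written stand-in for a compiled file, `loadCompiled` is a hypothetical helper, and evaluating through the `Function` constructor is just the simplest portable way to show the idea, not necessarily how the l20n runtime loads scripts.

```javascript
// Hand-written stand-in for the contents of a compiled l20n file (assumed shape).
const compiledSource =
  'this.brandShortName = "Firefox";\n' +
  'Object.defineProperty(this, "popupWarning", { get: function () {\n' +
  '  return "" + (this.brandShortName) +\n' +
  '    " prevented this site from opening a pop-up window.";\n' +
  '} });';

// Hypothetical loader: runs the compiled definitions with `this` bound to the
// context object, so each statement defines an entity on it.
function loadCompiled(source, target) {
  new Function(source).call(target);
  return target;
}

const l10nContext = loadCompiled(compiledSource, {});
console.log(l10nContext.popupWarning);
// "Firefox prevented this site from opening a pop-up window."
```

Nothing in this path cares whether the definitions were compiled from .lol, JSON, or some other source format.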

The performance numbers were collected using nsITimelineService in the xpcshell. The l20n runtime infrastructure can inflate a source file directly into a context, or it can load compiled JavaScript definitions for a significant performance boost. For comparison, here’s a diagram of Mozilla’s current l10n scheme:

[Figure: Current l10n scheme]

Again, this time was measured in the xpcshell when loading the browser.properties string bundle. It’s not necessarily representative of performance for DTD files. As we can see, compilation now guarantees at least comparable performance to the current approach, no matter what file format we end up using. If you’d like to weigh in on that debate, please leave a comment on my previous post! And finally, we are also working on l20n support in Silme so that it will be easy to migrate existing DTD/.properties files to our new l20n format.

[Figure: Intercompatibility with Silme]

Silme will serve as a critical compatibility layer to ensure a smooth transition to our new l10n framework. Please let me know if you have any questions or comments!


  1. You could preprocess the JSON objects and insert flags to indicate whether a string has a variable in it… that would save you from doing unnecessary regular expression tests at run-time.

  2. If you are still using regular expressions in a performance sensitive context, please point them out and I’d be happy to help optimize them.

  3. Jeremy Hiatt

    Thanks to both of you for your suggestions.

    @Laurens: We considered doing something like that, which would certainly help. However, as long as we’re doing a preprocessing step, we might as well do as much as we possibly can to improve performance at runtime, and we got a huge boost by dropping regex replacement entirely. TraceMonkey (Mozilla’s new JS engine) currently can’t handle nontrivial regex replacement in an optimal manner, though hopefully this bug will get fixed soon. In our case, converting the replacement operation to a string concatenation made our code about 40x faster in the evaluation step.

    @Daniel: Thanks for the offer! As I mentioned above, we don’t have performance-critical regex operations any more. That said, it would be useful to make our l20n “interpreter” faster.

    The regex to match our expander syntax looks like this:
    /\$\{([^\}]+)\}(i|s)/

    It gets used as follows:
    complex_string_to_expand.replace(/\$\{([^\}]+)\}(i|s)/g, replacement)

    where ‘replacement’ is a function to evaluate expressions and look up entity references that appear inside the curly braces.

    If you see any room for improvement, please let me know!

  4. I wonder if we could just write a native LOL parser based on the native JSON parser we have – i.e. do real (token-based?) parsing of the LOL instead of just regular expressions.

  5. Re native LOL parser, there is a rather strong force within our platform drivers to not use native code unless ultimately needed, both for attack surface and for maintenance.