• Compiling Localizable Objects into Native JavaScript

    August 25th, 2009 by Seth Bindernagel

    As promised, here is the second post from Jeremy Hiatt’s work on our l20n project. This is a word-for-word reposting of his essay about compiling localizable objects into native JavaScript.

    ====================================

    One of the goals for my summer internship is to improve the performance of l20n. The initial implementation was a parser written entirely in JavaScript that operated on .lol files. For more details about our choices for file formats, see my previous post. My attempts to rework the parser’s use of regular expressions failed and in fact regressed performance, so I experimented with JSON as an alternative file format. The hope was that we could leverage the performance of Gecko’s built-in JSON parser to speed up l20n. We did see some tremendous improvements: on a large testcase constructed from browser.dtd, JSON cut our parsing time from ~140 milliseconds down to just a few milliseconds.

    Unfortunately, we were still slow when it came to evaluating and displaying all those entities, because a big chunk of parsing remained that we couldn’t outsource to JSON: each string value in l20n may contain variable placeholders. Here’s an example (in JSON):

    "droponbookmarksbutton" : {
        "value" : "Drop a link to bookmark it"},
    
    "popupWarning" : {
        "value" : "${brandShortName}s prevented this site
                  from opening a pop-up window."}

    (Line breaks inserted for clarity.) The first string doesn’t use any variables, but the second does. In order to catch all these placeholders, we scanned each string with a regular expression to match the ${…}s syntax, even though many strings don’t use any variables. That translated to a linear traversal of every single string before it could be returned, costing us a lot of time. In tests conducted in the xpcshell, rendering all the elements from browser.properties took roughly 40ms. In comparison, the current framework for properties files can parse and display all the elements in under 20ms. Since we can’t afford to regress overall performance, that meant we still had work to do to get faster.
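
    To make the cost concrete, here is a minimal sketch of that kind of per-string scan. It is not the actual l20n code: the `resolveValue` helper, the exact regular expression, and the "Firefox" brand value are assumptions made for illustration. The point is only that every value passes through the regular expression, whether or not it contains a placeholder.

    ```javascript
    // Hypothetical sketch of per-string placeholder scanning (not the l20n source).
    // PLACEHOLDER is an assumed pattern for the ${name}s syntax shown above.
    const PLACEHOLDER = /\$\{(\w+)\}s/g;

    // Resolve one entity value against a table of variables.
    // Every value is scanned, even when it contains no placeholder at all.
    function resolveValue(value, variables) {
      return value.replace(PLACEHOLDER, function (match, name) {
        // Leave unknown placeholders untouched rather than guessing.
        return Object.prototype.hasOwnProperty.call(variables, name)
          ? variables[name]
          : match;
      });
    }

    const entities = {
      droponbookmarksbutton: { value: "Drop a link to bookmark it" },
      popupWarning: {
        value: "${brandShortName}s prevented this site from opening a pop-up window."
      }
    };
    // Assumed variable table; the brand name is illustrative only.
    const variables = { brandShortName: "Firefox" };

    for (const id of Object.keys(entities)) {
      console.log(id + ": " + resolveValue(entities[id].value, variables));
    }
    ```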

    One way to eliminate checking every single string is to add extra information to the encoding for strings. Many languages define different behavior for single- vs. double-quoted strings, performing replacements in one but not the other. We could also have added a special flag to indicate simple (no replacements) vs. complex strings. Either of these approaches would have added further complexity to the localization process, so we did not seriously consider them.
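
    For illustration only, the flag idea might look roughly like the sketch below. The `complex` field and the `getValue` helper are hypothetical and were never part of an l20n format. The sketch also shows where the extra burden comes from: someone has to set the flag correctly on every entity.

    ```javascript
    // Hypothetical "complex" flag; this field is an assumption, not an l20n feature.
    const flaggedEntities = {
      droponbookmarksbutton: {
        value: "Drop a link to bookmark it",
        complex: false
      },
      popupWarning: {
        value: "${brandShortName}s prevented this site from opening a pop-up window.",
        complex: true
      }
    };

    // Return an entity's display string, scanning only flagged values.
    function getValue(entity, variables) {
      if (!entity.complex) {
        return entity.value; // fast path: no regular expression at all
      }
      return entity.value.replace(/\$\{(\w+)\}s/g, function (match, name) {
        return Object.prototype.hasOwnProperty.call(variables, name)
          ? variables[name]
          : match;
      });
    }

    // "Firefox" is an illustrative value, not taken from the original post.
    console.log(getValue(flaggedEntities.popupWarning, { brandShortName: "Firefox" }));
    ```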

    Instead, on the advice of the brilliant Staś Małolepszy, we embarked on an experiment to compile our l20n objects into native JavaScript. As a result, we saw another impressive performance jump. In an xpcshell test, we can load and display all of browser.properties in roughly 4ms (an order of magnitude improvement!). Here’s what our previous example looks like as compiled JavaScript:

    this.droponbookmarksbutton="Drop a link to bookmark it";
    this.__defineGetter__("popupWarning",
      function() { return "" + (brandShortName) +
        " prevented this site from opening a pop-up window.";});
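
    A compiler producing output of this shape can be quite small. The following is a rough sketch under stated assumptions, not the l20n compiler itself: `compileValue` and `compileEntities` are hypothetical names, the generated code uses `Object.defineProperty` in place of `__defineGetter__`, and variables are assumed to live on the same context object (`this.brandShortName`) rather than being resolved from the enclosing scope as in the output above.

    ```javascript
    // Hypothetical compiler sketch; names and output shape are assumptions.
    const PLACEHOLDER = /\$\{(\w+)\}s/g;

    // Turn one string value into a JavaScript expression.
    // Literal pieces are JSON-encoded so quotes and backslashes stay safe.
    function compileValue(value) {
      const parts = [];
      let complex = false;
      let last = 0;
      let m;
      PLACEHOLDER.lastIndex = 0;
      while ((m = PLACEHOLDER.exec(value)) !== null) {
        complex = true;
        if (m.index > last) {
          parts.push(JSON.stringify(value.slice(last, m.index)));
        }
        parts.push("(this." + m[1] + ")");
        last = m.index + m[0].length;
      }
      if (last < value.length || parts.length === 0) {
        parts.push(JSON.stringify(value.slice(last)));
      }
      return { expr: parts.join(" + "), complex: complex };
    }

    // Emit one statement per entity: a plain assignment for simple strings,
    // a getter for strings that reference variables.
    function compileEntities(entities) {
      const lines = [];
      for (const id of Object.keys(entities)) {
        const compiled = compileValue(entities[id].value);
        const key = JSON.stringify(id);
        if (compiled.complex) {
          lines.push("Object.defineProperty(this, " + key +
            ", { get: function () { return \"\" + " + compiled.expr +
            "; }, enumerable: true });");
        } else {
          lines.push("this[" + key + "] = " + compiled.expr + ";");
        }
      }
      return lines.join("\n");
    }

    const source = compileEntities({
      droponbookmarksbutton: { value: "Drop a link to bookmark it" },
      popupWarning: {
        value: "${brandShortName}s prevented this site from opening a pop-up window."
      }
    });

    // Inflate the compiled source into a context object.
    // "Firefox" is an illustrative value only.
    const context = { brandShortName: "Firefox" };
    new Function(source).call(context);
    console.log(context.popupWarning);
    ```

    The scan for placeholders happens once, at compile time. At runtime a simple string is a plain property read, and a complex string costs only a getter call and a concatenation.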

    Another great thing about compilation is that our runtime performance doesn’t depend on our choice of source file format. Here’s a diagram showing the different ways an l20n file can get inflated into a localization context:

    [Figure: l20n compilation scheme. Caption: Inflating l20n source into a context]

    The performance numbers were collected using nsITimelineService in the xpcshell. The l20n runtime infrastructure can inflate a source file directly into a context, or it can load compiled JavaScript definitions for a significant performance boost. For comparison, here’s a diagram of Mozilla’s current l10n scheme:

    [Figure: Current l10n scheme]

    Again, this time was measured in the xpcshell when loading the browser.properties string bundle. It’s not necessarily representative of performance for DTD files as well. As we can see, compilation now guarantees at least comparable performance to the current approach, no matter what file format we end up using. If you’d like to weigh in on that debate, please leave a comment on my previous post! And finally, we are also working on l20n support in Silme so that it will be easy to migrate existing DTD/.properties files to our new l20n format.

    [Figure: Intercompatibility with Silme]

    Silme will serve as a critical compatibility layer to ensure a smooth transition to our new l10n framework. Please let me know if you have any questions or comments!