<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>seth's blog &#187; JSON</title>
	<atom:link href="http://blog.mozilla.com/seth/tag/json/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.com/seth</link>
	<description>localization and community at mozilla</description>
	<lastBuildDate>Mon, 04 Oct 2010 14:57:57 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Compiling Localizable Objects into Native JavaScript</title>
		<link>http://blog.mozilla.com/seth/2009/08/25/compiling-localizable-objects-into-native-javascript/</link>
		<comments>http://blog.mozilla.com/seth/2009/08/25/compiling-localizable-objects-into-native-javascript/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 15:59:39 +0000</pubDate>
		<dc:creator>seth bindernagel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[L20n]]></category>
		<category><![CDATA[planet]]></category>
		<category><![CDATA[tools]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/seth/?p=550</guid>
		<description><![CDATA[As promised, here is the second post from Jeremy Hiatt&#8217;s work on our l20n project.  This is a word-for-word reposting of his essay about compiling localizable objects in native JS. ==================================== One of the goals for my summer internship is to improve performance of l20n. The initial implementation was a parser written entirely in JavaScript [...]]]></description>
			<content:encoded><![CDATA[<p>As promised, here is the second post from Jeremy Hiatt&#8217;s work on our l20n project.  This is a word-for-word reposting of his essay about <a href="http://jeremyhiatt.wordpress.com/2009/08/22/compiling-localizable-objects-into-native-javascript/" target="_blank">compiling localizable objects in native JS</a>.</p>
<p>====================================</p>
<p>One of the goals for my summer internship is to improve performance of <a href="https://wiki.mozilla.org/L20n">l20n</a>. The initial implementation was a parser written entirely in JavaScript that operated on .lol files. For more details about our choices for file formats, see my previous post. After some attempts to rework the parser’s use of regular expressions failed and actually regressed performance, I experimented with JSON as an alternative file format. The hope was that we could leverage the performance of Gecko’s built-in JSON parser to speed up l20n. We did see some tremendous improvements: on a large testcase constructed from browser.dtd, JSON cut our parsing time from ~140 milliseconds down to just a few ms. Unfortunately, we were still slow when it came to evaluating and displaying all those entities. We still had a big chunk of parsing left that we couldn’t outsource to JSON. Each string value in l20n may contain variable placeholders. Here’s an example (in JSON):</p>
<pre style="border: 1px solid #dfecf1; padding: 10px; overflow: auto; color: #25221d; display: block; font-family: 'Courier New',monospace;">"droponbookmarksbutton" : {
    "value" : "Drop a link to bookmark it"},

"popupWarning" : {
    "value" : "${brandShortName}s prevented this site
              from opening a pop-up window."}</pre>
<p>(Line breaks inserted for clarity.) The first string doesn’t use any variables, but the second does. In order to catch all these placeholders, we scanned each string with a regular expression to match the ${…}s syntax, even though many strings don’t use any variables. That translated to a linear traversal of every single string before it could be returned, costing us a lot of time. In tests conducted in the xpcshell, rendering all the elements from browser.properties took roughly 40ms. In comparison, the current framework for properties files can parse and display all the elements in under 20ms. Since we can’t afford to regress overall performance, that meant we still had work to do to get faster.</p>
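<p>To make the cost concrete, here is a minimal sketch of the kind of per-string scan described above. The helper name <code>expand</code>, the exact regular expression, and the <code>context</code> object are assumptions for illustration, not the actual l20n code.</p>

```javascript
// Hypothetical helper: expand ${name}s placeholders in a string.
// The regular expression and the function name are assumptions.
var PLACEHOLDER = /\$\{([A-Za-z_][A-Za-z0-9_]*)\}s/g;

function expand(template, context) {
  // String.prototype.replace scans the entire string, even when
  // the string contains no placeholder at all.
  return template.replace(PLACEHOLDER, function (match, name) {
    // Leave unknown names untouched instead of printing "undefined".
    return Object.prototype.hasOwnProperty.call(context, name)
      ? String(context[name])
      : match;
  });
}

var context = { brandShortName: "Minefield" };
var plain = expand("Drop a link to bookmark it", context);
var warning = expand(
  "${brandShortName}s prevented this site from opening a pop-up window.",
  context
);
console.log(plain);
console.log(warning);
```

<p>The first call returns its input unchanged, yet it still pays for a full pass over the string. That is the overhead the rest of this post tries to remove.</p>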
<p>One way to eliminate checking every single string is to add extra information to the encoding for strings. Many languages define different behavior for single- vs. double-quoted strings, performing replacements in one but not the other. We could also have added a special flag to indicate simple (no replacements) vs. complex strings. Both approaches would have added further complexity to the localization process, so we did not seriously consider either.</p>
<p>Instead, on the advice of the brilliant Staś Małolepszy, we embarked on an experiment to compile our l20n objects into native JavaScript. As a result, we saw another impressive performance jump. In an xpcshell test, we can load and display all of browser.properties in roughly 4ms (an order of magnitude improvement!). Here’s what our previous example looks like as compiled JavaScript:</p>
<pre style="border: 1px solid #dfecf1; padding: 10px; overflow: auto; color: #25221d; display: block; font-family: 'Courier New',monospace;">this.droponbookmarksbutton="Drop a link to bookmark it";
this.__defineGetter__("popupWarning",
  function() { return "" + (brandShortName) +
    " prevented this site from opening a pop-up window.";});</pre>
<p>Another great thing about compilation is that our runtime performance doesn’t depend on our choice of source file format. Here’s a diagram showing the different ways an l20n file can get inflated into a localization context:</p>
<div id="attachment_30" style="width: 310px;"><a href="http://jeremyhiatt.files.wordpress.com/2009/08/l20n-diagram.png"><img title="l20n-diagram" src="http://jeremyhiatt.files.wordpress.com/2009/08/l20n-diagram.png?w=300&amp;h=217" alt="l20n compilation scheme" width="300" height="217" /></a>Inflating l20n source into a context</div>
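<p>The compiled output shown earlier is JavaScript source text emitted by our compiler. As a rough sketch of the same idea, the snippet below builds equivalent properties in memory rather than emitting source, and it uses the standard <code>Object.defineProperty</code> in place of the non-standard <code>__defineGetter__</code>. The function name <code>compileEntity</code> and the placeholder pattern are assumptions for illustration.</p>

```javascript
// Hypothetical sketch: split each value once, ahead of time, so that
// lookups never scan the string with a regular expression again.
function compileEntity(target, id, value) {
  // With a capturing group, split() returns literal text at even
  // indices and placeholder names at odd indices.
  var parts = value.split(/\$\{([A-Za-z_][A-Za-z0-9_]*)\}s/);
  if (parts.length === 1) {
    // Simple string: store it as an ordinary data property.
    target[id] = value;
    return;
  }
  // Complex string: define a getter that concatenates the parts,
  // reading referenced entities from the same object on each access.
  Object.defineProperty(target, id, {
    enumerable: true,
    get: function () {
      var out = "";
      for (var i = 0; i !== parts.length; i++) {
        out += i % 2 === 0 ? parts[i] : String(target[parts[i]]);
      }
      return out;
    }
  });
}

var ctx = {};
compileEntity(ctx, "brandShortName", "Minefield");
compileEntity(ctx, "droponbookmarksbutton", "Drop a link to bookmark it");
compileEntity(ctx, "popupWarning",
  "${brandShortName}s prevented this site from opening a pop-up window.");
console.log(ctx.popupWarning);
```

<p>Simple strings cost nothing extra at lookup time, and only strings that really contain placeholders pay for concatenation.</p>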
<p>The performance numbers were collected using nsITimelineService in the xpcshell. The l20n runtime infrastructure can inflate a source file directly into a context, or it can load compiled JavaScript definitions for a significant performance boost. For comparison, here’s a diagram of Mozilla’s current l10n scheme:</p>
<div id="attachment_35" style="width: 288px;"><a href="http://jeremyhiatt.files.wordpress.com/2009/08/dtd-props-diagram.png"><img title="dtd-props-diagram" src="http://jeremyhiatt.files.wordpress.com/2009/08/dtd-props-diagram.png?w=278&amp;h=300" alt="Current l10n scheme" width="278" height="300" /></a>Current l10n scheme</div>
<p>Again, this time was measured in the xpcshell when loading the browser.properties string bundle. It’s not necessarily representative of performance for DTD files as well. As we can see, compilation now guarantees at least comparable performance to the current approach, no matter what file format we end up using. If you’d like to weigh in on that debate, please leave a comment on my previous post! And finally, we are also working on l20n support in <a href="http://wiki.braniecki.net/Silme">Silme</a> so that it will be easy to migrate existing DTD/.properties files to our new l20n format.</p>
<div id="attachment_36" style="width: 310px;"><a href="http://jeremyhiatt.files.wordpress.com/2009/08/silme-conversion.png"><img title="silme-conversion" src="http://jeremyhiatt.files.wordpress.com/2009/08/silme-conversion.png?w=300&amp;h=224" alt="Intercompatibility with Silme" width="300" height="224" /></a>Intercompatibility with Silme</div>
<p>Silme will serve as a critical compatibility layer to ensure a smooth transition to our new l10n framework. Please let me know if you have any questions or comments!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/seth/2009/08/25/compiling-localizable-objects-into-native-javascript/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>The L20n Format Shootout</title>
		<link>http://blog.mozilla.com/seth/2009/08/24/the-l20n-format-shootout/</link>
		<comments>http://blog.mozilla.com/seth/2009/08/24/the-l20n-format-shootout/#comments</comments>
		<pubDate>Tue, 25 Aug 2009 00:49:24 +0000</pubDate>
		<dc:creator>seth bindernagel</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[JSON]]></category>
		<category><![CDATA[L20n]]></category>
		<category><![CDATA[planet]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/seth/?p=547</guid>
		<description><![CDATA[Jeremy Hiatt is our localization summer intern who has been doing some fantastic work to advance the conceptual idea of L20n into something more practical.  Below is a word-for-word copy of a post he made on his blog.  I am reposting his words to get more people reading what he has been working on.  Tomorrow [...]]]></description>
			<content:encoded><![CDATA[<div>
<div>
<p>Jeremy Hiatt is our localization summer intern, and he has been doing some fantastic work to turn the conceptual idea of L20n into something more practical. Below is a word-for-word copy of <a href="http://jeremyhiatt.wordpress.com/2009/08/21/the-l20n-format-shootout/" target="_blank">a post he made on his blog</a>. I am reposting his words so that more people read what he has been working on. A second repost, about compiling localizable objects into native JavaScript, will follow tomorrow.</p>
<p>&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;-</p>
<p>L20n (for localization, 2.0) aims to  empower localizers to describe complexities and subtleties of their  language: gendered nouns, singular/plural forms, and just about any  other quirk that might exist in the grammar. Like DTD and .properties  formats, which we currently use to encode localizable strings, l20n  objects associate entity IDs with string values. Localizers translate  these values into the target language. L20n has all the power of the  current framework, plus a lot more, and it’s just as simple to use  (provided we choose the right format!). You can find some examples of  l20n in action <a href="https://wiki.mozilla.org/L20n">here</a>. In the  past weeks, we’ve experimented with JSON (JavaScript Object Notation) as  a file format to represent localizable objects in hopes of achieving  better performance by leveraging the new built-in JSON parser in  Firefox. The performance gains were substantial, but still not enough to  compete with the current DTD/properties framework in terms of speed.  We’ve since adopted a new scheme to compile our l20n source files into  native JavaScript (credit to Staś Małolepszy for suggesting this). This  compilation now guarantees good performance independent of our choice of  source file format. I will discuss the specifics of compilation in an  upcoming post; this post will focus on the relative merits of the  various formats under consideration.</p>
<h3>Meet the contenders</h3>
<h4>LOL files</h4>
<p>Before experimenting with JSON, we developed a novel format for l20n,  playfully titled “localizable object lists” (.lol files). A lol file  looks like a hybrid of DTD and .properties formats, with entities  delimited by angle brackets and colons separating keys from values.  Here’s a simple example, constructed from brand.dtd:</p>
<pre style="border: 1px solid #dfecf1; color: #25221d; display: block; font-family: 'Courier New',monospace; overflow: auto; padding: 10px;">&lt;brandShortName: "Minefield"&gt;
&lt;brandFullName: "Minefield"&gt;
&lt;vendorShortName: "Mozilla"&gt;
&lt;logoCopyright: " "&gt;</pre>
<p>In this simple case, the lol file looks a lot like the original  brand.dtd, which looks like this:</p>
<pre style="border: 1px solid #dfecf1; color: #25221d; display: block; font-family: 'Courier New',monospace; overflow: auto; padding: 10px;">&lt;!ENTITY  brandShortName        "Minefield"&gt;
&lt;!ENTITY  brandFullName         "Minefield"&gt;
&lt;!ENTITY  vendorShortName       "Mozilla"&gt;
&lt;!ENTITY  logoCopyright         " "&gt;</pre>
<p>We lost the !ENTITY declaration and added a colon, but otherwise the  lol format should look familiar. What if we want to do something more  complex, like define an entity that involves a gendered noun? Here’s a  German example encoded in a lol file:</p>
<pre style="border: 1px solid #dfecf1; color: #25221d; display: block; font-family: 'Courier New',monospace; overflow: auto; padding: 10px;">/* This entity references a gendered noun */
&lt;complex[appName.gender]: {
    male: "Ein hübscher ${appName}s.",
    female: "Ein hübsches ${appName}s."}&gt;

/* This is a gendered noun */
&lt;appName: "Jägermeister"
 gender: "male"&gt;</pre>
<p>In the above example, we indicated the “complex” entity depends on  the “gender” property of the “appName” entity. The ${…}s expander within  the string is a placeholder that will be replaced with the value of  “appName” (Jägermeister). Note that we can insert comments in the  familiar /*…*/ style. If you’re curious to see more use cases for l20n  and the lol format, be sure to check out the link above to Axel’s  examples.</p>
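<p>As a rough illustration of how such a dependency could be resolved at runtime, here is a small sketch. It uses a plain JavaScript object as a stand-in for the parsed lol file; the key names mirror the JSON encoding shown later in this post, and the functions <code>lookup</code> and <code>resolve</code> are hypothetical, not part of l20n.</p>

```javascript
// Hypothetical in-memory form of the two entities above; the
// "indices" and "value" keys are assumptions for illustration.
var entities = {
  complex: {
    indices: ["appName.gender"],
    value: {
      male: "Ein hübscher ${appName}s.",
      female: "Ein hübsches ${appName}s."
    }
  },
  appName: { value: "Jägermeister", gender: "male" }
};

// Follow a dotted path such as "appName.gender" through the entities.
function lookup(path) {
  return path.split(".").reduce(function (node, key) {
    return node[key];
  }, entities);
}

function resolve(id) {
  var entity = entities[id];
  var value = entity.value;
  // Each index selects one branch of a nested value.
  (entity.indices || []).forEach(function (path) {
    value = value[lookup(path)];
  });
  // Expand ${name}s placeholders with the referenced entity's value.
  return value.replace(/\$\{(\w+)\}s/g, function (match, name) {
    return resolve(name);
  });
}

console.log(resolve("complex"));
```

<p>Changing the “gender” property of “appName” is enough to switch the sentence to the other form; the “complex” entity itself stays untouched.</p>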
<h4>JSON</h4>
<p>JSON is a well-known data exchange format. It’s simple to understand,  and with implementations available in many different languages, simple  to use. As mentioned above, our initial attempt to encode localizable  objects in JSON was motivated by performance concerns. Even without a  speed advantage, JSON still has some attractions, namely its existing  implementations. Our JSON-based l20n infrastructure leverages Gecko’s  built-in parser to do a lot of heavy lifting, meaning we have less code  to maintain on our part. Plus, arrays and hashes, the fundamental  datatypes available in JSON, are a natural fit for localizable objects.  Still, JSON has some serious shortcomings, which we will see shortly.</p>
<p>JSON is well suited to describing key-value pairs. Here’s the same brand.dtd example, now expressed in JSON:</p>
<pre style="border: 1px solid #dfecf1; color: #25221d; display: block; font-family: 'Courier New',monospace; overflow: auto; padding: 10px;">{"brandShortName" : {"value" : "Minefield"},
 "brandFullName" : {"value" : "Minefield"},
 "vendorShortName" : {"value" : "Mozilla"},
 "logoCopyright" : {"value" : " "}}</pre>
<p>Each localizable object in JSON has a “value” attribute holding the string to be displayed. Together with the required quotes around the keys, this makes our JSON example slightly more verbose. Now here’s the same gendered-noun example from above, this time in JSON:</p>
<pre style="border: 1px solid #dfecf1; color: #25221d; display: block; font-family: 'Courier New',monospace; overflow: auto; padding: 10px;">{ "complex" :
    {"indices" : ["appName.gender"],
     "value" : { "male" : "Ein hübscher ${appName}s.",
                 "female" : "Ein hübsches ${appName}s."}},

  "appName" : {"value" : "Jägermeister",
                  "gender" : "male"}}</pre>
<p>In JSON, we need to reserve some keywords for attributes, like  “indices” here, to implement certain l20n features. Still, JSON works  pretty well to express the structure of the object. One area where JSON  doesn’t work so well is comments. In JSON, our top-level object is a  hash that associates entity IDs with their definitions. There are a few  apparent ways to integrate comments into this object:</p>
<ol>
<li>Assign each comment to the same identifier, e.g. “comment”.</li>
<li>Assign each comment to a unique identifier, e.g. “comment0”, “comment1”, etc.</li>
<li>Don’t allow top-level comments: each comment has to be an attribute of an entity.</li>
</ol>
<p>Option 1 makes sense for humans writing JSON, and it’s valid, but a  little strange.<br />
Option 2 is a little painful when writing the file, especially when it  comes to inserting new comments. This scheme would make it possible to  reference specific comments, which might be useful.<br />
Option 3 is somewhat of a straw-man but still deserves some  consideration. Most comments in a localizable file give instructions for  how to translate a specific entity, and now that relationship would be  explicitly enforced. This form of comment is likely the best choice in  most situations, but it probably is too restrictive to make it the only  choice.</p>
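<p>One practical wrinkle with Option 1 is worth spelling out. The sketch below assumes a parser that behaves like the standard <code>JSON.parse</code> defined in ECMAScript, which keeps only the last value for a repeated key; other parsers may handle duplicates differently. The comment texts are made up for illustration.</p>

```javascript
// Option 1: every comment shares the key "comment".
var text = '{"comment": "Brand name, do not translate",' +
           ' "brandShortName": {"value": "Minefield"},' +
           ' "comment": "Vendor name",' +
           ' "vendorShortName": {"value": "Mozilla"}}';

var parsed = JSON.parse(text);
// Only the last value for a repeated key survives parsing.
console.log(parsed.comment);
console.log(Object.keys(parsed));
```

<p>So under this assumption the file stays readable for humans, but any tool that loads it through such a parser sees a single comment and loses the rest.</p>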
<p>Another shortcoming in JSON is that it doesn’t support multiline  strings. This is a serious problem when it comes to presenting long  strings to localizers, since line breaks aren’t just for readability;  they also give important cues for localization about logical separation  between thoughts. As it turns out, the native JSON parser built into  Gecko is perfectly content to accept multiline strings, but most other  parsers will report an error.</p>
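<p>Here is a small sketch of the difference. It assumes an engine whose <code>JSON.parse</code> follows the ECMAScript specification, where a raw line break inside a string is a syntax error; the Gecko build mentioned above evidently behaved differently. The helper <code>tryParse</code> and the sample strings are made up for illustration.</p>

```javascript
// A JSON document whose string value contains a raw line break.
var raw = '{"popupWarning": {"value": "first line\nsecond line"}}';
// The same document with the line break written as the escape \n.
var escaped = '{"popupWarning": {"value": "first line\\nsecond line"}}';

function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch (e) {
    return null; // the parser rejected the input
  }
}

var rawResult = tryParse(raw);
var escapedResult = tryParse(escaped);
console.log(rawResult === null);
console.log(escapedResult.popupWarning.value.split("\n").length);
```

<p>The escaped form parses fine, but it puts the whole string on one line in the source file, which is exactly what makes long strings hard for localizers to read.</p>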
<h4>YAML: A better JSON?</h4>
<p>YAML is a data serialization language that is a superset of JSON. It  supports comments, multiline strings, and user-defined data types. On  the downside, it’s not nearly as well-known as JSON, it’s considerably  more complex, and it’s not already built in to the Mozilla platform.</p>
<p>Here’s our first example from above, now in YAML:</p>
<pre style="border: 1px solid #dfecf1; color: #25221d; display: block; font-family: 'Courier New',monospace; overflow: auto; padding: 10px;">brandShortName: Minefield
brandFullName: Minefield
vendorShortName: Mozilla
logoCopyright: " "</pre>
<p>And the second example:</p>
<pre style="border: 1px solid #dfecf1; color: #25221d; display: block; font-family: 'Courier New',monospace; overflow: auto; padding: 10px;">complex:
    indices: [appName.gender]
    value:
        male: Ein hübscher ${appName}s.
        female: Ein hübsches ${appName}s.

appName: {value: Jägermeister, gender: male}</pre>
<p>YAML has the same logical structure as JSON with a much cleaner look, since indentation can be used instead of curly braces to denote objects, and it doesn’t require strings to be delimited with quotes. That’s another attractive feature, since it reduces the chance of errors from improperly escaped quotes within strings and from missing trailing quotes, both of which cause a lot of frustration. The less rosy side of the picture is that we don’t have a YAML parser that we can simply drop into place like we did with JSON, so it would require a lot of work on our part to get it up and running. YAML does have a fair number of implementations floating around, but development seems to have stalled on many of these. For example, this <a href="http://sourceforge.net/projects/yaml-javascript/">JavaScript implementation</a> hasn’t seen any updates in nearly 5 years.</p>
<h3>Summary</h3>
<p>So far we’ve examined three choices: LOL, JSON, and YAML. The first was designed specifically for l20n, so naturally it encodes the complex features of l20n most gracefully. The remaining two are established formats with implementations available in many different programming languages (JSON to a far greater extent than YAML). The lack of comments and multiline strings is probably enough to eliminate JSON from the discussion, since these deficits outweigh any advantage of interoperability, leaving us with LOL and YAML. If you’d like to make a case for one of these, or any other format dear to your heart, don’t hesitate to leave a comment! We’d love to get your input.</p></div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/seth/2009/08/24/the-l20n-format-shootout/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
	</channel>
</rss>

