Improving LOL

August 28th, 2009 by Seth Bindernagel

One more post from l10n intern Jeremy Hiatt. The following post, reproduced word for word, describes his work to make the LOL format more readable and understandable for the developers and localizers who might use it.

————————————-

Today I gave a presentation to some of the guys from the platform team about the state of l20n. In my previous posts I’ve blogged about the advantages and drawbacks of each of the formats we’ve considered, and I got more good feedback on that in today’s brownbag. After the talk, I chatted with fantasai about how we could improve the LOL format to make it more readable and understandable. She had some interesting ideas that I’d like to share.

Dropping Angle Brackets

First, she (and a few others) pointed out that angle brackets make LOL look like XML, a resemblance that may confuse readers since LOL is otherwise nothing like XML. The angle brackets were meant to delimit entity definitions and give clear visual separation. These cues also help our parser, especially with error recovery: in an error case, the parser can drop all tokens until it recognizes an opening bracket (<) and then resume parsing. If you have suggestions for implementing effective error recovery once the brackets are removed from the syntax, please leave them below.
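To make the error-recovery argument concrete, here is a minimal Python sketch of bracket-based "panic mode" recovery. It assumes a hypothetical, heavily simplified grammar where every entity looks like <key: "value"> (no properties, indexes, or escapes). The function name parse_with_recovery and the regex are illustrative assumptions, not the actual l20n parser.

```python
import re

# Hypothetical, simplified grammar for illustration only: every entity is
# <key: "value">, with no properties, indexes, or escape sequences.
ENTITY = re.compile(r'<\s*([A-Za-z_]\w*)\s*:\s*"([^"]*)"\s*>')


def parse_with_recovery(source: str) -> tuple[dict[str, str], list[int]]:
    """Parse entities; on a syntax error, skip ahead to the next '<'."""
    entities: dict[str, str] = {}
    errors: list[int] = []  # offsets where a malformed entity was found
    pos = source.find("<")
    while pos != -1:
        match = ENTITY.match(source, pos)
        if match:
            entities[match.group(1)] = match.group(2)
            pos = source.find("<", match.end())
        else:
            # Panic-mode recovery: drop everything up to the next opening
            # bracket and try to parse an entity again from there.
            errors.append(pos)
            pos = source.find("<", pos + 1)
    return entities, errors


src = '<appName: "Jägermeister"> <broken: oops> <title: "Hello">'
entities, errors = parse_with_recovery(src)
print(entities)  # {'appName': 'Jägermeister', 'title': 'Hello'}
print(errors)    # one offset, pointing at "<broken"
```

The malformed entity is reported and skipped while the entities on either side still parse. That is the property a bracket-free syntax would have to recover some other way, for example by resynchronizing on line starts or on a key followed by "=".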

Encoding Properties

Another potentially confusing aspect of LOL files is that an entity may have properties defined in addition to its value. Here’s our usual example of a noun with a specified gender:

<appName: "Jägermeister"
 gender: "male">

This can be disambiguated slightly with indentation, but fantasai noted that the current syntax does little to explain the difference between the first assignment (which specifies appName itself) and the second assignment, to the appName.gender property. She suggested a syntax that distinguishes assigning the value from assigning properties: use ‘=’ for the first assignment, and curly braces to delimit additional properties. Here’s the same example in that style:

appName = "Jägermeister" {
    gender: "male" }

In this format, LOL would look a lot like CSS.
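One way to see the value/property split is to parse the proposed form into two separate slots. The following Python sketch assumes a hypothetical grammar covering only key = "value" { prop: "value" } with an optional property block. The regexes, the Entity class, and the lookup helper are illustrative assumptions, not part of l20n.

```python
import re
from dataclasses import dataclass, field

# Hypothetical regexes for the proposed form only:
#   key = "value" { prop: "value" ... }
# The property block is optional; escapes and nested braces are not handled.
ENTITY = re.compile(r'(\w+)\s*=\s*"([^"]*)"\s*(?:\{([^}]*)\})?')
PROPERTY = re.compile(r'(\w+)\s*:\s*"([^"]*)"')


@dataclass
class Entity:
    value: str  # set by the '=' assignment
    properties: dict[str, str] = field(default_factory=dict)  # set in {...}


def parse_entities(source: str) -> dict[str, Entity]:
    entities: dict[str, Entity] = {}
    for m in ENTITY.finditer(source):
        props = dict(PROPERTY.findall(m.group(3) or ""))
        entities[m.group(1)] = Entity(m.group(2), props)
    return entities


def lookup(entities: dict[str, Entity], path: str) -> str:
    """Resolve 'appName' to its value and 'appName.gender' to a property."""
    key, _, prop = path.partition(".")
    entity = entities[key]
    return entity.properties[prop] if prop else entity.value


source = '''appName = "Jägermeister" {
    gender: "male" }'''
entities = parse_entities(source)
print(lookup(entities, "appName"))         # Jägermeister
print(lookup(entities, "appName.gender"))  # male
```

Because '=' and the braces map to different fields, a reader (or a parser) never has to guess whether a line sets the entity itself or one of its properties.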

Indexing

An entity that mentions a noun whose gender varies may define a different form for each gender. For example:

<complex[appName.gender]: {
	male: "Ein hübscher ${appName}s.",
	female: "Ein hübsches ${appName}s."}>

In the current syntax, square brackets following the entity key denote the index used to select the proper form. The suggestion was to move the index to the right-hand side (RHS) of the assignment:

complex = [appName.gender] {
    male: "Ein hübscher ${appName}s.",
    female: "Ein hübsches ${appName}s."}

If you’re familiar with switch statements in programming, you’ll probably notice that we’ve essentially adopted the standard syntax, substituting square brackets for the switch() keyword.
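The switch analogy can be sketched directly. The Python below assumes a hypothetical in-memory representation of the two entities from the post (an "index" path plus a dict of "forms"). It evaluates the index, picks the matching form, and then expands ${...} references. The data layout and the helper names are assumptions for illustration. The real l20n data model may differ.

```python
import re

# Hypothetical in-memory form of the two entities from the post; the real
# l20n data model may differ.
ENTITIES = {
    "appName": {"value": "Jägermeister", "gender": "male"},
    "complex": {
        "index": "appName.gender",  # the [appName.gender] selector
        "forms": {
            "male": "Ein hübscher ${appName}s.",
            "female": "Ein hübsches ${appName}s.",
        },
    },
}

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def get_property(path: str) -> str:
    """Look up 'entity' (its value) or 'entity.prop' (a property)."""
    key, _, prop = path.partition(".")
    entity = ENTITIES[key]
    return entity[prop] if prop else entity["value"]


def resolve(key: str) -> str:
    entity = ENTITIES[key]
    if "index" in entity:
        # Like switch (appName.gender): choose the form whose label matches.
        template = entity["forms"][get_property(entity["index"])]
    else:
        template = entity["value"]
    # Expand each ${name} by resolving the referenced entity recursively.
    return PLACEHOLDER.sub(lambda m: resolve(m.group(1)), template)


print(resolve("complex"))  # Ein hübscher Jägermeisters.
ENTITIES["appName"]["gender"] = "female"
print(resolve("complex"))  # Ein hübsches Jägermeisters.
```

Changing only appName.gender switches the selected form, which is the behavior the [appName.gender] index is meant to express. A real implementation would also need to guard against entities that reference each other in a cycle, which this sketch does not do.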

Objects with Multiple Attributes

Objects in the UI, such as buttons, typically have “label” and “accesskey” attributes but no canonical string value. This is subtly different from the cases above: in the first we wished to specify additional properties, and in the second the string value was resolved from an external index. Example time:

<button: {label: "Push me", accesskey: "p"}>

In this case, it doesn’t make sense to refer to just “button”: you want either the label or the accesskey, which are available through the “.” accessor (e.g. button.label). To draw attention to this distinction, we could require a syntactic difference, or we could simply omit the index from the switch syntax above.
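A minimal Python sketch of that distinction follows. It assumes a hypothetical model where an entity's value may be absent, and asking for a bare attribute-only object raises an error while button.label and button.accesskey resolve normally. The Entity class, the get helper, and the choice to raise LookupError are illustrative assumptions, not l20n behavior.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    # None marks an attribute-only object, such as a button, that has no
    # canonical string value of its own.
    value: Optional[str] = None
    attributes: dict[str, str] = field(default_factory=dict)


# Hypothetical entities; "label" and "accesskey" follow the prose above.
ENTITIES = {
    "appName": Entity("Jägermeister", {"gender": "male"}),
    "button": Entity(None, {"label": "Push me", "accesskey": "p"}),
}


def get(path: str) -> str:
    key, _, attr = path.partition(".")
    entity = ENTITIES[key]
    if attr:
        return entity.attributes[attr]
    if entity.value is None:
        # Referring to the bare object is an error: ask for an attribute.
        raise LookupError(
            f"{key!r} has no string value; use one of "
            f"{', '.join(key + '.' + a for a in entity.attributes)}"
        )
    return entity.value


print(get("button.label"))      # Push me
print(get("button.accesskey"))  # p
```

Whether this distinction is enforced at lookup time, as here, or made visible in the file syntax itself is exactly the design choice left open above.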

Summary

There are plenty more features of l20n that I’d love to put under the microscope here, but in the interest of focusing the discussion I’ll add them to a future post instead. As always, please share your opinion. You can also find us on IRC if you’re looking to start a lively debate; just look for me (jhiatt), Pike, and gandalf. Thanks to everyone from the brownbag today, and thanks especially to fantasai for taking the time to help us out!


  1. The newly suggested code style looks a little awkward to read, though I understand the reason for moving away from the angle brackets. Perhaps consider following a basic pseudo-CSS grammar, with something like:

    appName {
        string: "Jägermeister";
        gender: "male";
    }

    complex(appName.gender) {
        male: "Ein hübscher ${appName}s.";
        female: "Ein hübsches ${appName}s.";
    }

    This way the code reads a little more naturally and regularly than having an = here and a : there.

  2. Actually, there is one case where we refer to just “button”: in XUL, where this notation should let us avoid listing the accesskey explicitly and get it through the L20n object instead.

    In general, as noted in an earlier post, I think the LOL format is open to modification and not set in stone yet. Your changes look friendlier, and I’m not sure they’re really worse for a parser. I think we need something other than a dumb regexp parser in any case if we want it to be performant.

  3. Robert, sorry, but I don’t get the first sentence at all. Can you rephrase that?

    Regarding your second comment, we kinda decoupled the questions of performance and format by introducing the compilation of lol into js, which solves more bottlenecks than just parsing, AFAICT right now.

  4. Again, I implore you, please move heaven and earth to avoid designing your own file format and syntax. Is it really the case that no-one in computing history has come up with a suitable format for encoding grouped sets of key-value pairs?

    “Among the hardest things to get right in designing any text file format are issues of quoting, whitespace and other low-level syntax details. Custom file formats often suffer from slightly broken syntax that doesn’t quite match other similar formats. Using a standard format …, which is verifiable and parsed by a standard library, eliminates most of these issues.

    – Keith Packard”

    However, if you are absolutely determined to roll your own, “The Art of Unix Programming” has some really good chapters on designing file formats, which are required reading.
    http://www.faqs.org/docs/artu/ch05s02.html

    Gerv

  5. Gerv, the format shootout is just a few posts back on Seth’s blog, http://blog.mozilla.com/seth/2009/08/24/the-l20n-format-shootout/ (cross posted from Jeremy again).

    As you can see, we did look at other popular formats, and found significant drawbacks.

  6. If we are set on LOL, I think it would be great for someone to look at the current format in the light of the issues raised in the chapter from The Art of Unix Programming.

    Gerv
