L20n

February 3rd, 2009 by seth bindernagel

I spent last week writing about some limitations with DTD files and promised to follow up with something about l20n.  In fact, a lot has already been written about this concept, and it can be found here:

https://wiki.mozilla.org/L20n

The introduction from that page is particularly helpful in describing l20n.  It opens with the following:

“L20n is the codename for a localization architecture taking existing approaches one step further. The name stands for l10n 2. The architecture is laid out with Mozilla applications in mind, but should be applicable to other areas as well. As for Mozilla, Mozilla 2 will give us a chance to implement significant changes in our l10n architecture, and this is one attempt to do that.”

It may come as no surprise, but Axel is mostly responsible for this TERRIFIC introduction and write up.  The wiki covers:

I encourage everyone to take some time to read through this wiki to get a better sense of what l20n is and how to contribute.  Rather than try to rewrite something that is already superb, I will simply point you to the background document linked above.

Let’s give a taste of just what l20n might be one day.  Here is some sample code from Axel that helps with plural forms:

<plural: (n) -> {n != 1}>
/**
 * Complex string
 * @param: beers
 */
<axel: "Axel had ${beers}i ${axel.bottles[plural(beers)]}s of beer." bottles: ["bottle", "bottles"]>

And the translation by a localizer:

<plural: (n) -> {n != 1}>

/**
 * Complex string
 * @param: beers
 */
<axel: "Axel hat ${beers}i ${axel.bottles[plural(beers)]}s Bier getrunken."
 bottles: ["Flasche", "Flaschen"]>

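The nice thing here is that the code is flexible enough to provide for multiple plural forms: each locale defines its own plural macro and supplies as many word forms as that macro can select.  Ping me if you have a question.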
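To make the selection step concrete, here is a minimal Python sketch of what the sample above does. It is not the l20n library or its API: the names `plural_en_de`, `plural_pl`, `ENTITIES` and `render_axel` are made up for illustration, and the dictionaries are only stand-ins for the two `<axel: ...>` entities. The Polish rule is included purely to show that a locale can return more than two indexes.

```python
# Illustrative sketch only: this is not the l20n library or its API.

def plural_en_de(n: int) -> int:
    """Mirror of the sample's <plural: (n) -> {n != 1}> macro."""
    return int(n != 1)  # 0 selects the singular entry, 1 the plural entry


def plural_pl(n: int) -> int:
    """Polish needs three forms, so its rule returns 0, 1 or 2."""
    if n == 1:
        return 0
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return 1
    return 2


# Hypothetical stand-ins for the two <axel: ...> entities in the post.
ENTITIES = {
    "en-US": {
        "plural": plural_en_de,
        "template": "Axel had {beers} {bottle} of beer.",
        "bottles": ["bottle", "bottles"],
    },
    "de": {
        "plural": plural_en_de,
        "template": "Axel hat {beers} {bottle} Bier getrunken.",
        "bottles": ["Flasche", "Flaschen"],
    },
}


def render_axel(locale: str, beers: int) -> str:
    """Pick the word form via the locale's plural rule, then fill the string."""
    entity = ENTITIES[locale]
    bottle = entity["bottles"][entity["plural"](beers)]
    return entity["template"].format(beers=beers, bottle=bottle)


if __name__ == "__main__":
    for count in (1, 3):
        print(render_axel("en-US", count))
        print(render_axel("de", count))
```

The point of the sketch is that the application only passes in `beers`; which word form gets used is decided entirely by data and logic that live with the locale.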

Some really excellent demos are on the Examples link from above.  I will link to a few I liked here:

These are great because they show the power and flexibility of the code with a straightforward UI to go along.

There is so much fun to be had with l20n, I hope you take the time to go through this wiki, comment on it, blog about it, email me, etc.  We are not ready to implement l20n, but we are certainly ready to discuss.

Thank you to Axel and Gandalf for helping me with these posts.  Most of what I write is simply me learning new things and then rewriting.  I hope you can play along and get involved in the next generation of Mozilla l10n.

Categories: L20n

  1. Sorry Seth but I’m still not jumping over the moon on this one.

    Can we honestly expect a localiser to be able to compose a string like this:

    Can you understand it without having to sit and think for a while? I’m a programmer and I have to think through that one to work out what it does. Then I’d still need to work out how I would translate it. And lastly I’d probably break the syntax somehow and break Firefox.

    Just looking at the plural problem, this solution ranks as the most complex I’ve seen in this domain.

    Yes it looks powerful, yes the demos show it working well, no I don’t want to discourage some innovative ideas. But we can’t make things infinitely harder for localisers in order to solve a problem like this.

    An anti-complexity agenda (which would include opt-in) to me should be a fundamental requirement for this framework if it is to make l10n better.

  2. Jorge

    I know this is a more obscure localization issue, but I would like to know if you are planning to tackle it with the L20n proposal. This is something we found while developing the Fire.fm extension.
    We have a string like this:
    Play %S’s similar artists.
    Now, most languages don’t use the ‘s, but both English and German have something like this, and both have special rules for when *not* to use it. So, in English, words ending in s will only have ‘. In German, instead of ‘s they just use s, and they skip it when the word ends with an s sound.

    @Dwayne: I think all of this is the developer’s choice, so you can keep it simple, albeit flawed, if you want.
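The possessive rules Jorge describes can be sketched in a few lines of Python. This is only an illustration of the rules as stated in his comment, not something taken from Fire.fm or l20n: the function names are hypothetical, and treating a final s, ß, x or z as "an s sound" in German is an assumption made here to keep the example short.

```python
# Illustrative sketch of the rules as described in the comment above.

def english_possessive(name: str) -> str:
    """Append 's, or only an apostrophe when the name already ends in s."""
    return name + ("'" if name.lower().endswith("s") else "'s")


# Assumption: these endings stand in for "ends with an s sound".
GERMAN_S_SOUND_ENDINGS = ("s", "ß", "x", "z")


def german_genitive(name: str) -> str:
    """Append s, or nothing when the name already ends with an s sound."""
    if name.lower().endswith(GERMAN_S_SOUND_ENDINGS):
        return name
    return name + "s"


if __name__ == "__main__":
    for artist in ("Queen", "Beatles"):
        print("Play " + english_possessive(artist) + " similar artists.")
    for artist in ("Kraftwerk", "Strauss"):
        print(german_genitive(artist))
```

The decision depends on how the inserted word ends, which is exactly the kind of per-language logic a plain `%S` placeholder cannot express.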

  3. @ Dwayne: I am surprised these ideas didn’t launch you to the moon! :)

    I hope you can see that this is all part of a larger vision. The point of this whole discourse has been to discuss ways that Mozilla l10n can improve, which is, in some critical ways, a developer issue.

    The point of l20n is to make life better for localizers, developers, translators…pretty much everyone.

    So, let’s imagine this:

    1) For the first time at Mozilla, l20n allows most language forms and rules to work. I’ve pointed out a lot of the challenges over the past week. The examples above are very technical, but they solve issues that have made proper translation very difficult.

    2) Localization libraries like Silme add to the power of l10n, allowing localizers to find what has changed, what needs to be updated, and how to update it much more easily. With some interns working this summer on UI efforts, we’ll have even more to play with.

    3) Projects like Pootle (Verbatim), Mozilla Translator, Silme (again), Narro, and more provide for easier user interfaces for localization. Perhaps Silme can provide support to these projects in some way? Not sure due to programming differences, but we’re open to think about it.

    4) A Mozilla commitment to open-mindedness when it comes to further innovation, tying all of the above points together.

    The ease with which we do localization will improve as the contribution to any of the points above increases.

    In the end, l20n (aka all of the above) will make localization way better than it is right now.

    Does that make sense?

    Thank you so much for your thoughts…really helpful to have the debate!

  4. Re tools, I think that most translation tools will fail, which is likely the reason why Dwayne has problems envisioning user experience for localizers being thrown at l20n. A good deal of that problem needs to be shoveled from the localizer looking at “how do I do programmski Firefox” to the tool author looking at “how do I expose grammar constructs of languages, when needed”. Not a simple problem to solve.

    For the actual localizer, as long as you’re plain translating simple strings, things shouldn’t look more complex than DTDs, whatever tools make out of that.

    Re Jorge, interesting challenge. I don’t have an answer to that. I’d probably need to know more about the possible “rules”. I’m slightly scared of a bag of worms, but we’d see. I wonder how much that’s related to noun classes directly, see the short blurb on https://wiki.mozilla.org/L20n:Issues. If not, I’d be thankful if you could add your thoughts to that page.

  5. Jorge

    It’s somewhat related, but I think that the rules are mostly applied depending on word termination. I added a link in the Issues page: http://en.wikipedia.org/wiki/Genitive_case

  6. takeshi

    What do you think about introducing application logic into the lol file?
    In theory l20n has the ability to change a label according to numbers or other conditions. But this may introduce inherent complexity to lol files.
    I’m considering this bug for example: https://bugzilla.mozilla.org/show_bug.cgi?id=473706

    Furthermore, en-US still has some complexity. It would be nice if the reference implementation were simple enough for localizers to understand what they should translate and what does not need to be translated.

    Current l20n divides the application into three layers:
    Application / L20n library / lol file
    while lol file will contain three more layers:
    - language logic
    - word dictionary
    - strings for the application
    Logic (e.g. plural rule) establishes the contents of the dictionary and one word in the dictionary is consumed in many strings.
    It would be great if each layer in lol file is clearly separated though I don’t know how to do that.

  7. I don’t see how word dictionary would be part of the lol files, really.

    As for language logic, that shouldn’t really be in the lol files either, but in external files to be shared by editing applications. The “symptoms” of that show in the lol file, but the actual language logic like which noun classes a language would have should be in external files.

    I need to comment in bug 473706 again, too.

  8. @takeshi: thanks for this comment.

    I find it very valuable because it presents L20n concept from another angle.

    In cases such as bug 473706, the Entity will receive a given number of variables that may influence it, but it will be up to the localizer how to use them. As a result I don’t expect any increase in complexity. I would even expect a decrease due to the ability to customize the logic.

    I’m not sure if we can and/or want to separate layers… I’m afraid that this would raise complexity while the gain from readability would not be as high as expected due to small size of each file.

  9. @Axel: I think my concerns are really centred around complexity. I’m asking the question “is this complexity worth it?” You raise good points about interfaces: both that a) you are not sure how they should work and b) that they could hide this complexity.

    I am stumped in terms of how to present this to a localiser and that concerns me greatly. If we are unable to present this to a localiser then that has a serious negative impact on localisation of a product that uses l20n.

    I don’t think we can use the excuse that simple strings are unaffected. This framework is designed for strings that are not simple; simple strings just aren’t part of that equation. Using that argument I could say “Why bother, Polish is only affected in 5% of the cases so it’s not worth the effort”. I’m not saying that though :) What I am saying is that complexity cannot be judged by how easy the other stuff is or isn’t; it needs to be judged on this intervention alone.

    So I did the sums:
    + Firefox 3.1b3 – 5401 strings
    + &brand – 107 strings (I use these as examples of declension counts, they’re probably higher)
    + (s) – 11 strings (plurals although in many cases not and I’m sure I’ve missed a few that should be plural)

    Assuming 1 minute per string then we have Firefox in 3.75 days

    Only 2% of the strings are l20n strings in my calculations. They take 2 hours to localise. In the current state of l20n I would calculate that they will take much longer with this added complexity. Let’s say 5 minutes per string so it now takes 10 hours to localise. We’ve added 1 whole day of complexity to the localiser, not to mention added complexity for review, for QA and for bug fixing.

    Again my question is this: is this complexity worth it, and/or can we hide this complexity?

    I would argue that l20n MUST think about how the visual presentation of this data impacts localisers, the framework’s usefulness and wider adoption in tools. I hope I’ve made a good argument about why this can’t ignore presentation.

    I’m happy to try figure it out with you.

  10. 3.75 days will confuse anyone. That should be 11.25 working days of 8 hours each.
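For anyone who wants to check the sums in the two comments above, here is a small Python sketch that reproduces them. The string counts and the 1-minute and 5-minute figures come straight from the comments; the variable names and the helper `minutes_to_hours` are just for illustration.

```python
# Reproduces the estimate from the comments above; names are illustrative.

TOTAL_STRINGS = 5401          # Firefox 3.1b3
COMPLEX_STRINGS = 107 + 11    # &brand strings plus "(s)" plural strings
HOURS_PER_WORKING_DAY = 8


def minutes_to_hours(minutes: float) -> float:
    return minutes / 60


# Whole product at 1 minute per string.
total_hours = minutes_to_hours(TOTAL_STRINGS * 1)
calendar_days = total_hours / 24                    # about 3.75
working_days = total_hours / HOURS_PER_WORKING_DAY  # about 11.25

# Share of strings that would need l20n features.
complex_share = COMPLEX_STRINGS / TOTAL_STRINGS     # roughly 2%

# Cost of the complex strings today (1 min each) versus the
# 5 minutes per string assumed in the comment.
hours_now = minutes_to_hours(COMPLEX_STRINGS * 1)   # about 2 hours
hours_l20n = minutes_to_hours(COMPLEX_STRINGS * 5)  # about 10 hours
extra_working_days = (hours_l20n - hours_now) / HOURS_PER_WORKING_DAY

if __name__ == "__main__":
    print(f"{calendar_days:.2f} calendar days, {working_days:.2f} working days")
    print(f"{complex_share:.1%} of strings are complex")
    print(f"{hours_now:.1f} h now, {hours_l20n:.1f} h with 5 min per string")
    print(f"{extra_working_days:.2f} extra working days")
```

The 3.75 figure is what you get by dividing by 24 hours, and 11.25 is the same total divided by 8-hour working days, which is why both numbers appear in the thread.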

  11. @seth: I think reading my response to Axel should give you an idea of where my concerns lie at a technical level.

    I’m excited that this effort works towards trying to solve these issues for real languages. But my questions remain mostly unanswered 1) At what cost? 2) Are we solving problems that we ourselves created e.g. &brandShortName;.

    Some problems aren’t worth solving when weighed against the costs and some can be solved by changing some current practices. It’s within that context that we should be evaluating any interventions.

    PS: How about getting some interns to see if Silme can be used as a backend to Virtaal and Pootle?

  12. Vitaly Fedrushkov (SnowyOwl)

    Honestly, looking at the wiki ‘last modified’ dates, L20n seems abandoned. I am working on Bugzilla l10n. When trying to move Bugzilla to Maketext (which is the standard Perl l10n framework) I was pointed to L20n multiple times as a possible solution for our problems.

    Being a Web application, Bugzilla is rather verbose, compared to desktop apps. OTOH high customizability is required: basically users want to rename most objects (Bug, product, component, platform, etc) according to their reality. Not so hard to achieve in English, but leads to complications in many languages, due to various long distance dependencies.

    I will try to expand https://wiki.mozilla.org/L20n:Issues.

  13. Hi Vitaly,

    Are you driving l10n for Bugzilla? Would love to learn more about that.

    WRT l20n, it’s definitely not dead. So, please do add your comments to L20n:Issues.

    Thanks!

    Seth

  14. Bugzilla l10n team page is https://wiki.mozilla.org/Bugzilla:L10N

    Bugzilla is a web application written in Perl and Template Toolkit (http://template-toolkit.org/). The existing l10n architecture assumes the creation and maintenance of a translated template set. However, templates aren’t very good at separating logic from natural language. So each translation maintainer has to merge many changes that do not involve texts (see discussion at http://markmail.org/message/r3mgtn3mecsoeeij). As a result, we do not have many successful and well-maintained localizations.

    The right solution would be moving to Maketext (https://wiki.mozilla.org/Bugzilla:L10n:Maketext#References). Maketext is already immune to some of the problems that led to the invention of L20n. In particular, Perl is not a compiled language and supports dynamic code well. If implemented, Maketext support would allow for well-known message catalog formats, tools and workflow, and hopefully increase Bugzilla localizability.

    Goals are (a) reduce localizers’ maintenance workload and (b) eventually get more translations with tools like Launchpad Rosetta or Transifex or Narro or Pootle…

    However, the move to Maketext is not easy; the current implementation sometimes makes template syntax ugly beyond repair: https://bugzilla.mozilla.org/attachment.cgi?id=356254&action=diff#bugzilla-tip/template/en/default/account/profile-activity.html.tmpl_sec3

    Future challenges include: (a) customizable terms, which involves static message catalogs preprocessing, (b) localizable database values, populating l10n lexicon from message catalog files *and* database at the same time.