• DTD limitations with the gender of translated words

    January 29th, 2009 by seth bindernagel

    Yesterday, I wrote about the complexity that many localizers face when translating Firefox.

    Here is another example.

    Difficulty #2: Gender of words

    Remember from yesterday that we can have a DTD file in the en-US version of Firefox like

    <!ENTITY brandShortName "Firefox">

    where every time the variable “brandShortName” appears in the code, Firefox displays the translated string that is shown in quotes.

    What if a language used a different gender for the same word, causing that word to slightly change given different contexts in the user interface? Let’s look at this example, using Polish again as our language of translation:

    <!ENTITY willCheck "&brandShortName; will check links">

    In Polish, the localizer has to translate the entity into the following:

    <!ENTITY willCheck "&brandShortName; będzie sprawdzał(a) odnośniki">

    where sprawdzał is the masculine form of the verb and sprawdzała is the feminine form.  See how that can be problematic?  Poles don’t naturally speak or write this way, using a parenthetical “a” to cover all possible genders in one sentence.  In context, the proper gender form should be used.  But the entity holds only one string, so the localizer has to acknowledge both endings with the (a).  Alternatively, the localizer could pretend that the word is simply masculine, which can be sensitive depending on what word is being written, who is reading it, and what alternate meaning that word might take on with the wrong ending.  Polish localizers made this change and it’s serviceable.
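
    To make the gap concrete, here is a minimal Python sketch of what “the proper gender version should be used” would require: the message offers one form per gender, and the caller says which gender the brand takes.  Nothing here is DTD or L20n syntax.  The table name, the will_check helper, the gender passed for each brand, and the placeholder brand “ExampleBrand” are all assumptions made up for illustration.

```python
# Hypothetical message table: one Polish variant per grammatical gender.
# The two verb forms come from the post; everything else is invented.
WILL_CHECK_PL = {
    "masculine": "{brand} będzie sprawdzał odnośniki",
    "feminine": "{brand} będzie sprawdzała odnośniki",
}


def will_check(brand, gender):
    """Pick the variant that matches the brand's gender and fill in the name."""
    return WILL_CHECK_PL[gender].format(brand=brand)


# The genders passed here are placeholders, not a claim about real brands.
masculine_text = will_check("Firefox", "masculine")
feminine_text = will_check("ExampleBrand", "feminine")
print(masculine_text)
print(feminine_text)
```

    A single DTD entity has no place to store the gender or the second variant, which is why the localizer ends up with the parenthetical (a).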

    As we expand into new areas, where languages can be extremely different from English, we’ll need to think about a better way to do this.  The next post will cover one more example and then some more on localization.  I’ll conclude all this with a possible solution.

    (By the way, I don’t speak Polish.  I just happen to work next to a Polish guy every day.  Thanks, Gandalf, for providing me some examples to round out these posts.)

  • Mozilla DTD files, caveat emptor

    January 28th, 2009 by seth bindernagel

    If you’ve had the opportunity to localize Mozilla, then you have become very familiar with DTDs and the complexities that localizers face when translating a program like Firefox.  I thought I would use a few blog posts to describe some of these challenges, leading up to the next generation of localization at Mozilla — L20n. [1]

    Difficulty #1: Declensions of nouns, pronouns, and adjectives and platform-specific word usage

    Do you mind if I rewind us to high school Latin class, where I am sure you remember reciting all the declensions of nouns, pronouns, and adjectives?  In Latin, the declensions give words different endings for case, number, and gender (i.e. singular/plural and masculine/feminine/neuter).  It turns out that Latin is not the only language that does this.  In fact, Mozilla ships Firefox and Thunderbird in many languages whose declensions are just as complex, if not more so.

    Here is how it specifically relates to Mozilla’s DTDs.  Take the following example:

    So you want to be a localizer?  In the en-US source, we declare translatable strings in DTD files with the markup declaration “!ENTITY”.  If you see “!ENTITY” in the code, then you know there is something that needs to be translated.  Below, you can see that there is a variable called brandShortName with the string “Firefox”.

    <!ENTITY brandShortName "Firefox">

    Every time brandShortName appears in the code, the string “Firefox” (or the translated string provided by the localizer) will be presented in the user interface to someone using Firefox.  But, what if a language has several different declensions of the word Firefox, like Latin, that could be used in different grammatical structures?

    In Polish, for instance, Firefox could be written as Firefox, Firefoksa, Firefoksowi, Firefoksem, etc. depending on how our brand name is used in context.  But, there really is no way to provide multiple words for Firefox in the setup you see above.  brandShortName will always have the value in the string, but that string does not allow localizers to enter multiple possibilities.  The localizer gets one string, effectively one chance, to make the translation work.
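
    The one-string limit is easy to see with any XML parser.  The sketch below uses Python’s standard xml.etree.ElementTree on a tiny document with an inline DTD.  The ui and label elements are invented wrappers for this demonstration, not Mozilla’s real markup; the two strings are the English examples from these posts.

```python
import xml.etree.ElementTree as ET

# A tiny document with an inline DTD. The <ui>/<label> wrapper is invented.
DOC = """<!DOCTYPE ui [
  <!ENTITY brandShortName "Firefox">
]>
<ui>
  <label>Hide &brandShortName;</label>
  <label>&brandShortName; will check links</label>
</ui>"""


def expanded_labels(doc):
    """Parse the document and return each label with entities expanded."""
    root = ET.fromstring(doc)
    return [label.text for label in root.findall("label")]


labels = expanded_labels(DOC)
for text in labels:
    print(text)
```

    Both references expand to the same declared value.  The parser has no notion of which grammatical form a sentence needs, so the declaration can only ever supply one spelling.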

    Now, let’s say that each operating system uses a different label to perform a specific function.  Let’s use Polish again as an example.  In the example below, you can see different entities for Mac (hidemac.label), Windows (hidewin.label), and Linux (hidelin.label):

    <!ENTITY brandShortName "Firefox">
    <!ENTITY hidemac.label "Hide &brandShortName;">
    <!ENTITY hidewin.label "Hide – &brandShortName;">
    <!ENTITY hidelin.label "Hide: &brandShortName;">

    Can you see above how the Mac label will read “Hide Firefox”, the Windows label will read “Hide – Firefox” and the Linux label will read “Hide: Firefox”?  Seems to work just fine in English.  But, in Polish, the word for Hide is ukryj, which is 2nd person, singular and requires Firefox to be spelled Firefoksa if you want to be grammatically correct.  Here is how we have to localize in Polish:

    <!ENTITY hidemac.label "Ukryj program &brandShortName;">
    <!ENTITY hidewin.label "Ukryj program – &brandShortName;">
    <!ENTITY hidelin.label "Ukryj program: &brandShortName;">

    Localizers have to create a new phrase, “Hide program: Firefox” instead of what seems more natural: “Hide Firefox”.  No one would say “Ukryj program Firefox”.  It sounds robotic and weird or even monsterish.  Polish speakers say “Ukryj Firefoksa”.  But, remember brandShortName can only be “Firefox”.  You can imagine that this happens all the time in Polish.
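
    As a rough illustration of what is missing, the Python sketch below expands entity references from a table that holds more than one form of the brand name.  The “&name#form;” notation, the key names “base” and “object”, and the expand helper are all invented for this example; they are not DTD syntax and not a description of L20n.  Only the two spellings, Firefox and Firefoksa, come from the post.

```python
import re

# Invented notation: "&name#form;" asks for a named form of an entity.
# A plain "&name;" falls back to the "base" form, like a normal DTD entity.
REFERENCE = re.compile(r"&(\w+)(?:#(\w+))?;")

# Hypothetical table with two of the spellings mentioned in the post.
BRAND_FORMS = {
    "brandShortName": {"base": "Firefox", "object": "Firefoksa"},
}


def expand(template, forms=BRAND_FORMS):
    """Replace every entity reference with the requested form."""
    def pick(match):
        name = match.group(1)
        form = match.group(2) or "base"
        return forms[name][form]
    return REFERENCE.sub(pick, template)


# What the DTD forces today versus what Polish speakers actually say.
workaround = expand("Ukryj program &brandShortName;")
natural = expand("Ukryj &brandShortName#object;")
print(workaround)
print(natural)
```

    With only the base form available, the localizer has to reach for the “program” workaround; the second call shows the more natural phrase once the string itself can ask for a different form.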

    Now, multiply that by 60+ localizations whose grammatical structures are different from English, across three platforms, and you get a sense of how difficult this gets.  DTDs have other limits that I’ll blog about with more examples.  Before we’re finished, I’ll get to the better way to do this that Axel and others have thought long and hard about.

    [1]  L20n is a term I would hear Axel mention in the past and I’ve spent some time learning what this actually means.  I am hoping guys like Axel and Gandalf will comment on these blog posts to add to the conversation.

  • A six month report from Translate.org.za

    November 24th, 2008 by seth bindernagel

    You may remember that Mozilla made a grant to the team at Translate.org.za this past summer to help improve the translation tools that many of our localizers use to localize Firefox.  One of the stipulations of that grant asked Translate to provide a mid-year report summarizing their progress.  Many thanks to Friedel (the lead developer at the Translate Toolkit) who submitted it to me today.

    Highlights include the following:

    • Integrating with Mozilla’s code repository system, Mercurial
    • Launching their offline editor Virtaal, which will allow localizers to work on translations when they are unable to access the Web
    • Merging Verbatim work by clouserw and dschafer into their trunk.  (Wil wrote a very thoughtful piece about the decisions he and Mozilla made before choosing to hack on Pootle and how it has gone since then.)
    • Migrating to Django, a new web platform for Pootle that should make developer contributions in the future a bit easier

    We hope to have two projects integrated into Verbatim by the end of this quarter so localizers can use the tool to translate the UI for both AMO (addons.mozilla.org) and SUMO (support.mozilla.com).  This will happen due to the great work by Wil Clouser and the guys at Translate Toolkit.

  • Vietnamese for FF 3.1?

    November 6th, 2008 by seth bindernagel

    Today, we filed the release tracker bug for the Vietnamese localization to participate in Firefox 3.1.  If you look at that bug, you can see all the other bugs we have filed to get this localization ready for release (something I just blogged about).

    The Vietnamese team has been incredibly motivated and responsive in working toward their goal of participating in FF 3.1.  But, this is also a great example of teamwork and assistance from the l10n-drivers.  Gandalf and Gen have been closely working to finalize a localization for Vietnamese.  It’s not done yet, but we’re much closer.

    The story behind it is very interesting.  Sometimes, in our world of localization, two or more efforts are started on a translation and those teams surface at different times before the final localization is ready.  In most cases, the teams start working together to finish the work.  You can imagine that this is not the easiest undertaking.  A lot of pride and time goes into the individual work.  When translations surface from multiple teams, the l10n-drivers work with all the individuals to figure out the best next step.  We strive to serve as the most objective intermediary and find an agreeable solution.

    In this case, we had two translations from two teams from different regions in Vietnam.  Naturally, some differences in their work arose.  The teams presented their work separately, both with good efforts that were nearly complete.  At that point, we had to come together to decide what was best.  We even enlisted the help of a native, 3rd-party Vietnamese speaker to help evaluate.  But, the teams moved more quickly than we could and consolidated their efforts into one.  Along the way, Gandalf put a lot of effort into communicating with them to help get to this solution.  Also, with Silme, Gandalf provided technical assistance to help reconcile any differences.

    Many thanks to Jasper and Hung from the Vietnamese teams.  Now, we’ll work on all the other aspects necessary to localize, like the local web services.  We are getting close and hoping to get them into the release cycle for Firefox 3.1.

    In our Firefox l10n pipeline, in no particular order:  Vietnamese, Kazakh, Mexican Spanish, Bengali (Bangladesh), and Bosnian.  By “in the pipeline”, I mean we have had some recent activity with the localization teams and are working in some way to get them closer to shipping.

  • Changes to the l10n-build system

    September 30th, 2008 by seth bindernagel

    John O’Duinn, Armen Gasparian, and the rest of Mozilla’s build team should be very proud of their latest accomplishment.  I am rechanneling a bit from joduinn who will blog in more detail later.  But, this morning, they’ve made the l10n-build system capable of doing multiple repacks at one time.  John also tells me that the “nightly l10n-repacks will be generated the same way as the release bits”.

    Mostly, it means we have allocated more resources to our l10n build infrastructure with less custom setup to maintain.  We have several identical machines working together to repack l10n builds, while in the past we repacked them one build at a time in one giant pack…see where we are going here?  :)

    Ultimately, this will allow us to better service localizers with build resources.  We can do things like “one last repack” for those localizers who might have that need.  In the past, we lined up all builds in a specific order and started the process as one big batch.  If your build was in there, well, you’d have to wait until next time if something happened.  Now, we have a pool of identical machines working in unison to service the l10n-build queue, grabbing the next build to be done as fast as the machines can.  And, everything will happen much, much faster (perhaps 7x), since we have multiple machines working for us.
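
    The pool-and-queue idea can be sketched in a few lines of Python.  This is only an illustration of the pattern, not the build team’s actual setup: the repack function is a stand-in that returns a label, the machine count is arbitrary, and the locale codes are just sample input.

```python
import queue
import threading


def repack(locale):
    """Stand-in for a real repack; it only builds a label."""
    return f"firefox-{locale}-repacked"


def run_pool(locales, machine_count=3):
    """Let identical workers drain one shared queue of pending builds."""
    pending = queue.Queue()
    for locale in locales:
        pending.put(locale)

    results = {}
    lock = threading.Lock()

    def machine():
        while True:
            try:
                locale = pending.get_nowait()  # grab the next build, if any
            except queue.Empty:
                return  # queue drained, this machine is done
            built = repack(locale)
            with lock:
                results[locale] = built

    workers = [threading.Thread(target=machine) for _ in range(machine_count)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


results = run_pool(["pl", "vi", "kk", "es-MX", "bn-BD", "bs"])
print(sorted(results))
```

    Because every worker pulls from the same queue, a late “one last repack” is just one more item to enqueue rather than a reason to rerun the whole batch.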

    For the time being, the “new builds” are in the usual nightly directory here:

    ftp://ftp.mozilla.org/pub/firefox/nightly/latest-mozilla1.9.0-l10n/

    If you want to view “the old way”, this URL takes you to the builds that are generated the traditional way:

    ftp://ftp.mozilla.org/pub/firefox/nightly/old-l10n/

    If you see any problems with the new nightlies, before you file a bug, can you check to see if the same problem occurs with the nightlies that are created in the traditional way?  Functionally, these should be identical, but it will be helpful if we can see if the problem is in the new system or in both.

    This post from Armen explains more. Please read.

    John O’Duinn will post later today or tomorrow with more details.  We are going to run these two systems in parallel for one week.  If there are no complaints, we will shut down the traditional system and only use the new system.  Please send along any questions if you have them.