Mozilla DTD files, caveat emptor
If you’ve had the opportunity to localize Mozilla, then you have become very familiar with DTDs and the complexities that localizers face when translating a program like Firefox. I thought I would use a few blog posts to describe some of these challenges, leading up to the next generation of localization at Mozilla — L20n. [1]
Difficulty #1: Declensions of nouns, pronouns, and adjectives and platform-specific word usage
Do you mind if I rewind us to high school Latin class, where I am sure you remember reciting all the declensions of the various forms of nouns, pronouns, and adjectives? In Latin, a word's ending changes with its case (there are six), its number, and its gender (i.e. singular/plural and masculine/feminine/neuter). It turns out that Latin is not the only language that does this. In fact, Mozilla ships Firefox and Thunderbird in many languages whose declensions are just as complex, if not much more so.
Here is how it specifically relates to Mozilla’s DTDs.
So you want to be a localizer? In the en-US source, each localizable string in a DTD file is declared with the markup declaration “!ENTITY”. If you see “!ENTITY” in the code, then you know there is something that needs to be translated. Take the following example, which declares an entity called brandShortName with the string “Firefox”:
<!ENTITY brandShortName "Firefox">
Every time brandShortName appears in the code, the string “Firefox” (or the translated string provided by the localizer) will be presented in the user interface to someone using Firefox. But what if a language, like Latin, has several declined forms of the word Firefox, each used in a different grammatical structure?
In Polish, for instance, Firefox could be written as Firefox, Firefoksa, Firefoksowi, Firefoksem, etc., depending on how our brand name is used in context. But there really is no way to provide multiple forms of Firefox in the setup you see above. brandShortName always expands to the single value in its string, and that string gives localizers no way to enter multiple possibilities. The localizer gets one string, effectively one chance, to make the translation work.
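To make the "one string, one chance" point concrete, here is a minimal sketch in Python. It is not how Firefox itself loads its DTDs; it just puts the entity declaration into the internal DTD subset of a small XML document and lets the standard library's parser expand it. The wrapper document, the element names, and the “About” label are made up for illustration. Only the brandShortName declaration comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper document. Only the brandShortName declaration is
# taken from the post; <window>, <label> and the "About" string are
# invented for this sketch.
DOC = """<!DOCTYPE window [
  <!ENTITY brandShortName "Firefox">
]>
<window>
  <label id="about">About &brandShortName;</label>
  <label id="hide">Hide &brandShortName;</label>
</window>"""


def expand_labels(xml_text):
    """Parse the document and return {label id: text with entities expanded}."""
    root = ET.fromstring(xml_text)
    return {label.get("id"): label.text for label in root.iter("label")}


labels = expand_labels(DOC)
for label_id, text in labels.items():
    print(label_id, "->", text)
```

Both labels come out with exactly the same word, “Firefox”. The reference &brandShortName; carries no information about the grammatical role the name plays in each sentence, so the parser has nothing to choose between.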
Now, let’s say that each operating system uses a different label to perform a specific function. Let’s use Polish again as an example. In the example below, you can see different entities for Mac (hidemac.label), Windows (hidewin.label), and Linux (hidelin.label).
<!ENTITY brandShortName "Firefox">
<!ENTITY hidemac.label "Hide &brandShortName;">
<!ENTITY hidewin.label "Hide – &brandShortName;">
<!ENTITY hidelin.label "Hide: &brandShortName;">
Can you see above how the Mac label will read “Hide Firefox”, the Windows label will read “Hide – Firefox” and the Linux label will read “Hide: Firefox”? Seems to work just fine in English. But in Polish, the word for Hide is ukryj, a 2nd person singular imperative, and it requires Firefox to be declined as Firefoksa if you want to be grammatically correct. Here is how we have to localize in Polish:
<!ENTITY hidemac.label "Ukryj program &brandShortName;">
<!ENTITY hidewin.label "Ukryj program – &brandShortName;">
<!ENTITY hidelin.label "Ukryj program: &brandShortName;">
Localizers have to create a new phrase, “Hide program: Firefox” instead of what seems more natural: “Hide Firefox”. No one would say “Ukryj program Firefox”. It sounds robotic and weird or even monsterish. Polish speakers say “Ukryj Firefoksa”. But, remember brandShortName can only be “Firefox”. You can imagine that this happens all the time in Polish.
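Here is a rough sketch of what a parser does with the Polish entities above. It is written in Python purely for illustration: the expand helper and the wrapper document it builds are my own invention, not part of any Mozilla tooling. The entity declarations are the ones from this post.

```python
import xml.etree.ElementTree as ET

# The Polish declarations from the post, with straight quotes so that
# they are valid DTD syntax.
POLISH = """
<!ENTITY brandShortName "Firefox">
<!ENTITY hidemac.label "Ukryj program &brandShortName;">
<!ENTITY hidewin.label "Ukryj program – &brandShortName;">
<!ENTITY hidelin.label "Ukryj program: &brandShortName;">
"""


def expand(dtd_body):
    """Hypothetical helper: expand every declared entity, return {name: text}.

    Assumes one <!ENTITY name "value"> declaration per line.
    """
    names = [line.split()[1] for line in dtd_body.strip().splitlines()]
    # Build a throwaway document that references each entity exactly once.
    refs = "".join(f'<e n="{name}">&{name};</e>' for name in names)
    doc = f"<!DOCTYPE r [\n{dtd_body}\n]>\n<r>{refs}</r>"
    return {e.get("n"): e.text for e in ET.fromstring(doc)}


expanded = expand(POLISH)
natural = "Ukryj Firefoksa"  # what Polish speakers actually say
for name, text in expanded.items():
    print(f"{name}: {text!r} (natural form present: {natural in text})")
```

Every label ends in the undeclined “Firefox”, and the natural phrase never shows up, because the only thing &brandShortName; can insert is the one string it was declared with.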
Now, multiply that by 60+ localizations whose grammatical structures are different from English, across three platforms, and you get a sense of how difficult this gets. DTDs have other limits that I’ll blog about with more examples. Before we’re finished, I’ll get to the better way to do this that Axel and others have thought long and hard about.
[1] L20n is a term I would hear Axel mention in the past and I’ve spent some time learning what this actually means. I am hoping guys like Axel and Gandalf will comment on these blog posts to add to the conversation.



















