DTD limitations with the gender of translated words
Yesterday, I wrote about the complexity that many localizers face when translating Firefox.
Here is another example.
Difficulty #2: Gender of words
Remember from yesterday that we can have a DTD file in the en-US version of Firefox like
<!ENTITY brandShortName "Firefox">
where every time the entity “brandShortName” is referenced in the code, Firefox displays the string shown in quotes, as translated for the user’s locale.
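If you want to see that substitution happen outside of Firefox, here is a minimal sketch in Python. It is only an illustration of how an XML parser expands a DTD entity, not how Firefox itself loads its DTD files. The element names (window, label) and the “Welcome to” string are made up for the example; only the brandShortName entity comes from the DTD above.

```python
import xml.etree.ElementTree as ET

# A tiny document with an internal DTD subset, standing in for a real DTD file.
# The <window> and <label> elements and the "Welcome to" text are hypothetical.
DOC = """<!DOCTYPE window [
  <!ENTITY brandShortName "Firefox">
]>
<window><label>Welcome to &brandShortName;</label></window>"""

# The parser replaces the entity reference with the quoted string from the DTD.
root = ET.fromstring(DOC)
label_text = root.find("label").text
print(label_text)
```

A localized build would ship a DTD with different quoted strings, and the same reference would pick up the translated text.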
What if a language inflects a word by grammatical gender, so that the same word changes slightly depending on its context in the user interface? Let’s look at this example, using Polish again as our language of translation:
<!ENTITY willCheck "&brandShortName will check links">
In Polish, the localizer has to translate the entity into the following:
<!ENTITY willCheck "&brandShortName będzie sprawdzał(a) odnośniki">
where sprawdzał is the masculine form of the verb and sprawdzała is the feminine form. See how that can be problematic? Poles don’t naturally speak or write this way, with a parenthetical “a” covering every possible gender in one sentence. In context, the correct gender form should be used, but the localizer has to acknowledge both endings with the (a). Alternatively, the localizer could pretend that the word is simply masculine, which can be sensitive depending on which word is being written, who is reading it, and what alternate meaning the word might take on with the wrong ending. Polish localizers made this change, and it’s serviceable.
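To make the missing piece concrete, here is a rough Python sketch of what “using the proper gender version in context” would take. It assumes each product name comes with a grammatical gender, which is exactly the information a plain DTD entity cannot carry. The two verb forms are the ones from the example above; the brand names and the will_check helper are hypothetical.

```python
# Verb forms taken from the Polish example above.
VERB_FORMS = {"masculine": "sprawdzał", "feminine": "sprawdzała"}


def will_check(brand_name: str, gender: str) -> str:
    """Build the sentence with the verb form matching the brand's gender."""
    return f"{brand_name} będzie {VERB_FORMS[gender]} odnośniki"


# "BrandM" and "BrandF" are placeholder product names, not real ones.
print(will_check("BrandM", "masculine"))
print(will_check("BrandF", "feminine"))
```

With a DTD, the localizer only gets one string per entity, so there is nowhere to put the second verb form or the gender that selects it.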
As we expand into new areas, where languages can be extremely different from English, we’ll need to think about a better way to do this. The next post will cover one more example, followed by some more on localization. I’ll conclude all this with a possible solution.
(By the way, I don’t speak Polish. I just happen to work next to a Polish guy every day. Thanks, Gandalf, for providing me some examples to round out these posts.)

I’m curious to know why it wouldn’t be possible to expand the entities to have:
&brandShortName-m;
&brandShortName-f;
In which case you could write:
“&brandShortName-m; będzie sprawdzał odnośniki”
Or
“&brandShortName-f; będzie sprawdzała odnośniki”
@Seth: by the way, you forgot to include the semicolon in &brandShortName, so your example localisation of Firefox just crashed. Sorry
@ Dwayne: Great suggestions. Have you ever had that conversation with a developer? And, thanks for pointing out another pain point with localization.
The reason we’re not playing around with entity substitution for XML is that it would require some very custom, very low-level hacks in expat. Sadly, expat doesn’t expose any API anywhere close to where we would need it; at least that’s my recollection from digging into the source there.
That is one reason why I didn’t try to build on XML DTDs for string replacement in l20n for XML applications.
@ Dwayne: noun declension and/or different genders in different languages
Even without declension this is trickier; e.g., when you don’t want to use brackets, you can add a “keyword” that forces the gender, as in this real-life example (committed today):
en: $BrandShortName will be set as your default mail application.
committed: Program $BrandShortName zostanie ustawiony jako domyślny klient poczty.
should be: $BrandShortName zostanie ustawiony jako domyślny program pocztowy.
I can’t translate “default mail app” as “domyślny program pocztowy” because I have already used the word “program” at the start of the sentence. So I’m forced either to use quite strange wording elsewhere (default app) whenever the original sentence also contains my gender-setting keyword, or to use brackets (even worse).