Firefox 3: UTF-8 support in location bar
Friday, May 23rd, 2008
There have been a number of posts recently looking at new features of Firefox 3, including the new smart location bar (a.k.a. Awesomebar), the new bookmarks functionality, color profile support, the site identification button, the 3 new themes, to name just a few.
I’d like to take a look at one of the new changes in Firefox 3: support for UTF-8 multi-byte uris. To give credit where it is due, this functionality is already available in Internet Explorer 7, Safari 3, and Opera 9, although it works slightly differently in each of these browsers (as I explain below).
Those of us who mainly use the Roman-alphabet, us-ascii web may not notice one of the big changes in Firefox 3: UTF-8 multi-byte support in the location bar. It is a very large usability win, because uris in non-Roman ascii languages were unreadable in Firefox 2. In Firefox 3, they are human-readable.
As an extreme example, here is the Japanese Wikipedia page for the place in Japan with the longest name, 愛知県海部郡飛島村大字飛島新田字竹之郷ヨタレ南ノ割.
For those of you who study Japanese, you would pronounce it like this: 「あいちけんあまぐんとびしまむらおおあざとびしましんでんあざたけのごうよたれみなみのわり。」
In Firefox 2, where the location bar would not display the Japanese multi-byte characters, the encoded uri is 254 (!!!) characters long.
In Firefox 3, where the location bar supports UTF-8, the uri is 54 characters long (and fits within an average laptop browser window).
http://ja.wikipedia.org/wiki/愛知県海部郡飛島村大字飛島新田字竹之郷ヨタレ南ノ割
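The length difference comes from percent-encoding: each character in this title takes three bytes in UTF-8, and each byte is written as a three-character %XX escape. The minimal Python sketch below reproduces the two figures with the standard library's urllib.parse.quote. The helper names encoded_uri and decoded_uri are hypothetical and only illustrate the arithmetic; they are not taken from Firefox.

```python
from urllib.parse import quote

# Prefix of the Japanese Wikipedia article uri used in the example above.
PREFIX = "http://ja.wikipedia.org/wiki/"
TITLE = "愛知県海部郡飛島村大字飛島新田字竹之郷ヨタレ南ノ割"


def encoded_uri(title: str) -> str:
    """Percent-encode the title's UTF-8 bytes, the form an ascii-only location bar shows."""
    return PREFIX + quote(title, encoding="utf-8")


def decoded_uri(title: str) -> str:
    """The same uri with the title left as readable characters."""
    return PREFIX + title


if __name__ == "__main__":
    enc = encoded_uri(TITLE)
    dec = decoded_uri(TITLE)
    print(enc)
    print(len(enc), len(dec))  # 254 54
```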
Human readability and a shorter uri together make this quite an important feature, especially for the non-Roman ascii language parts of the web (which I think may be the fastest-growing parts of the web).
Two more examples show how long multi-byte uris become when they are encoded as ascii text:
The Welsh town of Llanfairpwllgwyngyllgogerychwyrndrobwllllantysiliogogogoch is 58 characters in length.
On Japanese Wikipedia, its article has a 389-character encoded uri in Firefox 2.
It is a mere 69 characters in a browser that decodes the multi-byte characters in the uri.
http://ja.wikipedia.org/wiki/ランヴァイル・プルグウィンギル・ゴゲリフウィルンドロブル・ランティシリオゴゴゴホ
Here is a Japanese Wikipedia page with information about a portion of the US-Japan Status of Forces Agreement. It is a 704-character encoded uri in Firefox 2.
It is 104 characters with Japanese in the uri.
These are extreme examples, but they show what happens when a multi-byte uri is encoded.
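All three examples follow the same arithmetic. Assume every character in the article title is a three-byte UTF-8 character. This holds for the kanji, katakana and katakana middle dot (・) used here, and the reported totals are consistent with it. Under that assumption, a title of n characters contributes 3n bytes, each shown as a three-character %XX escape, after the 29-character prefix http://ja.wikipedia.org/wiki/ (assumed to be the same for the Status of Forces Agreement page, where n = 104 - 29 = 75):

```latex
\begin{aligned}
L_{\text{decoded}} &= 29 + n, \qquad L_{\text{encoded}} = 29 + 3 \cdot (3n) = 29 + 9n \\
n = 25 &:\quad L_{\text{decoded}} = 54,\quad L_{\text{encoded}} = 29 + 225 = 254 \\
n = 40 &:\quad L_{\text{decoded}} = 69,\quad L_{\text{encoded}} = 29 + 360 = 389 \\
n = 75 &:\quad L_{\text{decoded}} = 104,\quad L_{\text{encoded}} = 29 + 675 = 704
\end{aligned}
```

In other words, each such character costs 9 characters in the encoded form instead of 1, which is why the encoded uris are roughly nine times longer once the prefix is set aside.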
Here is an enlarged screenshot of Firefox 2 on Vine Linux showing a uri from the Japanese volunteer-translated Mozilla Developer Center documentation.
You can see that the uri after “MDC:” is unreadable encoded text.
In Firefox 3 it looks like this:
It’s a tad blurry, but I hope you can see that the uri says “MDC:日本語版”, which means ‘Japanese edition.’
Here are 3 screenshots of Firefox 2 on Vista, Mac OS, and Vine Linux, as well as 3 of Firefox 3 on Vista, Mac OS, and Ubuntu, to show the differences.
Firefox 2 on Vista (not human-readable because of the encoded uri)
Firefox 2 on Mac OS (not human-readable because of the encoded uri)
Firefox 2 on Vine Linux (not human-readable because of the encoded uri)
Firefox 3 on Vista (human-readable with the decoded uri)
Firefox 3 on Mac OS (human-readable with the decoded uri)
Firefox 3 on Ubuntu 8.04 (human-readable with the decoded uri)
Dynamis helped me make the screenshots in Japanese as an example, since that is the non-Roman ascii language we are most comfortable with. If you have examples from your own non-Roman ascii language, please feel free to post Firefox 3 screenshots to the web and leave their uris in the comments, so people can see how this works with other multi-byte character sets.
As for how the browsers differ: Firefox 3, Opera 9 and Safari 3 all automatically decode uris in the location bar so that they are human-readable. IE7 supports UTF-8 multi-byte uris but does not automatically decode them in the location bar.
As far as I know, there is no specification for this browser behavior (please correct me if I am wrong).
Finally, note that Firefox 3 will not properly decode a multi-byte uri that is not UTF-8 encoded, so on pages that use another encoding (such as Shift_JIS), multi-byte uris may still appear as encoded text.
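To make the display behavior described above concrete, here is a hypothetical Python sketch of decoding a uri for display. It is not Firefox's code. It assumes the simplest rule consistent with the text: decode the %XX escapes as UTF-8 and, if the bytes are not valid UTF-8, leave the uri encoded. The function name display_uri is made up for illustration. A real browser also keeps certain characters (such as spaces and reserved delimiters) escaped, which this sketch ignores.

```python
from urllib.parse import quote, unquote


def display_uri(uri: str) -> str:
    """Return a human-readable form of a percent-encoded uri (hypothetical sketch).

    The %XX escapes are decoded strictly as UTF-8. If the bytes are not valid
    UTF-8 (for example, Shift_JIS escapes), the uri is returned still encoded.
    """
    try:
        return unquote(uri, encoding="utf-8", errors="strict")
    except UnicodeDecodeError:
        return uri


PREFIX = "http://ja.wikipedia.org/wiki/"
utf8_uri = PREFIX + quote("日本語版", encoding="utf-8")
sjis_uri = PREFIX + quote("日本語版", encoding="shift_jis")

print(display_uri(utf8_uri))  # http://ja.wikipedia.org/wiki/日本語版
print(display_uri(sjis_uri))  # unchanged: still percent-encoded
```

In this model, a browser that behaves like IE7 does here would simply show the encoded uri without calling anything like display_uri.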
It is a small feature, but for those of us who spend time in the multi-byte Internets, it is a very, very important feature for both readability and usability.
Thank you to dynamis and jdaggett for the review and help.