Storage on the Mozilla platform now supports locale-sensitive collations.
SQLite provides a few simple built-in collations: BINARY, NOCASE, and RTRIM. As the first suggests, they all compare raw bytes, memcmp-style (NOCASE additionally folds the 26 ASCII uppercase letters, and RTRIM ignores trailing spaces), and they ignore text encoding and the user’s locale. Until now, if you wanted to show respect to your user and the vagaries of her language’s collating conventions, you had to load your entire set of results and then sort it manually. Sweet.
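To see what byte-wise comparison means in practice, here is a small sketch using Python's standard sqlite3 module rather than Mozilla Storage. The table name `words` and the sample data are made up for illustration. It shows BINARY putting every uppercase ASCII letter ahead of lowercase ones, and both BINARY and NOCASE pushing an accented word to the end.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (name TEXT)")  # hypothetical demo table
conn.executemany(
    "INSERT INTO words VALUES (?)",
    [("zebra",), ("Apple",), ("éclair",), ("apple",), ("Zoo",)],
)

def sorted_names(collation):
    # rowid is a tie-breaker so equal keys come back in insertion order.
    sql = f"SELECT name FROM words ORDER BY name COLLATE {collation}, rowid"
    return [row[0] for row in conn.execute(sql)]

binary_order = sorted_names("BINARY")
nocase_order = sorted_names("NOCASE")

print(binary_order)  # ['Apple', 'Zoo', 'apple', 'zebra', 'éclair']
print(nocase_order)  # ['Apple', 'apple', 'zebra', 'Zoo', 'éclair']
```

NOCASE fixes the ASCII case problem, but "éclair" still sorts after "zebra" because its first byte is larger than any ASCII byte.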
But now you can do it all in SQL. Bug 499990 adds the following collations sensitive to the locale of your user’s application:
locale: case- and accent-insensitive
locale_case_sensitive: case-sensitive, accent-insensitive
locale_accent_sensitive: case-insensitive, accent-sensitive
locale_case_accent_sensitive: case- and accent-sensitive
That’s everything covered by the platform’s existing collation facilities.
Use them like so:
SELECT * FROM fooflefipples ORDER BY name COLLATE locale ASC;
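The `locale` collations above exist only inside Mozilla Storage, so you can't try that query in a stock SQLite shell. As a rough stand-in, the sketch below registers a custom collation through Python's sqlite3 module and uses it with the same ORDER BY ... COLLATE syntax. The collation name `demo_insensitive` and the `fold` helper are hypothetical, and the comparison is a crude case- and accent-insensitive approximation built on Unicode decomposition. It is not locale-aware, and it is not how the platform implements its collations.

```python
import sqlite3
import unicodedata

def fold(text):
    # Decompose characters, drop combining marks (accents), then case-fold.
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def compare_insensitive(a, b):
    # SQLite expects a negative, zero, or positive integer.
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)

conn = sqlite3.connect(":memory:")
conn.create_collation("demo_insensitive", compare_insensitive)  # hypothetical name

conn.execute("CREATE TABLE fooflefipples (name TEXT)")
conn.executemany(
    "INSERT INTO fooflefipples VALUES (?)",
    [("zebra",), ("Éclair",), ("apple",), ("Drum",)],
)

rows = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM fooflefipples ORDER BY name COLLATE demo_insensitive ASC"
    )
]
print(rows)  # ['apple', 'Drum', 'Éclair', 'zebra']
```

The point is only that once a collation is registered with the database, the sort happens inside the query and callers never touch the result order.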
Locale-sensitive collations are useful for everyone building on the platform, but I added them as part of some ongoing work on async Places APIs in Firefox. We’d like to notify consumers as batches of results load from the database, but that’s not possible if we have to sort the entire set outside the database. Sorting it ourselves also increases Places’s code size for something that should be handled at the platform level. Now it is.
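Why sorting in the database matters for batching can be sketched in a few lines. This is not the Places API: it is synchronous Python sqlite3, the `places` table and `iter_batches` helper are made up, and NOCASE stands in for the locale collations. Because the rows arrive already ordered, each batch can be handed to a consumer as soon as it is read.

```python
import sqlite3

def iter_batches(cursor, batch_size):
    # Yield already-sorted rows in chunks; nothing is re-sorted afterwards.
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield [row[0] for row in batch]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (title TEXT)")  # hypothetical demo table
conn.executemany(
    "INSERT INTO places VALUES (?)",
    [("delta",), ("Alpha",), ("charlie",), ("Bravo",), ("echo",)],
)

cursor = conn.execute("SELECT title FROM places ORDER BY title COLLATE NOCASE ASC")
batches = list(iter_batches(cursor, 2))
print(batches)  # [['Alpha', 'Bravo'], ['charlie', 'delta'], ['echo']]
```

If the sort had to happen in application code instead, the first batch could not be delivered until the last row had been read.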
For a nice summary of the importance and difficulty of sorting strings in our multilingual digital world, see the introduction to the Unicode Collation Algorithm specification.
In related Storage news, Curtis is adding a Levenshtein distance function. (Pretty cool, but my patch is way cooler, don’t tell anybody.)
