Daniel Witte

August 24, 2010

User Agent string changes coming in Firefox 4

Filed under: Uncategorized — dwitte @ 2:23 pm

Edit 9/9/2010: This information has been superseded by an updated post here. Please refer there for final information on the changes for Firefox 4.

With a title like that, you just know this is going to be fun. (No, seriously.)

The user agent string is one of those wonderfully eclectic things, a balance of modernity and antiquity. Except mostly skewed toward antiquity. It’s grown, piece by piece, over the years; because everyone has their own special way of parsing it, it’s a notoriously sensitive beast. Adding to it is relatively simple, but removing or rearranging bits is not.

If you’re a web developer and you rely on bits in the UA string, this post is definitely for you. As it so happens, the UA string for Internet Explorer 9 has undergone some revision, and Microsoft has recently announced the string for IE Mobile. This makes the time ripe for a revision so that web developers can make the necessary changes to sniffer code all at once.

Here are the changes we’ve made so far for Firefox 4, on the Big Three platforms (for a complete reference, see https://developer.mozilla.org/En/Firefox_User_Agent_String_Reference):

Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/yyyymmdd Firefox/4.0.1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/yyyymmdd Firefox/4.0.1
Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/yyyymmdd Firefox/4.0.1

What’s changed, you say? Here’s what:

1) The "Windows; " prefix is gone from the (surprise!) Windows-specific string.

2) The locale (e.g. "en-US; ") is gone. The locale of the browser is not always the same as the locale the user prefers to view content in; the HTTP Accept-Language header is the recommended source of this information.

3) The "U; " is gone. Back in the day, this was used to distinguish browsers with strong encryption from those without. Nowadays, no browser ships with weak encryption. This means that if you’re sniffing for "U; ", you should stop doing so, or instead sniff for the absence of the weak-encryption tokens ("I; " or "N; ").

4) Testing builds of Firefox (Minefield nightlies and prerelease builds) will now identify themselves as "Firefox/x.y.z", just like release builds.
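To make the list above concrete, here is a rough sketch of sniffer code that would survive these changes. It is only an illustration under assumptions: the function names parseFirefoxUA and preferredLanguages are made up for this example, the parsing is deliberately loose, and a prerelease version such as "4.0b5" would be reported only up to its numeric part. The first function never looks for "Windows; ", "U; ", or a locale; the second reads language preferences from an Accept-Language header value on the server side instead.

```javascript
// Hypothetical helper (not part of Firefox): split a Firefox 4 style UA string
// into its parts without relying on "Windows; ", "U; ", or a locale token.
function parseFirefoxUA(ua) {
  // Everything inside the first pair of parentheses is the platform section.
  const platformMatch = /\(([^)]*)\)/.exec(ua);
  const tokens = platformMatch
    ? platformMatch[1].split(";").map(t => t.trim()).filter(t => t !== "")
    : [];

  // The Gecko revision is the "rv:" token; the other tokens describe the platform.
  const rvToken = tokens.find(t => t.startsWith("rv:"));
  const firefoxMatch = /Firefox\/(\d+(?:\.\d+)*)/.exec(ua);

  return {
    platform: tokens.filter(t => !t.startsWith("rv:")),
    geckoRevision: rvToken ? rvToken.slice(3) : null,
    firefoxVersion: firefoxMatch ? firefoxMatch[1] : null,
  };
}

// Hypothetical helper: order the language tags of an Accept-Language header
// value by their q-values (a missing q-value counts as 1).
function preferredLanguages(header) {
  return header
    .split(",")
    .map(part => {
      const [tag, ...params] = part.trim().split(";");
      const q = params.map(p => p.trim()).find(p => p.startsWith("q="));
      return { tag: tag.trim(), q: q ? parseFloat(q.slice(2)) : 1 };
    })
    .filter(entry => entry.tag !== "" && !Number.isNaN(entry.q))
    .sort((a, b) => b.q - a.q)
    .map(entry => entry.tag);
}

const windows = parseFirefoxUA(
  "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/yyyymmdd Firefox/4.0.1");
// windows.platform is ["Windows NT 6.1"], windows.firefoxVersion is "4.0.1"

const linux = parseFirefoxUA(
  "Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/yyyymmdd Firefox/4.0.1");
// linux.platform is ["X11", "Linux i686"], linux.geckoRevision is "2.0.1"

const languages = preferredLanguages("de;q=0.5, en-US, en;q=0.8");
// languages is ["en-US", "en", "de"]
```

Splitting the platform section on ";" rather than matching fixed positions is the point here: tokens can come and go, as they just did, without breaking the sniffer.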

We may also remove the "Macintosh; " from the Mac UA string, and change the way a Linux x86 32-bit browser build running on an x86_64 processor is identified; if we do, you’ll hear about it here.

It’s also worth noting why we didn’t change some things. The "X11" part of the Linux string may appear redundant, but it’s actually not: desktop machines are almost exclusively running X11, but Android phones are not. For reasons like this, the various platform-specific parts of the string are important to know. For instance, there are different tokens for Windows 64 on x64 or IA64, or WoW64 (a 32-bit browser on a 64-bit Windows); PPC or Intel on Mac OS X; and the various environments and architectures for Linux. A list of the common variations can be found at the link above. There’s also a text file here for your testing convenience.
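As a hedged illustration of why those platform tokens matter, the sketch below classifies a UA string by looking at individual tokens. The function name describePlatform is invented for this example, and the token spellings it checks ("WOW64", "Android", "Mac OS X") are assumptions on my part; check them against the reference list before relying on them. Mac detection deliberately keys off "Mac OS X" rather than "Macintosh", since the latter may yet be removed.

```javascript
// Hypothetical classifier (not part of Firefox). Token spellings are
// assumptions; verify them against the UA string reference before use.
function describePlatform(ua) {
  // The platform tokens live inside the first pair of parentheses.
  const match = /\(([^)]*)\)/.exec(ua);
  const tokens = match ? match[1].split(";").map(t => t.trim()) : [];

  const hasToken = name => tokens.includes(name);
  const anyStartsWith = prefix => tokens.some(t => t.startsWith(prefix));
  const anyContains = text => tokens.some(t => t.includes(text));

  let os = "unknown";
  if (anyStartsWith("Windows NT")) os = "windows";
  else if (anyContains("Mac OS X")) os = "mac";
  else if (hasToken("Android")) os = "android"; // checked before plain Linux
  else if (anyStartsWith("Linux")) os = "linux";

  return {
    os,
    isX11: hasToken("X11"),     // desktop X11 environment
    isWow64: hasToken("WOW64"), // 32-bit browser on 64-bit Windows
  };
}

const desktopLinux = describePlatform(
  "Mozilla/5.0 (X11; Linux i686; rv:2.0.1) Gecko/yyyymmdd Firefox/4.0.1");
// desktopLinux is { os: "linux", isX11: true, isWow64: false }

const mac = describePlatform(
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/yyyymmdd Firefox/4.0.1");
// mac.os is "mac"
```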

There are some other important changes relating to how the UA string can be modified by external programs, Firefox addons, and users themselves. I’ll detail these changes in an upcoming post, but suffice to say: the days of horrendously long and arcane strings are over.

Now, if you’re a web developer, take note! These changes will be in Firefox 4 Beta 5 (soon to be released), so if you want to be ahead of the game, you’re welcome to test against it. As noted above, for a complete list of UA strings to test against in all their glorious variations, see the link above or the text file. Happy sniffing!

March 12, 2010

Extension authors, browser hackers, meet js-ctypes

Filed under: Uncategorized — dwitte @ 7:10 pm

What, you may ask, is js-ctypes? Let’s say you’re writing a Firefox extension in JavaScript that needs to call into native code. (For example, weave-crypto, which needs to call into the NSS library. Or your extension here, which, say, wants to call into NSPR, libc, or Win32 functions directly.) Currently, you’re limited to either a) using the scriptable XPCOM interfaces provided by libxul, or b) writing and implementing your own XPCOM interfaces and shipping binary code with your extension. If a) is inadequate, you’re stuck with b), which makes shipping an extension much harder — binary code must be built for each supported platform, and rolled up together into your cross-platform xpi.

Thusly, the answer is js-ctypes: it allows JavaScript to call into native C code and manipulate data types, without using XPCOM and without compiling a line of code. This means you don’t need to define XPCOM interfaces, and can use shared libraries like libc directly. As a side benefit, much of the type conversion overhead of XPConnect is avoided, so that execution can be faster. (I’ll be benchmarking in a later post.) This will ship with Gecko 1.9.3, which will be underpinning Firefox 4.

But how, you say? To the examples (tested on 32-bit x86 Linux; sans cross-platform safety):

1. Opening a library.

// First, import the ctypes module.
Components.utils.import("resource://gre/modules/ctypes.jsm");

// Open libc.
let library = ctypes.open("libc.so.6");

// Declare the 'fopen' function, which has a C prototype of:
//   FILE* fopen(const char* name, const char* mode);
let fopen = library.declare("fopen",                        // symbol name
                            ctypes.default_abi,             // cdecl calling convention
                            ctypes.StructType("FILE").ptr,  // return type (FILE*)
                            ctypes.char.ptr,                // first arg (const char*)
                            ctypes.char.ptr);               // second arg (const char*)

// Call the function, and get a FILE* pointer object back.
let file = fopen("hello world.txt", "w");

// What is 'file'?
file.toString();
  ===> "ctypes.StructType("FILE").ptr(ctypes.UInt64("0x9781b38"))"  // pointer value

// ... write data to file here ...

2. Declaring and using structs and arrays.

// Declare the 'hostent' struct, which contains five fields, each with their own
// types. This corresponds to the C declaration:
//   struct hostent {
//     char* h_name;       // name of host
//     char** h_aliases;   // array of strings representing aliases of the host
//     int h_addrtype;     // whether the IP address is IPv4 or IPv6
//     int h_length;       // length, in bytes, of the IP address
//     char** h_addr_list; // array of IP addresses from name server
//   };
let hostent = ctypes.StructType("hostent",
                                [{ h_name      : ctypes.char.ptr                 },
                                 { h_aliases   : ctypes.char.ptr.ptr             },
                                 { h_addrtype  : ctypes.int                      },
                                 { h_length    : ctypes.int                      },
                                 { h_addr_list : ctypes.uint8_t.array(4).ptr.ptr }]);

// Declare the 'gethostbyname' function, which has a C prototype of:
//   struct hostent* gethostbyname(const char* name);
let gethostbyname = library.declare("gethostbyname",
                                    ctypes.default_abi,
                                    hostent.ptr,         // use our 'hostent' type
                                    ctypes.char.ptr);

// Ask our function to resolve a hostname.
let google = gethostbyname("mail.google.com");

// Dereference the pointer to 'hostent' struct, access the 'h_name' field, and
// convert it to a JS string. This is roughly equivalent to the C expression:
//   printf("%s", google->h_name);
google.contents.h_name.readString();
  ===> "googlemail.l.google.com"

// Dereference the 'h_addr_list' field twice to get the first element in the
// array, which is an array of 4 bytes representing the IPv4 address of the host.
// Roughly equivalent C:
//   printf("%u.%u.%u.%u", (int) h_addr_list[0][0], (int) h_addr_list[0][1],
//          (int) h_addr_list[0][2], (int) h_addr_list[0][3]);
google.contents.h_addr_list.contents.contents.toString();
  ===> "ctypes.uint8_t.array(4)([74, 125, 19, 17])"  // 74.125.19.17

3. Creating C function pointers for JS functions.

// Declare the type of a comparator function that takes two pointers to elements,
// and returns:
//   -1 if i < j;
//    0 if i == j;
//    1 if i > j.
// Equivalent C:
//   typedef int (*comparator_t)(const int8_t* i, const int8_t* j);
let comparator_t = ctypes.FunctionType(ctypes.default_abi, ctypes.int,
                                       ctypes.int8_t.ptr, ctypes.int8_t.ptr).ptr;

// What's the C type of 'comparator_t'?
comparator_t.name;
  ===> "int (*)(int8_t*,int8_t*)"

// Declare the 'qsort' function which takes an array of elements and a comparator
// function, and sorts the array.
//   void qsort(void* array, size_t length, size_t elemsize, comparator_t comp);
let qsort = library.declare("qsort", ctypes.default_abi, ctypes.void_t,
              ctypes.voidptr_t, ctypes.size_t, ctypes.size_t, comparator_t);

// Implement a JS function that looks just like 'comparator_t' above.
function reverse(i, j) { return j.contents - i.contents; }

// Construct a C function pointer from our JS function.
let reverse_ptr = comparator_t(reverse);

// What's 'reverse_ptr'?
reverse_ptr.toString();
  ===> "ctypes.FunctionType(ctypes.default_abi, ctypes.int, ctypes.int8_t.ptr,
                            ctypes.int8_t.ptr).ptr(ctypes.UInt64("0x81a430cb"))"

// Construct an array of values and call 'qsort'.
let array_t = ctypes.int8_t.array();
let ints = array_t([3, 1, 5, 6, 4, 2]);
qsort(ints.address(), ints.length, array_t.elementType.size, reverse_ptr);

// Voilà!
ints.toString();
  ===> "ctypes.int8_t.array(6)([6, 5, 4, 3, 2, 1])"

So, if you’re an extension author, or writing code for the browser, keep js-ctypes in mind — and let us know how it goes!

Edit 8/7/10: updated syntax for API changes.

March 3, 2010

On the state (and future) of cookies

Filed under: Uncategorized — dwitte @ 1:31 pm

Back in 1994, at a fledgling company called Netscape Communications, a specification was proposed that would fundamentally change the Web: the ability for a web server and a browser to maintain state. Web pages being viewed in a browser could send login information to websites using the standard form controls that we know today, but there was no way for that login state to be associated with an individual browser. Things like shopping carts, online forums, and online banking weren’t possible. The cookie specification changed that. By allowing the web server to store small pieces of information in the browser, this information could then be sent back with every http request, and stateful transactions were born.
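For the curious, the round trip described above can be sketched in a few lines. This is a toy, not how any browser implements it: the class name TinyCookieJar is made up, and it ignores domains, paths, expiry, and every other attribute. It only shows the core idea that whatever the server sets is stored and then echoed back in a Cookie header on later requests.

```javascript
// Hypothetical, deliberately tiny cookie jar: no domain, path, or expiry handling.
class TinyCookieJar {
  constructor() {
    this.cookies = new Map(); // name -> value, in insertion order
  }

  // Store the name=value pair from one Set-Cookie header value.
  // Attributes after the first ";" (Path, Expires, ...) are ignored here.
  setFromHeader(setCookieValue) {
    const pair = setCookieValue.split(";")[0];
    const eq = pair.indexOf("=");
    if (eq < 1) return; // no "=" or an empty name: skip it in this sketch
    this.cookies.set(pair.slice(0, eq).trim(), pair.slice(eq + 1).trim());
  }

  // Build the Cookie header value to send back with the next request.
  cookieHeader() {
    return Array.from(this.cookies, ([name, value]) => `${name}=${value}`).join("; ");
  }
}

const jar = new TinyCookieJar();
jar.setFromHeader("session=abc123; Path=/");
jar.setFromHeader("cart=3; Path=/shop");
const header = jar.cookieHeader(); // "session=abc123; cart=3"
```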

Of course, the web back then was far different from the web today, and the cookie standard really hasn’t changed much. The simple protocol that was implemented more than a decade ago was just that – simple, minimal, enough to get the job done. But along came https, and JavaScript, and advertisers that want to track you, and host sites with many unrelated subdomains like GeoCities and Blogspot — and with them came cross-site scripting attacks and severe privacy and security issues. Underneath a login session at your favorite online bank is just a handful of cookies, and if that bank makes a mistake or two in their implementation, those cookies could be sent in the clear — you can guess what happens next.

There have been attempts to improve the protocol. RFC2109 was the first, and somewhat reflects the situation today. But this spec didn’t arrive in a vacuum: cookies were already in use, and browsers didn’t wholly adopt the new standard for fear of breaking existing, working sites. The spec became more of an ideology, and servers were instead designed around how browsers actually behaved. Once it became clear that changing the standard (or rather, having browsers and servers implement the standard) was a losing battle, RFC2965 was produced. This introduced new headers, Set-Cookie2 and Cookie2, that had different semantics and solved some (then new) problems: it added more trust to cookies, so that servers could have more confidence that what they received from a browser actually originated from them and not an attacker. This is achieved by sending back metadata with each cookie, detailing how the cookie was set. Unfortunately, interest in the new standard wasn’t great, and most browsers today don’t implement it. (Among the big five, Opera is the sole exception.) But the problem of cookie trust and integrity remains.

Thus was formed the http-state working group in 2009, under the auspices of the IETF. The charter of the group is, firstly, to produce a specification that accurately reflects how cookies are implemented today. This is necessarily a big task, because such a specification must account for the subtle differences between browser implementations. This will result in two separate documents: one describing how clients should behave in order to match other clients (consensus behavior), and one describing how servers should behave in order to interoperate with those clients (conservative behavior). The former will be much more detailed than the latter, due to the presence of numerous edge cases. Work towards producing a draft is pretty far along. As an added benefit, the extensive testing of the big five browsers required to produce this spec has resulted in several fixes to Firefox, in order to bring it in line with the consensus behavior. Adam Barth deserves a big callout for the awesome effort and energy he’s put into this project; without him this spec wouldn’t exist.

The http-state working group will be congregating at the 77th IETF meeting in Anaheim, California, March 21-26. Representatives from at least some of the major browser vendors — and of course other server and client implementors — will be there. I’ll be going along in my capacity as the Firefox cookie module owner. Which brings me to the second (perhaps informal) purpose of the group, or at least of many of its members — to bounce around ideas for a modern cookie standard; something that does not inherit the integrity problems and lack of scaling that current implementations do. While there are new web standards that provide state management mechanisms (such as DOM storage), nothing beats the simplicity, ubiquity, and public awareness of cookies. Thus it is a noble goal to make things right, and something that http-state holds promise to do.

July 6, 2009

Joining Mozilla

Filed under: Uncategorized — dwitte @ 9:07 pm

… and my first post!!

After a good long while in school (essentially my entire life to date, in fact), I’ve completed a Ph.D. at Stanford University and joined Mozilla Corporation. I’ve enjoyed contributing to Mozilla for many years, in which time I’ve helped improve performance, clean up code, add features, improve privacy, and (probably most significantly) become owner of the cookie module. I hope to continue these activities, with one notable addition: my primary focus at Mozilla will now be static analysis, working to improve code quality and performance for the entire Mozilla platform. I will be working mostly with Taras Glek, Benjamin Smedberg, and David Mandelin among others, and I’ll be blogging soon about some of the problems we hope to solve.

There’s a lot to be done, and I’m excited to finally be able to devote some serious time to it!
