
Firefox Telemetry

Benchmarks Suck

Mozilla has traditionally relied on benchmarks such as Talos, SunSpider and Kraken to optimize Firefox. Unfortunately, benchmarks have two problems: a) it is hard to write good benchmarks (see all of the complaints about SunSpider), and b) even the most perfect synthetic benchmark does not fully correspond to how users actually use the browser. Firefox with a well-used profile, antivirus software, a well-aged Windows install and 30 addons will not perform the same as it does in our clean benchmarking environment.

For my team this became obvious in Firefox 4 once we started recording Firefox startup times. It turned out to be easier to fix startup performance than to make our synthetic test closely reflect real-world startup speed.

Telemetry

There is only one solution to this problem: develop telemetry infrastructure to measure Firefox performance in the wild. Beginning with version 6, Firefox will ask users to opt in to sending Mozilla anonymous usage statistics about performance, user interface feature usage, memory usage and responsiveness. This information will help us improve future versions of Firefox to better fit actual usage patterns.

This functionality is already present in our major competitors' browsers. Unlike our competitors, we do not plan to tag reported data with unique identifiers. This will make it harder for us to see how Firefox performance changes over time for particular users (or to tell easily whether some users are disproportionately represented because they send more reports). We take our users’ privacy seriously, so this seems like a reasonable trade-off.

Yesterday I landed the reporting part of telemetry (bug 585196). We are still working on the UI, the official server side and an updated privacy policy.

The screenshot above shows some of the data that will be gathered from users who opt in to telemetry.

Please help us get a head start on telemetry. In recent nightlies, go to about:config and set toolkit.telemetry.enabled to true. Once the pref is set, Firefox will send interesting performance data to the telemetry test server. The metrics are very compact and are sent no more than once a day.
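To make "very compact" and "no more than once a day" concrete, here is a minimal sketch of one way a client could keep startup times in a fixed-size exponential histogram and throttle submissions to one per day. All names here (`bucketIndex`, `StartupHistogram`, `shouldSubmit`) are hypothetical; this is not Firefox's actual telemetry code or wire format.

```javascript
// Hypothetical sketch: not Firefox's telemetry implementation.
const DAY_MS = 24 * 60 * 60 * 1000;

// Bucket 0 holds values below 1; bucket i (i >= 1) holds [2^(i-1), 2^i).
// The last bucket also absorbs everything larger.
function bucketIndex(value, bucketCount) {
  let index = 0;
  let bound = 1;
  while (value >= bound && index < bucketCount - 1) {
    index++;
    bound *= 2;
  }
  return index;
}

class StartupHistogram {
  constructor(bucketCount = 16) {
    this.counts = new Array(bucketCount).fill(0);
  }
  // Only a counter is incremented, so the report size stays fixed
  // no matter how many startups are recorded.
  add(ms) {
    this.counts[bucketIndex(ms, this.counts.length)]++;
  }
}

// Submit at most once per 24 hours; null means "never submitted".
function shouldSubmit(lastSubmitMs, nowMs) {
  return lastSubmitMs === null || nowMs - lastSubmitMs >= DAY_MS;
}
```

A histogram of counts, rather than a list of raw samples, is what keeps a report of this kind small.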

Since there is no UI yet, install my about:telemetry extension and navigate to “about:telemetry” to see the metrics collected.

Until recently, our state-of-the-art method for measuring startup was to subtract a timestamp passed via the command line from a new Date() timestamp taken within a <script> tag. Vlad pioneered this approach, and I and others adopted it.
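For reference, a minimal sketch of that measurement, assuming the launch time (in milliseconds since the epoch) is appended to the startup page URL on the command line as a hypothetical `starttime` query parameter; the page's script subtracts it from the current time:

```javascript
// Sketch of the old approach. "starttime" is a hypothetical parameter name.
function elapsedSinceLaunch(pageUrl, nowMs) {
  const raw = new URL(pageUrl).searchParams.get("starttime");
  if (raw === null || !/^\d+$/.test(raw)) {
    throw new Error("no usable starttime parameter in " + pageUrl);
  }
  return nowMs - Number(raw);
}

// Inside the page's <script> tag this would be called as:
//   elapsedSinceLaunch(location.href, new Date().getTime());
```

Note that this captures when the page's script runs, not when the browser window is painted.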

Turns out there are two problems with this approach:

  1. It is cumbersome, especially on Windows where there is no easy way to pass a timestamp via the command line.
  2. It is wrong. It turns out that Firefox starts loading web pages before the UI is shown, so one can’t be sure that the page being loaded is within a visible browser.

Our oldest startup benchmark, ts, has been gathering wrong numbers all along. This resulted in a class of perverse optimizations that decreased the ts number but increased the time taken for the UI to appear (e.g. bug 641691). The new tpaint benchmark (bug 612190) should address this. On my machine, measuring page load versus paint time results in a 50-100ms difference. See the graph server for more data.

This is why AMO’s complicated method of measuring startup is wrong. Please use our shiny new about:startup extension or, if you absolutely want to avoid adding any overhead, use the getStartupInfo API directly.

Recently the addon team started working towards penalizing addons that penalize our startup. We have solid data showing that startup gets worse as more addons are installed, so the effort is justified. Justin published this picture to illustrate the problem:

However, some technical mistakes were made. Wladimir (the AdBlock+ guy!) has been busy exposing them on his blog. Thanks, Wladimir! I spent a couple of years understanding Firefox startup, so I really appreciate the remarkable speed and quality with which Wladimir has poked holes in the AMO approach.

Rating Addon Performance

Wlad’s latest point is regarding how addon impact is measured on warm startup with an empty profile. I agree that addon impact on warm startup with clean profile is going to be different from that of cold startup with a dirty real-world profile. I also agree that it is weird to primarily measure warm startup given that our data clearly indicates that most users are experiencing cold startup.

However, startup is an irritating multidimensional problem influenced by a large number of factors. Should we suck it up and let addons kill warm startup (and risk ruining Firefox upgrade times and pissing off web developers)? How does one choose a typical dirty profile? Creating a “typical” dirty profile is a tough problem (e.g. due to privacy and disk fragmentation), and there is no reason to let clean-profile performance be degraded.

So an addon performance rating will not be perfect. The time added by an addon during warm startup is the easiest starting point and one of the more deterministic measures (Wlad’s blog says otherwise, but measuring other kinds of startup involves even more noise). This is no worse than choosing browsers based on their JavaScript benchmark performance.
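As an illustration, one simple way to estimate the time an addon adds to warm startup is to repeat warm startups with and without the addon and compare the medians. This is an assumed method for illustration, not AMO's actual procedure, and the function names are hypothetical.

```javascript
// Median of a non-empty array of numbers (does not modify the input).
function median(values) {
  if (values.length === 0) throw new Error("median of empty array");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Estimated milliseconds an addon adds to warm startup, given repeated
// startup timings (ms) measured without and with the addon installed.
function addonStartupCostMs(baselineRuns, withAddonRuns) {
  return median(withAddonRuns) - median(baselineRuns);
}
```

Using medians rather than means keeps a single slow run from dominating the estimate.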

I think a much awesomer addon performance rating could be obtained by clever statistics boffins analyzing our aggregated startup data. Perhaps the Mozilla Metrics team will make it a reality, but in the meantime we’ll have to make do with an imperfect approach.

Coming soon: why the measurement approach in Jorge’s post is overly complicated and somewhat incorrect.

“Start Faster” Addon

A large proportion of our startup time is spent loading the Firefox library (.dll, etc.) files. This is true on all of our platforms. In my previous post I thought I had discovered a way to load the Firefox binaries more efficiently on Windows (bug 627591). Further testing revealed that this is not true in all cases and that the Windows Prefetch service was still killing our startup speed. Microsoft does not seem to provide a way to opt out of the Windows Prefetch ‘service’.

There is a DIY way to opt out of prefetch: gain Administrator privileges and delete files from the prefetch directory. Our plan is to provide a Windows service to handle Firefox updates (bug 481815). The service would also be able to do useful things like deleting prefetch files and defragmenting Firefox databases (or at least helping report on fragmentation levels). This is all tricky stuff.
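The DIY opt-out boils down to finding Firefox's traces in the prefetch directory and deleting them. A minimal sketch, assuming the usual `%SystemRoot%\Prefetch` location and the `EXENAME.EXE-XXXXXXXX.pf` naming scheme (eight hex digits of hash). The function names are hypothetical, deleting requires Administrator privileges, and this is not the planned service's code.

```javascript
const fs = require("fs");
const path = require("path");

// Prefetch traces are named after the executable plus an 8-hex-digit hash,
// e.g. FIREFOX.EXE-1A2B3C4D.pf.
function isFirefoxPrefetchFile(name) {
  return /^FIREFOX\.EXE-[0-9A-F]{8}\.pf$/i.test(name);
}

// Deletes Firefox prefetch traces from prefetchDir (needs Administrator
// privileges for the real Windows directory). Returns the removed names.
function deleteFirefoxPrefetch(
  prefetchDir = path.join(process.env.SystemRoot || "C:\\Windows", "Prefetch")
) {
  const removed = [];
  for (const name of fs.readdirSync(prefetchDir)) {
    if (isFirefoxPrefetchFile(name)) {
      fs.unlinkSync(path.join(prefetchDir, name));
      removed.push(name);
    }
  }
  return removed;
}
```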

In the meantime, to test my theory, I wrote a test extension. This is a restartless extension that adds a Windows service and a wrapper executable to significantly speed up Firefox startup after rebooting (oh, the irony!). After installing the extension, the “Faster Firefox” shortcut on the Desktop should give up to a 2x speedup in Firefox startup. This addon is a rough proof of concept to play with while I bake this functionality into Firefox. Comments and improvements are welcome. Note that launching Firefox via shortcuts other than “Faster Firefox” is slower while this extension is installed (this will be fixed once the preloading functionality is properly integrated into Firefox). This addon does not yet work on XP because I do not yet have an XP environment to test and develop in (patches are welcome).

The Internet has lately been obsessing over magically short patches that improve performance _ times (probably as a result of the LKML cgroups patch from a few weeks ago). So my work in bug 627591 got picked up by all kinds of news sources (mostly due to @limi’s manlove). Apparently all that Internet fame is good for is getting script kiddies to upload viruses as Bugzilla attachments. Dear Internet, please do not interrupt me in the middle of an investigation.

The crux of the optimization lies in trading time spent waiting on random IO for fast sequential IO. It turned out that my patch worked great if Windows Prefetch wasn’t trying to help (i.e. Firefox ran faster without prefetch on my test systems). With prefetch on, the patch was either a smaller win or a downright loss. When I dug in deeper, it turned out that Windows Prefetch helpfully spends 3-6 seconds doing IO before any Firefox code gets to run. It also doesn’t read in a very clever pattern, resulting in a very small speedup for Firefox while preventing my exciting optimization from working.

So I curled up into my defeated fetal position and pondered how I could prevent Windows Prefetch from being so “helpful”. One way would be to install some crapware to cripple prefetch (kidding!); another is to do the sequential IO in a separate executable (à la run-mozilla.sh on Unix). This way Windows doesn’t try to do insane amounts of IO before my preloading logic gets to run. This seems to work (see the wrapper.exe discussion in the bug) and has the potential to make Firefox start up twice as fast. It’s also ugly as sin, but if that’s what it takes…
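The core idea, reading the binaries front to back in large sequential chunks from a separate process so they are already in the OS file cache when the real executable loads them, can be sketched as follows. The chunk size and function names are assumptions for illustration; the actual wrapper.exe discussed in the bug is native code and may differ in detail.

```javascript
const fs = require("fs");
const { spawn } = require("child_process");

// Read a file sequentially in large chunks. The data is discarded; the point
// is that the OS file cache now holds the file, so later page faults in the
// real process are served from memory instead of slow random disk reads.
function preloadFile(filePath, chunkSize = 1 << 20) {
  const fd = fs.openSync(filePath, "r");
  const buf = Buffer.allocUnsafe(chunkSize);
  let total = 0;
  try {
    let n;
    while ((n = fs.readSync(fd, buf, 0, chunkSize, total)) > 0) {
      total += n;
    }
  } finally {
    fs.closeSync(fd);
  }
  return total; // bytes read
}

// Wrapper-style launcher: preload the libraries from this separate process,
// then start the real executable and let it run independently.
function launchAfterPreload(exePath, libraryPaths, args = []) {
  for (const lib of libraryPaths) preloadFile(lib);
  const child = spawn(exePath, args, { detached: true, stdio: "ignore" });
  child.unref();
  return child;
}
```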

So now I need more reports to make sure the executable wrapper approach reliably and significantly speeds up cold (post-reboot) startup. Then we can decide how to integrate this into Firefox. But until we have all the data, please don’t jump to conclusions about what will and won’t make Firefox 2x faster.

Builtin Startup Measurement

I got used to measuring startup the complicated way (example here). It’s complicated enough that many people prefer to use stopwatches.

It turns out modern operating systems can help applications self-diagnose startup speed. Thanks to landing bug 522375, we now provide an API for measuring startup speed. For example, now I know that xpcshell takes forever to start up on Mac:

./xpcshell -e 'print(new Date() - Components.classes["@mozilla.org/toolkit/app-startup;1"].getService(Components.interfaces.nsIAppStartup_MOZILLA_2_0).getStartupInfo().process)'

At any point one can now go to the error console and type in

uneval(Components.classes["@mozilla.org/toolkit/app-startup;1"].getService(Components.interfaces.nsIAppStartup_MOZILLA_2_0).getStartupInfo())

or

var si=Components.classes["@mozilla.org/toolkit/app-startup;1"].getService(Components.interfaces.nsIAppStartup_MOZILLA_2_0).getStartupInfo(); si.sessionRestored-si.process

to get various interesting timestamps. So next time you see startup take a surprising amount of time, you can poke around to see where that time was spent. At the moment there are four data points:

  1. .process – Process creation timestamp. This is cool because this happens before any library code is executed.
  2. .main – XRE_main timestamp. My favourite thing to do is to subtract .process from .main. This demonstrates huge overheads that many application programmers refuse to believe in.
  3. .firstPaint – Timestamp of the first intended paint. This coincides with when the user sees the first sign of life.
  4. .sessionRestored – Timestamp of session restore, i.e. when the browser becomes useful.
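Since these timestamps are Date objects, subtracting them gives milliseconds. Here is a small helper, a sketch with hypothetical names, that turns the four data points into per-phase durations. In the error console you would pass it the result of getStartupInfo(); here it accepts any object with the same fields.

```javascript
// Turn getStartupInfo()-style timestamps (Date objects) into phase
// durations in milliseconds. A field that is missing yields NaN.
function startupPhases(si) {
  const diff = (from, to) => (from && to ? to - from : NaN);
  return {
    processToMain: diff(si.process, si.main),
    mainToFirstPaint: diff(si.main, si.firstPaint),
    firstPaintToSessionRestored: diff(si.firstPaint, si.sessionRestored),
    total: diff(si.process, si.sessionRestored),
  };
}
```

For example, after the `var si=...` line above, `uneval(startupPhases(si))` prints all four intervals at once.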

Lots of people helped get this feature landed, but two stand out. Daniel Brooks originally figured out how to expose Windows/Linux startup speed in Mozilla. Mike Hommey devised a morally reprehensible way to get precise startup speed out of the idiotic way that Linux presents it.

Linux Sucks

Mac and Windows expose process creation times in human (wall-clock) time, just like any other API with time in it. Linux provides process start time in imbecile jiffies-since-boot. That’d be irritating enough, but to piss people off further there is no way to convert that into human time. Clever ps developers resort to calculating jiffies/second by comparing against seconds-since-boot in /proc/uptime. Unfortunately that does not come anywhere close to the millisecond resolution needed for useful numbers. Mike’s idea of timing the startup of another thread/task to get a known jiffy-stamp got us precise-enough numbers (around 10ms resolution with most kernels; see the patch for details). At least Linus was kind enough to let mere user-space devs obtain the current tick rate.
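For comparison, here is the basic uptime-based conversion: field 22 of /proc/<pid>/stat is the start time in clock ticks since boot, and the boot time can be estimated from /proc/uptime. This sketch assumes the tick rate is passed in (it is what sysconf(_SC_CLK_TCK) reports, commonly 100, i.e. 10ms per tick). It does not implement Mike's thread-timing trick, and the function name is hypothetical.

```javascript
// Convert a process start time from /proc/<pid>/stat (clock ticks since
// boot) into milliseconds since the Unix epoch.
function processStartEpochMs(statLine, uptimeSeconds, ticksPerSecond, nowMs) {
  // comm (field 2) may contain spaces or parentheses, so split after the
  // last ')'. The first remaining token is field 3, so field 22 is index 19.
  const rest = statLine.slice(statLine.lastIndexOf(")") + 1).trim().split(/\s+/);
  const startTicks = Number(rest[19]);
  const bootEpochMs = nowMs - uptimeSeconds * 1000;
  return bootEpochMs + (startTicks / ticksPerSecond) * 1000;
}

// On Linux the inputs could be gathered like this (tick rate assumed 100):
//   const fs = require("fs");
//   const stat = fs.readFileSync(`/proc/${process.pid}/stat`, "utf8");
//   const uptime = Number(fs.readFileSync("/proc/uptime", "utf8").split(" ")[0]);
//   processStartEpochMs(stat, uptime, 100, Date.now());
```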

Update:

Linux doesn’t suck anymore: a commenter pointed me to the relatively new (2006) taskstats interface, which appears to work as sanely as the BSD one.

Mike Hommey made an about:startup extension using this API.

In addition to slow font enumeration, we were suffering from a similar problem: slow plugin enumeration. Just as with fonts, the plugin enumeration code is different on every platform. Unlike the font situation, plugin enumeration is done completely within our code (i.e. it is easy to fix).

Plugin enumeration is often triggered by JavaScript code (for example, by checking whether a Java handler is present). This means that enumeration is a blocking operation that must happen quickly. XPerf made me wonder why so many plugin-like .dll files were being read. This led me to a fun set of perf fixes.

The Algorithm

  1. Files in plugin directories are listed
  2. A platform-specific IsPluginFile function determines which files look like plugins (e.g. np*.dll on Windows).
  3. Code then checks whether the files and their timestamps are known to pluginreg.dat. If so, the cached info is used and the following steps are skipped.
  4. For each library file that isn’t found in pluginreg.dat, we use the platform-specific GetPluginInfo to load the library file and check whether it is indeed a valid plugin (and to see what MIME types it handles, etc.).
  5. Valid plugins are recorded in pluginreg.dat.

This process took up to 3 seconds on a user’s computer. WTF? There were gotchas in almost every step of the way.

  1. The Windows directory listing code would request metadata for every bloody file in the directory, which made for the easiest optimization ever: pure code deletion.
  2. IsPluginFile on Windows/Mac sneakily did more than just check the filename. It also checked whether the file was loadable, which on Windows meant loading the DLL and all of its dependencies. The Mac code was satisfied with merely doing a little extra IO.
  3. This part was right.
  4. #2 was easily fixed by moving the file IO into this step.
  5. Files that failed the check in #4 were doomed to cause extra IO for all of eternity. Scott Greenlay fixed that by recording invalid plugin-like files too.
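Putting the fixes together, the scan can be modeled roughly as follows. This is a simplified, hypothetical sketch, not Mozilla's plugin host code: `names` is a directory listing without metadata, `getMtime` is consulted only for plugin-like names, `loadPluginInfo` stands in for the expensive library load, and the `registry` map plays the role of pluginreg.dat, remembering invalid files as well as valid ones.

```javascript
// Name-only check with no file IO, as after fix #2 (np*.dll on Windows).
function isPluginFileName(name) {
  return /^np.*\.dll$/i.test(name);
}

function scanPlugins(names, getMtime, loadPluginInfo, registry) {
  const plugins = [];
  for (const name of names) {
    if (!isPluginFileName(name)) continue;
    // Metadata is fetched only for plugin-like files (fix #1).
    const mtime = getMtime(name);
    let entry = registry.get(name);
    if (!entry || entry.mtime !== mtime) {
      // Expensive step 4: actually load the library.
      entry = { mtime, info: loadPluginInfo(name) };
      // Invalid files are recorded too (info === null), so they are not
      // reloaded on every scan (fix #5).
      registry.set(name, entry);
    }
    if (entry.info !== null) plugins.push(entry.info);
  }
  return plugins;
}
```

On a second scan with unchanged timestamps, no library is loaded at all; only a changed timestamp triggers a reload.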

This was a rare fix that resulted in seconds saved on crapware-loaded computers. Usually I have to count my progress in milliseconds :(

Help Wanted

I have plans for vastly improving Firefox startup, but I need help to get there. If you enjoy beating under-performing code into submission and want to work for Mozilla, please send me your resume (taras at mozilla dot com). Example projects: a better performance test suite (e.g. tracking IO, CPU instructions, etc.), better infrastructure for profiling addons, optimizing away various CSS/XUL markup, etc. A low-level approach to solving problems is helpful; compiler/linker/kernel hackers are well-suited (but not required) for this.

Imagine a typical Firefox user who starts their Windows computer in order to surf the web. The first app they launch is Firefox 4. It turned out that on systems that support hardware acceleration for 2D graphics, Firefox 4 takes minutes to start up. WTF? An XPerf-aided investigation showed that the Windows font enumeration code causes us to do 30x more disk IO (~300MB) than the rest of the Firefox code.

In order to hardware-accelerate Firefox, we switched from GDI to DirectWrite for font stuff. Apparently DirectWrite is a wonderful API, but the implementation has some teething issues. DirectWrite opens a connection to the Font Service (and starts it if it isn’t already running); however, if the service fails to respond, DirectWrite proceeds to enumerate all of the system fonts on the client side. This isn’t cool for multiple reasons: a) it is slow as hell, and b) it causes Firefox to run out of memory sooner (installing IE9 helps!). This means that Firefox 4 currently starts up a lot slower than 3.6. John Daggett is busy working on a workaround that uses the older GDI APIs to enumerate fonts. Firefox is one of the first popular Windows applications to switch to DirectWrite, so we get to suffer the consequences.

Unfortunately, it turns out that using Microsoft GDI APIs to enumerate fonts still causes a significant amount of disk IO (~30-60MB); John plans to fix that next.

How Did We Miss This?

This bug came from a fundamental difference in how developers and users start Firefox. A developer will restart Firefox a dozen times an hour, which means we rarely get to observe true cold startup. Our tests only measure warm startup (because most operating systems make it difficult to test cold startup). Windows is also incredibly slow to develop on, so a lot of us test in a virtual machine to speed things up and avoid rebooting the computer all the time. This also makes observing cold startup hard. Fortunately, xperf makes IO much easier to observe. We should deploy xperf on our test infrastructure as soon as possible.

Crapware and Firefox

I completely agree with Asa that having unwanted crap forced upon the user is morally wrong. We should do a better job of undoing this kind of braindamage. In the meantime here is a brief rant on the parasitic underpinnings of crapware.

Until recently, I had been testing Firefox on my own installs of Windows. I had no idea how aggressive bundleware could be. Then I got this piece-of-crap i7 Acer laptop with Windows 7 (and relatively little crapware preinstalled) and tried to use it as my primary machine. Suddenly I could reproduce a lot more “slow” scenarios. I even went further and installed the common crapware known as AVG to reproduce more bugs.

It turns out almost every vendor tries to mix crap into Firefox. Acer, Microsoft Office/Silverlight, Adobe Flash/Acrobat, Google, AVG, etc. all added unwanted functionality to my Firefox. I marveled at all kinds of “helpful” functionality, such as the wonderful ability to click on a link in a webpage and have Google Chrome install without any warning that the webpage is about to execute a Windows program. AVG adds a couple of extensions that make Firefox start up 0.5-4x slower.

So far I have noticed two vectors of attack: plugins and extensions. Plugins are fun because they get added by registering bonus plugin directories. Plugin directories are usually just application directories that contain plugins, which means Firefox gets to slowly rummage through bonus application directories looking for anything that might be a plugin. Extensions are fun because, unlike plugins (which affect most browsers on the computer), extensions are very browser-specific. Most extension crapware doesn’t yet support Chrome. Installing things like AVG slows down Firefox while Chrome escapes unmolested.

Benjamin, I don’t think these software vendors are “doing exactly as we ask of them.”

Personally, I would like us to be a lot more aggressive about blacklisting poorly performing software. I.e. we need to go above and beyond warning users about crapware. I would like us to actively check the performance of popular plugins/addons and ban them if they are substandard.

Of linkers and avoiding suck

There is a common fallacy that since linkers and compilers are written by really smart people, there aren’t any huge performance wins left in the toolchain. My theory is that the efficiency of any given codebase varies inversely with the number of people who tried to optimize it.

I have long complained of suboptimal binaries generated from our code. Modern profiling tools such as systemtap and icegrind made this painfully obvious. Mike Hommey opted for actually doing something about it. What started as a simple ld.so hack grew into a badass binary-rewriting tool (and the most interesting blog post I’ve read this year).
