Extensions & Startup

March 11th, 2010

Dietrich blogged a “wake up and smell the startup” executive overview of startup issues caused by our extension practices. This post is a “numbers” followup. For this experiment I installed a brand-spankin-new copy of Linux Firefox 3.6. Firefox is installed on a 7200rpm hard drive; the rest of my system lives on an SSD. The CPU is a Core 2 Duo, so keep in mind that these numbers will be significantly worse for people running on netbooks and other common hardware. The numbers vary by +/- 150ms, but the general picture is pretty clear.

Results

Startup Time
Firefox 3.6 with no extensions: 2240ms
+ Adblock Plus (no subscriptions): 2538ms
+ Video Download Helper: 2727ms
+ Personas: 3220ms
+ Greasemonkey: 3300ms
+ EasyList subscription for Adblock Plus: 4044ms

I nearly doubled cold startup time for Firefox (from 2240ms to 4044ms) by merely adding 4 extensions. It takes weeks or even months of developer time to shave each 100ms off Firefox startup, but mere seconds to undo any of those gains by installing extensions. These are just the top 4 extensions in the list (presumably they are higher quality too); I’m sure there are lots of other extensions with more drastic performance hits.
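To make the cost of each step explicit, here is a small Python sketch that turns the cumulative timings above into increments. The numbers are copied from the list; the names `timings` and `increments` are mine. Given the +/- 150ms noise, the smaller increments should be read loosely.

```python
# Cumulative cold-startup timings from the list above, in milliseconds.
timings = [
    ("no extensions", 2240),
    ("Adblock Plus (no subscriptions)", 2538),
    ("Video Download Helper", 2727),
    ("Personas", 3220),
    ("Greasemonkey", 3300),
    ("EasyList subscription", 4044),
]


def increments(rows):
    """Return (label, added_ms) for every step after the baseline."""
    return [(label, ms - prev_ms)
            for (_, prev_ms), (label, ms) in zip(rows, rows[1:])]


baseline = timings[0][1]
final = timings[-1][1]

for label, added in increments(timings):
    print(f"{label}: +{added}ms")
print(f"total: +{final - baseline}ms, {final / baseline:.2f}x the baseline")
```

By this arithmetic the EasyList subscription and Personas are the two largest single steps, while the Greasemonkey increment is within the measurement noise.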

Dietrich’s post details some of the remedies that should reduce the startup cost of extensions. For the inquisitive minds: I used SystemTap to produce a report of files read by Firefox on startup ordered by their startup cost.

Update:
Dietrich asked me to summarize warm startup too:

  • Without extensions: 550ms
  • With above Extensions: 1800ms

Note that this is a developer blog, so by “remedies” I meant things developers can do. There is little that normal users can do short of complaining to the extension authors.

This post isn’t meant to shame specific extension authors into speeding up their extensions. The aim is to show that a measurable percentage of startup is due to extensions and that we need to:

  1. Educate extension developers about it
  2. Provide better tools to measure slowdowns caused by extensions
  3. Make sure that the Firefox side of extension handling is sufficiently efficient

I have been told that it should be possible to control the way the GNU linker lays out binaries. Unfortunately, until recently I couldn’t figure out the right incantations to convince ld to do my bidding. It turns out that what I needed was to be stranded on a beach in Fiji with nothing better to do than reread the ld info page a few times.

Recipe:

  1. Produce 2 Mozilla builds:
    • A tracing build with -finstrument-functions in CXXFLAGS/CFLAGS
    • A release build with -ffunction-sections and -fdata-sections in CXXFLAGS/CFLAGS, to allow the linker to move things at function or static data (mostly variables) granularity
  2. Link my profile.cpp into libxul in the tracing build (without -finstrument-functions flag)
  3. Run the tracing build, capturing the spew from profile.cpp into a log file
  4. Feed the log file to my script to produce a linker script. This will produce library.so.script files for all of the Mozilla libraries.
  5. Rebuild relevant libraries in the release build with -T library.so.script linker flag
  6. Enjoy faster startup
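As a rough illustration of step 4, here is a minimal Python sketch of turning a call log into a linker script fragment. It is not my actual script: it assumes a hypothetical log format of one mangled function name per line, and it relies on -ffunction-sections placing each function in a section named .text.<name>. The function names in the example log are made up. It also only emits the .text part; since -T replaces the linker's default script, a usable script would have to splice a fragment like this into the full default script (which ld --verbose prints).

```python
def linker_script_fragment(call_log_lines):
    """Build a .text fragment listing functions in first-call order.

    Assumes one mangled function name per log line (hypothetical format)
    and per-function sections named .text.<name> (-ffunction-sections).
    """
    seen = set()
    ordered = []
    for line in call_log_lines:
        name = line.strip()
        # Keep only the first occurrence: that is the startup order.
        if name and name not in seen:
            seen.add(name)
            ordered.append(name)

    lines = ["  .text :", "  {"]
    lines += [f"    *(.text.{name})" for name in ordered]
    # Everything that was never called during the traced run goes last.
    lines.append("    *(.text .text.*)")
    lines.append("  }")
    return "\n".join(lines)


# Made-up function names, standing in for the spew from the tracing build.
log = ["_Z5startv", "_Z9initFontsv", "_Z5startv", "", "_Z8loadPrefv"]
print(linker_script_fragment(log))
```

The idea is simply that functions called during startup end up next to each other in the binary, in the order they are first needed, so the disk reads them in one pass.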

This results in 200ms faster startup on my 7200rpm laptop hard drive, which is about 10% of my startup time. I think that’s pretty good for a proof of concept. Unfortunately there isn’t a measurable win on the SSD (not surprising), nor a reduction in memory usage (I expected one due to not having to page in code that isn’t needed for Firefox startup).

I suspect the problem is that data sections need to be laid out adjacent to functions that refer to them. I started sketching out a treehydra script to extract that info.

I posted the relevant testcase and scripts. Do hg clone http://people.mozilla.com/~tglek/startup/ld to see the simple testcase and various WIP firefox scripts.

Long-term Expectations

The majority of Firefox startup overhead (prior to rendering of web pages) comes from frustrating areas such as inefficient libraries (eg fontconfig, gtk) and the mess caused by crappy layout of binaries and overuse of dynamic libraries. This post describes one small step towards fixing the crappy layout of our binaries.

I would like to end up in a world where our binaries are static and laid out such that they are read sequentially on startup (such that we can use the massive sequential read speeds provided by modern storage media). Laying out code/data properly should result in memory usage reductions which should be especially welcome on Fennec (especially on Windows Mobile).

I am hoping to see 30-50% startup time improvements from this work if everything goes according to plan.

Hunting Down Mythical “Slowness”

I recently met a developer who used Chromium instead of Firefox. Chromium’s superior startup speed was his reason for using it. This got me excited because said developer was running Linux, so it was relatively easy to measure cold startup and get a complete IO breakdown.

Turned out Firefox took roughly 23 seconds to start. After much cursing about how I’ve never seen Firefox start up this slowly, I eventually gave up on figuring out what was slowing his startup, and instead we measured Chromium startup. It also turned out to be roughly 23 seconds. The super-slow hard drive made everything slow. Chromium’s superior startup was a myth in this case.

Measuring Startup

As a result of investigating the startup myth above, my kiwi coworkers encouraged me to post a comparison of Chrome/Firefox startup. I am at linuxconf at the moment so I did the comparison on my laptop.

Laptop configuration:

  • Intel(R) Core(TM)2 Duo CPU L9400 running at 800MHz to amplify any performance differences
  • HITACHI HTS722020K9SA00 harddrive for the user profile and browser binaries
  • OCZ Vertex 30GB SSD for system libraries/configuration.
  • Fedora 12, Minefield 20100119 tarball, chromium-4.0.285.0-0.1.20091230svn35370.fc12.i686
  • sudo sync && sudo sysctl -w vm.drop_caches=3 && sudo sysctl -w vm.drop_caches=0 to clear the disk cache between runs

What am I testing? I am measuring the time from invoking the browser until a JavaScript snippet embedded within a basic webpage is executed (ie Vlad’s approach, with a slightly modified startup.html). The above sysctl command clears disk caches. This creates a situation similar to when one turns on the computer and it hasn’t yet loaded all of the browser libraries from disk into memory. This is a blackbox approach to measuring how long it takes from clicking on the browser icon to getting an interactive browser.

Firefox commandline: firefox -profile /mnt/startup/profile/firefox -no-remote file://`pwd`/startup.html#`python -c 'import time; print int(time.time() * 1000);'`

Chromium commandline: chromium-browser --user-data-dir=/mnt/startup/profile/chrome file://`pwd`/startup.html#`python -c 'import time; print int(time.time() * 1000);'`

Both of these tests are done with an empty profile that was populated and has settled after running the browser a few times.
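The trick in both commandlines is that the launch time, in milliseconds since the epoch, travels to the page in the URL fragment, and the page subtracts it from the current time once its script runs. The real subtraction happens in JavaScript inside startup.html; the Python below is only a stand-in for the arithmetic, and the helper names `startup_url` and `elapsed_ms` are my own, not part of any tool.

```python
import time
from urllib.parse import urldefrag


def startup_url(page_path, start_ms=None):
    """Build a file:// URL carrying the launch timestamp as its fragment."""
    if start_ms is None:
        start_ms = int(time.time() * 1000)  # milliseconds since the epoch
    return f"file://{page_path}#{start_ms}"


def elapsed_ms(url, now_ms):
    """What the page computes: current time minus the fragment timestamp."""
    _, fragment = urldefrag(url)
    return now_ms - int(fragment)


# Example with fixed timestamps so the result is reproducible.
url = startup_url("/tmp/startup.html", start_ms=1263900000000)
print(url)
print(elapsed_ms(url, now_ms=1263900003155), "ms")
```

Because the timestamp is taken before the browser process even exists, the measurement includes everything from process launch to script execution.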

Results

The following numbers are milliseconds reported by the startup.html above.

Running Chromium five times: 4685, 4168, 4222, 4197, 4232

Running Minefield five times: 3155, 3273, 3352, 3311, 3322

I picked Minefield because that’s the browser that I run and the codebase that I focus on. The linux Chromium channel seems to be the closest parallel to Minefield. I did not test on Windows because it is a bit of a nightmare to measure cold startup there.

Conclusion

On my system Minefield is around 30% faster at starting up with an empty profile than Chromium (the difference is amplified by running the CPU at 800MHz). For comparison of Minefield against older Firefox versions, see Dietrich’s post.
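For anyone who wants to check the comparison, here is a short Python sketch that summarizes the two sets of runs listed in the results. The run values are copied from above; the `summarize` helper is mine. The median is included because the first Chromium run is noticeably slower than the other four.

```python
from statistics import mean, median

# Milliseconds reported by startup.html for the five runs of each browser.
chromium = [4685, 4168, 4222, 4197, 4232]
minefield = [3155, 3273, 3352, 3311, 3322]


def summarize(runs):
    """Return (mean, median) of a list of startup times."""
    return mean(runs), median(runs)


c_mean, c_median = summarize(chromium)
m_mean, m_median = summarize(minefield)

print(f"Chromium:  mean {c_mean:.1f}ms, median {c_median}ms")
print(f"Minefield: mean {m_mean:.1f}ms, median {m_median}ms")
print(f"Chromium/Minefield: {c_mean / m_mean:.2f}x by mean, "
      f"{c_median / m_median:.2f}x by median")
```

Either way of summarizing puts Chromium's startup somewhere between a quarter and a third longer than Minefield's on this machine.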

I suspect that there is a relatively small difference between the two browsers because we are running into the fundamental limitations of loading large applications into memory (my rant).

Windows 7 Startup Exploration

January 4th, 2010

I did some digging to figure out if one can set up cold-startup testing in Windows 7 without nasty hacks. My conclusion is: sorta-kinda.

The Good – Most of the Ingredients Are Present

I haven’t actively used Windows since pre-XP days. It looks like it has come a long way since then: there is now a decent interactive shell, all kinds of settings/services can be controlled from the commandline and there is even sudo-like functionality.

PowerShell takes inspiration from the korn shell and throws in .net which allows for much nicer “shell programming” than the dominant bash shell.

mountvol is a terrible equivalent to mount in linux – but it exists, so I’m happy.

NTFS junctions are frustrating equivalents to links in a unix filesystem.

The Bad

The essential ability to completely flush filesystem caches isn’t there. This isn’t quite as embarrassing as it seems as Mac OS X’s purge command does not flush the page cache (resulting in mmapped files not purged from cache), so technically OS X has the same limitation and only Linux gets it right.

The Ugly Workaround

After much brainstorming we figured out that we can clear all relevant caches on Mac OS X by putting files that we care about on a separate partition and mounting/unmounting it for every measurement.

Ridiculously, Windows is “smarter” than that and appears to cache stuff per-drive, such that mounting/unmounting a partition has no effect on the cache. The best workaround I could come up with involves putting the said partition onto a USB disk and unplugging it between unmount/mount testing cycles.

Windows 7 Startup Recipe

1) Set up junctions for the 2 profile directories to point to the USB partition, unzip firefox onto that partition.

2)
$old = (get-location)
$mountpoint = $env:userprofile + "\cold"
# magic name given by running mountvol
$drive = "\\?\Volume{885d5bc3-e918-11de-a4e5-002268e3077c}\"
# Based on http://poshcode.org/696 + fiddling with UAC settings to avoid prompts
sudo mountvol $mountpoint $drive
# Mountvol doesn't seem to block until drive is mounted
sleep 1
#mountvol
cd $mountpoint\firefox
echo (pwd)
# The following command shows PowerShell awesomeness
# based on Vlad's approach
./firefox.exe -no-remote "file://$(pwd)\startup.html#$([Int64](([DateTime]::utcnow - (new-object DateTime 1970,1,1)).ticks/10000))"
cd $old
# I haven't yet figured out how to wait on firefox.exe to finish
sleep 10
sudo mountvol $mountpoint /d

3) Unplug USB drive