Dehydra Testsuite Passes on GCC 4.5
November 20th, 2009
I spent a couple of days fixing the remaining test-suite failures on GCC 4.5 trunk for Dehydra. Since the last time I looked into this, GCC went from crashing all over the place to only crashing if I did something bad. It was nice to discover that, as a result of switching to 4.5, Dehydra users will get saner .isExplicit behavior and more precise location info.
Treehydra will take more work due to me misunderstanding GTY annotations.
By the way, I am really grateful to all of the people who have contributed GCC 4.5 fixes so far. You guys have been a big help in getting the Dehydra testsuite to 100% on 4.5. It looks like I will meet my goal of finishing De+Treehydra by the end of the year, in time for the GCC 4.5 release and my “Introducing Dehydra to the Developer World”-type talk at LinuxConf.au.nz 2010.
Startup
I have reduced my focus on startup speed for the moment to catch up on Dehydra. I plan to work on reducing xpconnect overhead during startup next, i.e. more of this bug.
FSOSS & Dehydra Update
November 6th, 2009
Last week I was in Canada to present at FSOSS with David Humphrey on awesome Mozilla tools: Dehydra, DXR, Pork, etc. I think we managed to convey what a sad state current developer tools are in.
General-Purpose Dehydra Scripts
Dehydra grew out of Mozilla’s constant need to figure out what is going on in the source code. As a result, most of our scripts are very Mozilla API-specific. This makes it harder for people outside of Mozilla to learn Dehydra. There is no library of Dehydra code that one can just plug in to start analyzing a codebase. Instead one has to sit down, figure out what Dehydra is capable of, and then see if any of the problems facing the developer can be solved this way. If anyone wants to contribute such a library, let me know.
In the meantime, more general-purpose analyses are surfacing.
Shadowed Members
My favourite script so far is the member-shadowing checker. I ran into a member-shadowing warning that is unique to Sun’s C++ compiler. It was triggered by some code that I had just landed on the tree. I fixed the warning, but within a few days a coworker ran into a bug caused by that member shadowing (due to having an unlucky revision of the code). The following example shows how simple it was to implement the warning in GCC/Dehydra.
See bug 522776 for the complete story on adding the member shadowing check to Mozilla.
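For readers who have not been bitten by member shadowing, here is a minimal C++ sketch of the kind of bug such a check catches. It is not the Dehydra script from the bug; the class and member names (Counter, LoggingCounter, mCount) are hypothetical and chosen only for illustration.

```cpp
#include <cassert>

// Hypothetical classes for illustration; not taken from the Mozilla tree.
struct Counter {
    int mCount = 0;
    // Always reads Counter::mCount, whatever a derived class declares.
    int Count() const { return mCount; }
};

struct LoggingCounter : Counter {
    int mCount = 0;  // Shadows Counter::mCount; a shadowing check flags this line.
    // Increments LoggingCounter::mCount, not the base member.
    void Increment() { ++mCount; }
};

// Returns the value the base class reports after one increment.
inline int CountAfterOneIncrement() {
    LoggingCounter c;
    c.Increment();
    assert(c.mCount == 1);  // The derived copy was updated...
    return c.Count();       // ...but the base copy that Count() reads was not.
}
```

CountAfterOneIncrement() returns 0 rather than 1, and a compiler without a member-shadowing warning accepts the code without any diagnostic.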
Printf
Another general-purpose analysis was done outside of Mozilla by Philip Taylor for his game. His script checks wide printf format strings (which are overlooked by gcc).
Independently, Benjamin wrote a printf checker for Mozilla printf-like code, see bug 493996.
Custom Sections in Object Files
We have long speculated about how nice it would be if Dehydra could emit info into object files that could then be yanked out of the resulting binary (by, say, valgrind). Bug 523435 will soon make that a reality.
Studying Library IO – SystemTap Style
October 23rd, 2009
In my last blog post I expressed frustration with the slowness induced by library IO. Then I went on a mission to measure it. I have been wanting to do this for a while, but I figured that only DTrace could get this info without recompiling my kernel. So I tried to build Mozilla under Slowlaris (but the linker got up to 3GB and then sat there swapping, ensuring that the nickname is justified). Then I fired up DTrace on the mini, but ran screaming because it seemed like the fbt DTrace provider refused to let me dereference structs (later Joel told me that I’m supposed to copy data explicitly like here).
But while googling for an fbt workaround, I stumbled upon a DTrace/SystemTap comparison wiki. SystemTap? The DTrace knockoff I have been hearing about? It works? This was a lightbulb moment where I realized that Linux was about to provide me with more information than I thought was possible.
So here is the data I got out of it:
Rant on Library IO
October 20th, 2009
So I’ve been trying to figure out how to optimize disk IO on startup. I looked into IO caused by libraries, and it turns out that apps with big libraries are screwed. Here is how I came to this conclusion:
Gnomer’s research on startup pointed out that dumb readahead leads to wins in terms of file IO. So I wrote some code and sure enough, reading in libxul at the top of our main() function does indeed result in a significant, measurable speed-up on both Linux and OSX.
From the gnome page I found a link to some diskstat stuff. There lay a presentation with graphs that appear to show that OpenOffice has a much better cold IO pattern than Firefox. Given that there are some strong similarities between our application layouts I went digging to see if OpenOffice does something funny. And oh boy, it does do funny page reordering on Windows and “slightly-smarter-than-dumb-readahead-style library prefetch” on Linux…
So here is an innocent question: Why is page-reordering not done as a PGO step? I mean shouldn’t you fire up your app, feed some info back to the linker and be done with it? Another question: Why can’t we mark certain files as “keep this whole file in ram if someone asks for part of it to be paged in”?
So is the only way to fast application startup via static linking? It sure is easy to
posix_fadvise(open(argv[0], O_RDONLY), 0, 0, POSIX_FADV_WILLNEED);
Are these hacks still the state of the art in making apps with large libraries start up fast?
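For anyone who wants to try the dumb-readahead trick, here is a slightly more careful sketch of the one-liner above. The helper name PrefetchFile is made up for this example; posix_fadvise itself takes a descriptor, an offset, a length and the advice, and returns an error number directly rather than setting errno. This assumes a platform that provides posix_fadvise (Linux does); other systems may need a different call.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>   // open, posix_fadvise, O_RDONLY, POSIX_FADV_WILLNEED
#include <unistd.h>  // close

// Hypothetical helper: ask the kernel to start pulling the whole file at
// `path` into the page cache. Returns 0 on success or an errno-style code.
inline int PrefetchFile(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return errno;
    }
    // Offset 0 with length 0 means "from the start to the end of the file".
    int rv = posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
    // The advice has already been handed to the kernel, so the descriptor
    // does not need to stay open.
    close(fd);
    return rv;
}
```

Calling PrefetchFile on the library path (or on argv[0] for a statically linked binary) as early as possible in main() is the whole hack. The call is only a hint, so the kernel is free to ignore it.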
Update: Found some mentions of GNU Rope unfinishedware and a relatively recent blog post
Restless Bug Fixing
October 8th, 2009
I spent the past couple weeks analyzing and improving fastload performance. I’ve long been suspicious of fastload, but only finally got around to investigating it in detail. I think there is some fundamentally ironic rule in software that if you put the word “fast” in the name of a component, it is bound to eventually become a performance bottleneck.
Almost a decade has passed since the conception of this code, so it was time to update the code’s assumptions to reflect the capabilities of modern OSes. I landed the fix today. It results in startup performance gains of 1-20% on the various platforms I tested, making this the most exciting perf bug I’ve worked on.
Plans
Now that I’ve had my fill of almost a year’s worth of startup performance analysis, for the remainder of the year I plan to refocus on static analysis. My main goal is decent C support in Dehydra (not to mention the ever elusive GCC 4.5 compatibility) and to facilitate a production-quality DXR.
I’m hoping that we’ll end up with cool ways of dealing with the painful/slow boilerplate (bugs 520626, 516085 and 517370).
Corrupting Innocent Minds With GCC
September 30th, 2009
Ever since the plugin branch landed in GCC, I have been itching to explore the application-specific optimization space that it opens up. It’s really hard to optimize code in the general case, but it’s relatively easy to optimize for specific use-cases. We can rely on API-specific static analysis in order to get rid of the API-imposed overheads at compile time. Let me repeat, we can get rid of some API-induced suck (OO frameworks usually have a lot of it) without sacrificing any of the benefits.
Unfortunately, I got busy working on, supposedly, more important stuff such as making Firefox startup quicker, so my de-error-handling and de-virtualizer (basically possible with LTO, but we can prove that certain classes will never be overloaded via dynamic linking) ideas had to be put on indefinite hold. Luckily, one of David Humphrey’s students decided to take on the first task, see his blog post here. I’m really psyched about this; few things are cooler than cross-project open source work involving the most important open source projects of our time.
Enforcing Inheritance Rules
September 18th, 2009
While writing C++, sometimes one wishes that one could squeeze a little more out of the type system. In this particular case, Zack Weinberg (layout-refactorer extraordinaire) wanted to make sure that certain methods always get overridden in derived classes. Unfortunately, in that particular design, those methods were not pure-virtual. At this point most C++ hackers would cry a little and move on without any compiler assistance.
Instead of crying, Zack added a NS_MUST_OVERRIDE attribute to methods along with a matching Dehydra script. See the source code and the bug for how simple it can be to extend C++ with a useful new check.
Nothing makes me happier than seeing developers land big code changes and accompany them with compiler checks instead of relying on programming folklore to maintain important invariants.
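To make the problem concrete, here is a small C++ sketch of a non-pure virtual method that every derived class is supposed to override. The classes (Frame, TextFrame, ImageFrame) are hypothetical, and the empty definition of NS_MUST_OVERRIDE is an assumption standing in for whatever the real macro expands to in a static-analysis build; the actual check lives in the Dehydra script, not in this code.

```cpp
#include <cassert>
#include <string>

// Assumption: in an ordinary build the annotation expands to nothing, so it
// is defined as an empty macro here. The real definition may differ.
#ifndef NS_MUST_OVERRIDE
#define NS_MUST_OVERRIDE
#endif

// Hypothetical classes for illustration.
struct Frame {
    virtual ~Frame() {}
    // Has a default body, so C++ itself cannot force derived classes to
    // override it the way a pure-virtual method would.
    NS_MUST_OVERRIDE virtual std::string Name() const { return "Frame"; }
};

struct TextFrame : Frame {
    std::string Name() const override { return "TextFrame"; }
};

struct ImageFrame : Frame {
    // Forgot to override Name(); the compiler accepts this silently.
};

inline std::string NameOf(const Frame& f) { return f.Name(); }

// Demonstrates the silent fallback to the base implementation.
inline void CheckNames() {
    assert(NameOf(TextFrame()) == "TextFrame");
    assert(NameOf(ImageFrame()) == "Frame");  // Wrong name, no diagnostic.
}
```

With only the language to lean on, ImageFrame quietly inherits the base behavior. An attribute plus an analysis script can turn that omission into a build-time error.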
Moving Files Into JARs
August 27th, 2009
Moving files into jars reduces the number of seeks on startup and has miscellaneous other performance/organization benefits. I added resource://gre-resources/ which maps to jar:toolkit.jar!/res/.
To move a file into a jar:
- Add a jar.mn entry.
- Remove existing references to the file in Makefile.in and the packages-static files.
- Add the file to the removed-files.in list of dead files.
- Update URLs referring to the file in the source. Sometimes one has to switch from using file streams and filenames to using channels and URIs. This is the hard part.
- Set your bug as blocking bug 513027.
For an example see bug 508421.
Cleaning Up Startup Disk IO
August 20th, 2009
Maintaining a module, killing off another one
I was granted ownership of the jar module. Today, I resumed my quest to kill off the barely limping stopwatch module. Together with nuking STANDALONE mode in jar stuff, I will have landed 75KB worth of -ve diffs this month. It feels so good to delete code.
IO Report
Currently I am focusing on application IO (excluding libraries and IO caused by libraries).
From my empirical measurements, opening individual files on a 7200RPM hard drive costs around 0-40ms. This is on Linux. I presume files open quickly when they are located near previously opened files and slower if a full disk seek is required for them. Combining files is usually a significant win in terms of throughput. It turns out that even warm starts and reading from SSDs can benefit from combined IO. Currently small file throughput ranges from <1KB/s to <200KB/s for files < 500K. Combining files into memory mapped jars bumps that up to 1-1.5MB/s (currently jar files are relatively small, making them responsible for a higher proportion of IO should boost that further).
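The post does not say how these numbers were gathered, so the following is only a sketch of one way to take a comparable measurement: time the open plus a full read of a single file and convert that to KB/s. The names ReadTiming, TimeOpenAndRead and ThroughputKBps are made up for this example. The result only says something about the disk when the file is not already in the page cache.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct ReadTiming {
    std::size_t bytes;  // Bytes actually read.
    double millis;      // Wall-clock time for open plus read.
};

// Opens `path`, reads it to the end in 64KB chunks and reports how long
// that took. A file that cannot be opened yields zero bytes.
inline ReadTiming TimeOpenAndRead(const std::string& path) {
    assert(!path.empty());
    using Clock = std::chrono::steady_clock;
    const Clock::time_point start = Clock::now();

    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(64 * 1024);
    std::size_t total = 0;
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        total += static_cast<std::size_t>(in.gcount());
    }

    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return ReadTiming{total, elapsed.count()};
}

// Converts a timing into throughput in KB/s (0 if no time was measured).
inline double ThroughputKBps(const ReadTiming& t) {
    return t.millis > 0.0 ? (t.bytes / 1024.0) / (t.millis / 1000.0) : 0.0;
}
```

Running this over each small file and then over one combined file of the same total size gives a rough picture of how much of the cost is per-file overhead rather than raw transfer.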
The biggest gains are to be had on Windows Mobile where almost every seemingly trivial filesystem operation takes 2-3ms.
I would like to reduce the number of files read on startup to a dozen or so to be able to crank up disk throughput. Unfortunately, there is a lot to be done, and I could use a great deal of help.
Below is a long list of files gathered by stracing firefox-bin, and what I know about them:
There is nothing exciting about filesystems
August 14th, 2009
When I originally started at Mozilla, I only knew the people who interviewed me. But I quickly discovered beltzner when he uttered a sacrilegious statement that went something like: “….. nothing could be as boring as filesystems….”. Mike Beltzner is one of my favourite characters at Mozilla for his ability to speak his mind, but this quote has troubled me greatly. How can one not care about filesystems? Linux’s ability to do file stuff efficiently makes it orders of magnitude faster than other operating systems. Plan 9’s file-system-centric layout proved that OSes don’t have to consist of a series of poorly named and categorized system calls. In fact, a clean file layout allows many awesome optimizations. ZFS is one of the few things keeping Solaris relevant. HFS+ is one of the things keeping OSX from being fast.
Being a Linux user, I was disappointed by the pointlessness of optimizing application IO. Sure, we inefficiently open tons of files on startup; sure, we hit the filesystem 10-100x more than we could; but why would one optimize when no more than a few percent of startup is taken up by terrible IO patterns?
Excitingly Crappy Filesystems
Luckily Firefox runs on OSX and we are making it run on WinCE. I was delighted to discover that on wince* we paid 1-5ms per file existence check, modification date, size, etc. I was shocked to see that the throughput while reading certain files could be expressed in bytes per second (most crappy flash media seems to be able to pull in >1MB/s). This brought about switching our jar IO to mmap, amalgamating jar files, moving more files into jars, etc. I’ll blog about the details later. My basic idea is that we can utilize jar files as “controlled filesystem environments” to deal with having to run on crappy OSes with exceptionally bad filesystems. OSes such as OSX where file IO is barely faster than that of a WinCE phone.
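As a rough illustration of what reading an archive through mmap looks like, here is a minimal POSIX sketch. It is not the jar code; the MappedFile class is hypothetical, and a Windows or WinCE build would need that platform's file-mapping API instead of mmap.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>     // open, O_RDONLY
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical read-only mapping of a whole file. One open and one mmap
// replace a long series of small read calls on individual files.
class MappedFile {
public:
    explicit MappedFile(const char* path) : mData(nullptr), mSize(0) {
        assert(path != nullptr);
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            return;
        }
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                           PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                mData = static_cast<const char*>(p);
                mSize = static_cast<std::size_t>(st.st_size);
            }
        }
        close(fd);  // The mapping stays valid after the descriptor is closed.
    }
    ~MappedFile() {
        if (mData) {
            munmap(const_cast<char*>(mData), mSize);
        }
    }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return mData != nullptr; }
    const char* data() const { return mData; }
    std::size_t size() const { return mSize; }

private:
    const char* mData;
    std::size_t mSize;
};
```

Entries inside the archive can then be located by pointer arithmetic on data(), and the OS pages in only the parts that are actually touched.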
Beltzner, wouldn’t it be exciting if OSes like Mac OSX had file systems worth being excited about?
* MS likes to use puns for their product names