I got tagged by Benjamin, so I better comply and get my blog memed.

Rules

  1. Link back to your original tagger and list the rules in your post.
  2. Share seven facts about yourself.
  3. Tag some (seven?) people by leaving names and links to their blogs.
  4. Let them know they’ve been tagged.

Seven Things

  1. In Soviet Ukraine, kindergarten failed me (luckily they don’t make you retake that). Apparently when I was six, my handwriting and reading skills were not up to the communist standards of the time. I still fondly remember excerpts of various communist hymns we got to sing along to.
  2. Grade 1 coincided with the fall of communism and me getting the most kickass grades of my academic career.
  3. I have owned two cars, but I have since traded that lifestyle for a garage full of bikes. I never liked the effects that driving had on my health nor the effect that crappy German engineering had on my savings.
  4. I met my wife on my first midnight mystery ride. Riding at midnight is awesome because that’s the only time that traffic dies down enough that one sees nothing but fellow bikers and drunk pedestrians.
  5. To buy groceries I ride a comical contraption called an adult tricycle. We even worked the trike into our wedding.
  6. I never had a real job. For some reason all of the Burger Kings and Subways that I applied at never took an interest in me. Instead, in grade 8 my first source of income was teaching Java to someone 2.5x my age for $3 an hour. From then on I bounced around various part-time jobs and internships until I ended up at Mozilla, my first real job.

  7. For something like six years I wore long hair to go along with my taste for metal. I’m always looking for more awesome metal music; currently in heavy rotation are Testament, Winds of Plague, Dying Fetus and Opeth. If you think there is an awesome metal band I might be missing out on, let me know.

Tags

Damon Sicore – There are seven random things I need to know about my boss.

Joshua Cranmer – let’s hear seven things about you without mentioning dem0rkification.

Chris Double – I hope to hear seven reasons for one to program in Reverse Polish Notation.

Graydon Hoare – because his blog hasn’t been memed. It’s also getting a bit dated.

Fennec A2 – Performance

December 23rd, 2008

Static Analysis vs Performance

Two months ago I got the feeling that I gotta take a break from static analysis and do something that obviously affects Firefox at runtime. Luckily that coincided with ramp-up on Fennec performance work.

I find that I enjoy fixing existing code a lot more than other sorts of programming, so I was extremely happy to switch focus from the static analysis way of fixing code to my other favourite: optimization. Both are peculiar programming endeavours because after a bunch of gruntwork the program ends up doing the exact same thing as before, but better.

In static analysis I focus more on how different pieces fit together, whereas in optimization I get to focus on what the various pieces are trying to achieve, so I learned a lot more random Mozilla mysteries.

Fennec

Fennec is pure joy to optimize because it runs in such a constrained Linux environment (compared to desktop Linux). Things seem to happen roughly 10x slower on the ARM processor than on my Core 2 Duo laptop. Thus performance details that are hard to spot on the desktop are almost trivial to discover.

There are no hard drive seeks to introduce unpleasant surprise latency. This simplifies things a lot – there is a lot less variance between hot and cold start on the N810 than on a desktop with a hard drive.

Unfortunately the N810 Linux environment also leaves a lot to be desired. Compiling stuff is a chore. It turns out that oprofile produces nonsense results when a compiler of recent vintage is used (an ancient one can’t really compile Mozilla).

I had a lot of fun digging deep into Mozilla code and dealing with mischievous timestamps, misbehaving caches and rude GC interruptions. All this was done using stone-age instrumentation techniques on N810.

Mark Finkle blogged some details on Fennec Alpha2 performance. Alpha2 is magnitudes faster than Alpha1, and I expect more of the same in subsequent releases.

Software Improvements That Santa Claus Should Get Me

Even though oprofile is useless on N810, one can get a pretty good idea of what the performance issues are from running it on x86. OProfile is a little rough to use, but I’ve learned to love it when sugared with gprof2dot and xdot. It’s great for locating places in the code to stick printf()s into.

OProfile has taught me that what I really want is DTrace (or some knockoff) running on the N810.

Also, I really hate how embedded Linux takes away one of the coolest things about desktop Linux: the ability to compile your own kernel. I haven’t been able to get a more modern kernel to run on the N810, which means I can’t try a newer version of oprofile or the new OMAP high-res timers. I would also like to get a working image of the N810 under qemu, but success has avoided me there too.

Static Stuff

Unfortunately I found that I can’t effectively work on static analysis stuff without giving it my full and undivided attention. Right now I’m hoping to set aside time to focus on writing a more general dead code finder and catch up on other misc things sometime in January or February.

Cool Stuff in Foreign Realms

October 9th, 2008

Open Source projects are often like parallel worlds. People reach the same conclusions, attempt similar solutions and are typically blissfully unaware of each other’s existence.

Here are two projects that came to my attention this week:

  • Via Planet KDE I stumbled on Krazy. It’s neat, if somewhat depressing, that even though C++ parsers are finally becoming accessible, all of this is still done in Perl.
  • A helpful comment on my previous entry pointed me to dead method hunting in OpenOffice.

Uncool Open Source Rant

The reason I switched to Linux was that it seemed like a developer’s dream come true. Compilers are a package manager operation away (and generally aren’t tied to OS versions, I’m looking at you Apple), everything can be recompiled to address any particular concerns and there is a crapload of weird languages to write your software in. This is especially awesome when compared to a typical proprietary software stack where one can’t easily fix problems that involve multiple components due to licensing issues or due to not having the code available (or due to not having the development environment available). So one would think that Linux distributions hold the ultimate software power: an unlimited pass to modify their offering to their target audience’s content.

Unfortunately my view of Linux ways is unrealistic. For example, when people do friggin’ awesome work knocking down boot time to 5 seconds by hacking and slashing their way through the entire software stack involved in boot-time delays, distributions claim that it’s not something they can seriously consider because they are too set in their ways of general-purpose (and generally bloated) init systems and stock kernels (worst case: why can’t we recompile the kernel on the user’s machine or ship a couple of custom variations for common hardware out there?). What’s the point of open source if we continue pretending that everything is a general-purpose black box that doesn’t like to play together? It’s been two weeks and I haven’t seen a single distro bite the bullet and attempt to list 5 second boot in their goals. Come on guys, don’t you like to be challenged?

Hope someone proves me wrong.

In my quest to rid Firefox of code that doesn’t do anything, it is possible to screw up and delete a method that is only used by outside code. So far, I’ve hit a component that isn’t used in Firefox, methods that aren’t used in Firefox and methods that claim someone is about to use them (with a timestamp from 8 years ago). That, and I discovered some code that only appears to be used from BeOS- or OS/2-specific ifdefs.

Not surprisingly, I had someone comment that I do some “dangerous shit”. Sure, I can see why someone would think that.

My protection: I use the try server to make sure that everything builds on supported platforms. On top of that every patch gets reviewed.

Furthermore, I would like to point out that methods that aren’t called within Firefox are less likely to be tested and correct. So if one leaves in code that should only be called from other projects, it’d be appropriate to have some unit tests for it, which would flag the code as in-use. That would let us move to the next level and set up a tinderbox to detect dead code as soon as it is orphaned.

Living Dead Code

September 26th, 2008

The coolest part about my job is that I get to work on tasks that are cool, but typically are forever laid to rest in the would-be-cool-if-we-could-but-we-don’t-have-time-or-resources category. However, as project code size increases, the usefulness/coolness ratio of static analysis grows and moves from would-be-cool, to nice-to-have, to this-is-the-only-way. Now, I’m not sure where Mozilla lies on this scale, but I do know for sure that the giant codebase is an analysis treasure trove.

Dead Code Motivation

I have been talking about dead code detection in Mozilla for ages. As software changes, some pieces of code unintentionally get left behind, but often there is no way to tell. This results in unnecessary maintenance burden and increased footprint. In roc’s case, randomly spotting dead methods may also involve a little IRC griping at me about not having any tools for it, even rudimentary ones. Between that and blizzard’s cheering over reduced code size, I had no choice but to give it a try once I had enough outparamdelling being reviewed.

Dead Code Results

First off, the approach described below seems to work well: see bug with initial results. Now I have a few thousand reported methods to inspect in order to refine the results.

Dead Code Approach

I’ve been pondering dead code detection since I started working on this stuff. There are lots of ways to do it, but I finally settled on the dead-simplest one: method-level granularity. Ingredients are:

  • Dehydra/Treehydra extraction JavaScript: Dehydra makes it easy to enumerate methods and class hierarchies. Treehydra makes it possible to extract every mention of a method from the code (except for pointers to virtual functions that were cast; in that case we are left with a vtable index and the type that the pointer was cast to). Additionally, Treehydra counts the number of AST nodes visited in every function body.
  • Shell script to aggregate the result of processing Mozilla source. Gotta admit, Perl’s hashtables give GNU sort -u a run for its money.
  • OCaml program to do the super-dumb algorithm. Classes with the same name are assumed to be the same class. Method overloads are assumed to be a single method. All methods are assumed to be virtual. Every ClassType::FUNCTION_CALL call walks down to all children & up through all the parents and marks the derived methods as called. Then all derivatives of scriptable XPIDL are filtered out and the uncalled methods are printed out. In order to find the most exciting functions first, methods in the results are sorted by their AST count =D.
    Language Nerd Trivia: Why OCaml – because ADTs are no fun in other languages. Why is my OCaml so bad – because I lost the hang of it due to not writing anything in it for the past 2 years.
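
To make the super-dumb algorithm above a bit more concrete, here is a minimal Python sketch of it. This is not the actual OCaml program: the function name find_dead_methods, the input shapes (parents, methods, calls, scriptable) and the toy class names are all made up for illustration, standing in for whatever the Dehydra/Treehydra scripts and the shell aggregation would really produce.

```python
from collections import defaultdict


def find_dead_methods(parents, methods, calls, scriptable=frozenset()):
    """Return uncalled methods as (class, method, ast_count), biggest first.

    All input shapes are assumptions made for this sketch:
      parents    -- dict: class name -> list of direct base class names
      methods    -- dict: (class name, method name) -> AST node count
      calls      -- iterable of (class name, method name) mentions
      scriptable -- set of class names that are scriptable interfaces
    Classes are keyed by name only and overloads share one key,
    matching the simplifications described in the post.
    """
    # Invert the parent map so the hierarchy can be walked downwards too.
    children = defaultdict(set)
    for cls, bases in parents.items():
        for base in bases:
            children[base].add(cls)

    def reachable(start, edges):
        # Every class reachable from `start` along `edges`, `start` included.
        seen = set()
        stack = [start]
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            stack.extend(edges.get(cls, ()))
        return seen

    # Treat every method as virtual: a mention of Class::method marks the
    # same-named method in all ancestors and all descendants as called.
    called = set()
    for cls, name in calls:
        for related in reachable(cls, parents) | reachable(cls, children):
            called.add((related, name))

    def is_scriptable(cls):
        # A class is filtered out if it is, or derives from, a scriptable one.
        return bool(reachable(cls, parents) & set(scriptable))

    dead = [
        (cls, name, count)
        for (cls, name), count in methods.items()
        if (cls, name) not in called and not is_scriptable(cls)
    ]
    # Largest AST count first, so the most exciting candidates come out on top.
    dead.sort(key=lambda item: (-item[2], item[0], item[1]))
    return dead


if __name__ == "__main__":
    # Toy input; none of these classes exist in Mozilla.
    parents = {"Base": [], "Derived": ["Base"], "Other": [],
               "nsIFoo": [], "FooImpl": ["nsIFoo"]}
    methods = {("Base", "Paint"): 10, ("Derived", "Paint"): 40,
               ("Derived", "Unused"): 120, ("Other", "Legacy"): 300,
               ("FooImpl", "GetBar"): 50}
    calls = [("Base", "Paint")]
    for cls, name, count in find_dead_methods(parents, methods, calls, {"nsIFoo"}):
        print(f"{cls}::{name} ({count} AST nodes)")
```

With the toy input, the single call to Base::Paint keeps Derived::Paint alive, FooImpl::GetBar is filtered out because FooImpl derives from a scriptable interface, and the sketch reports Other::Legacy followed by Derived::Unused.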

I’m sure most people reading this will go: “Wait a minute, this is unsound if you don’t see the virtual function pointers?”, to which I reply: “Once I nuke the method and Mozilla compiles successfully, it’s sound”. In reality someone could make a puny GCC patch to preserve more data in virtual function pointer assignments.

Since this is all in early stages there are a bunch of things I don’t deal with: method overloads, constructors, destructors and overloaded operators... and templates. All function bodies are scanned, but some function names are a pain to deal with or they aren’t straightforward function calls in GCC, so they don’t participate in the dead/alive contest.

Where To Go From Here

Well, once we run out of dead code detected by the primitive caveman approach above, we’ll have to investigate less conservative approaches possibly involving abstract interpretation and callgraphs.

How To Get Involved in Screwing With Software Cost Models By Contributing Negative Line Counts

This project has a lot of places to help out:

  • Work through the dead method list, filing bugs accordingly and deleting any related code.
  • Extend the machinery to work on all non-static functions.
  • Try this on your favourite large codebase.
  • Write the virtual function pointer annotation patch :)

Error Presentation

September 3rd, 2008

Certain other cool open source projects are doing cool static analysis work. In this case, here is an analysis of one of my favourite operating systems projects, DragonFly BSD.

I’m blown away by the clean UI. The error filter and the interleaving of static analysis results in the source code are drool-inducing. This is powered by the clang checker. Clang’s checker doesn’t yet do C++, doesn’t do application-specific checks and has a lot of false positives, but it’s an exciting preview of things to come.

Oh and I hope that DXR will have similar analysis awesomeness. In the future, I hope to see static analysis become almost as common as unit-testing.

New Static Analysis Toys

I have been catching up on my backlog of little bugs; here are some of the most notable ones.

Benjamin has been pushing the limits of what Dehydra can do for his DXR prototype, which resulted in a couple of cool new features, one of which breaks backwards compatibility. Sorry about that, it is for the greater good.

Dehydra now processes more declarations.

Dehydra uses JavaScript prototypes to distinguish between types and declarations.

Treehydra is now built by default when building with a plugin-enabled compiler.

Treehydra now exposes the C++ frontend’s verbose and as-close-as-gcc-gets-to-written-code syntax tree via process_cp_pre_genericize. Access to the early C++ AST should make it easier to automatically translate a certain class of C++ functions into JavaScript.

Coming soon: buildbot setup for Dehydra along with autobuilt Debian packages.

Also, Benjamin’s GSoC student, Bo Yang, has been doing some awesome work making our static analysis toolchain work on mingw. In my mind, Bo sealed his awesomeness by not only getting Mozilla to build under mingw yet again, but also fixing a couple of exciting compiler bugs on Win32.

Path to 1.0

For more information on these and other developments see the Dehydra 1.0 tracking bug.

I am not yet sure what the next release of Dehydra will be. My giant GTY patch to GCC is still awaiting review in a GCC developer’s inbox. Depending on whether that gets accepted I’ll continue releasing Dehydra 0.9.x with the current GCC patchset or delay a 1.0 release to work on getting the GCC plugin API reviewed and more or less finalized.

Plans for Near Future

I think I figured out the missing pieces needed to make outparamdel’s deCOMtamination patches acceptable, and will work on that next. I’ll be continuing to clean up pork to be more developer-friendly. After the recent unhappiness involving bisecting 10 separate repositories at once, I’ve decided to merge pork into one giant repository, and if someone just wants a couple of smaller pieces, those should be broken up at the package management level.

Additionally, I would like to start landing the SpiderMonkey analyses soon.

I have never enjoyed the theory behind software engineering. It seems particularly depressing as it can be summarized as: “What can we learn from past software development experience in order to not repeat old mistakes such that we can come up with newer and shinier mistakes?”.

For that reason I haven’t been able to stick to any particular software development doctrine (paired, test-driven, OO, SOA, etc) and have instead taken shortcuts to whatever is practical at the time.

One unfortunate result of such neglect is that the oink test suite ended up not being utilized. I tried it a couple of times while starting out with oink and it failed in many cumbersome ways. However, as pork evolved out of oink and I learned more about the “architecture” behind it, I fixed a couple of the issues that were causing funny make errors.

However, one giant bug remained. It turned out other people were able to run the original oink testsuite, but not the equivalent one in pork. Fearing that I somehow screwed up Elsa, I spent way too long investigating the failure only to learn it wasn’t my fault. Pork users: rejoice, the testsuite should run as expected now.

PS. I may not be a SENG believer, but I do think that open source + good version control + testsuites result in better software.

Summit

August 4th, 2008

The past week rocked. I was especially impressed with the localizers. The guys who bear through translating an entire browser with associated websites to expose their country to an awesome browsing experience are simply electrifying.

It was great to see the South American guys again and to meet hordes of Europeans.

The most exciting outcome of the summit in my neck of the static analysis woods is that DXR (a semantically aware successor to MXR) will be rewritten from scratch in Python. Another reason to rejoice is that a tracing SpiderMonkey should make Treehydra ridiculously fast without much (or any) effort on my part.

Pull pork with care

July 25th, 2008

I just committed the giant change to bring down Elsa’s namespace pollution to reasonable levels. Elsa code is now using std::foo style, or using namespace std. As I mentioned before, Elsa’s string is now sm::string; a summary of how to perform similar renames is here. The good news is that Pork will soon work out of the box with a modern toolchain.

For the handful of porkers out there, you need to hg pull & hg up all of the pork repositories. This has been a case study in why splitting up a codebase into a billion repositories is a bad idea:

a) Lovely, I have to do many commits instead of one

b) To top it off, now my users will curse my name while updating whatever pork repository that interests them most.

I feel like I’m going to throw up if I see any more C++ code diffs in the next 10 minutes.

In contrast, while rewriting things on the Mozilla-scale is a lot less feasible manually, it is very rewarding to automate. Gotta love big C++ codebases.