Cool Stuff in Foreign Realms
October 9th, 2008
Open Source projects are often like parallel worlds. People reach the same conclusions, attempt similar solutions and are typically blissfully unaware of each other’s existence.
Here are two projects that came to my attention this week:
- Via Planet KDE I stumbled on Krazy. It’s neat, if somewhat depressing, that even though C++ parsers are finally becoming accessible, all this is still Perl.
- A helpful comment on my previous entry pointed me to dead method hunting in OpenOffice.
Uncool Open Source Rant
The reason I switched to Linux was that it seemed like a developer’s dream come true. Compilers are a package manager operation away (and generally aren’t tied to OS versions, I’m looking at you, Apple), everything can be recompiled to address any particular concern, and there is a crapload of weird languages to write your software in. This is especially awesome when compared to a typical proprietary software stack, where one can’t easily fix problems that span multiple components due to licensing issues, or due to not having the code available (or the development environment). So one would think that Linux distributions hold the ultimate software power: an unlimited pass to modify their offering to their target audience’s content.
Unfortunately, my view of the Linux way is unrealistic. For example, when people do friggin’ awesome work knocking boot time down to 5 seconds by hacking and slashing their way through the entire software stack involved in boot-time delays, distributions claim that it’s not something they can seriously consider because they are too set in their ways of general-purpose (and generally bloated) init systems and stock kernels (worst case: why can’t we recompile the kernel on the user’s machine, or ship a couple of custom variations for common hardware out there?). What’s the point of open source if we continue pretending that everything is a general-purpose black box that doesn’t like to play together? It’s been two weeks and I haven’t seen a single distro bite the bullet and attempt to list a 5 second boot in their goals. Come on guys, don’t you like to be challenged?
Hope someone proves me wrong.
In my quest to rid Firefox of code that doesn’t do anything, it is possible to screw up and delete a method that is only used by outside code. So far, I’ve hit a component that isn’t used in Firefox, methods that aren’t used in Firefox and methods that claim someone is about to use them (with a timestamp from 8 years ago). On top of that, I discovered that some code only appears to be used from BeOS- or OS/2-specific ifdefs.
Not surprisingly, I had someone comment that I do some “dangerous shit”. Sure, I can see why someone would think that.
My protection: I use the try server to make sure that everything builds on supported platforms. On top of that every patch gets reviewed.
Furthermore, I would like to point out that methods that aren’t called within Firefox are less likely to be tested and correct. So if one leaves in code that should only be called from other projects, it’d be appropriate to have some unit tests for it, which would flag the code as in-use. That would let us move to the next level and set up a tinderbox to detect dead code as soon as it is orphaned.
Living Dead Code
September 26th, 2008
The coolest part about my job is that I get to work on tasks that are cool, but typically are forever laid to rest in the would-be-cool-if-we-could-but-we-don’t-have-time-or-resources category. However, as project code size increases, the usefulness/coolness ratio of static analysis grows and moves from would-be-cool, to nice-to-have, to this-is-the-only-way. Now, I’m not sure where Mozilla lies on this scale, but I do know for sure that the giant codebase is an analysis treasure trove.
Dead Code Motivation
I have been talking about dead code detection in Mozilla for ages. As software changes, some pieces of code unintentionally get left behind, but often there is no way to tell. The result is an unnecessary maintenance burden and increased footprint. In roc’s case, randomly spotting dead methods may also involve a little IRC griping at me about not having any tools for it, even rudimentary ones. Between that and blizzard’s cheering over reduced code size, I had no choice but to give it a try once I had enough outparamdelling being reviewed.
Dead Code Results
First off, the approach described below seems to work well: see bug with initial results. Now I get a few thousand reported methods to inspect and refine the results.
Dead Code Approach
I’ve been pondering dead code detection since I started working on this stuff. There are lots of ways to do it, but I finally settled on the dead-simplest one: method-level granularity. Ingredients are:
- Dehydra/Treehydra extraction JavaScript: Dehydra makes it easy to enumerate methods and class hierarchies. Treehydra makes it possible to extract every mention of a method from the code (except for pointers to virtual functions that were cast; in that case we are left with a vtable index and the type that the pointer was cast to). Additionally, Treehydra counts the number of AST nodes visited in every function body.
- Shell script to aggregate the result of processing Mozilla source. Gotta admit, Perl’s hashtables give GNU sort -u a run for its money.
- OCaml program to do the super-dumb algorithm. Classes with the same name are assumed to be the same class. Method overloads are assumed to be a single method. All methods are assumed to be virtual. Every ClassType::FUNCTION_CALL call walks down to all children and up through all the parents and marks the derived methods as called. Then all derivatives of scriptable XPIDL are filtered out and the uncalled methods are printed out. In order to find the most exciting functions first, methods in the results are sorted by their AST count =D.
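To make the super-dumb algorithm concrete, here is a rough Python sketch of how I read the steps above. It is not the actual OCaml program: the function name, the input shapes and the toy class names (including nsIFoo) are all made up for illustration, and overloads are already collapsed because methods are keyed by name only.

```python
from collections import defaultdict


def find_dead_methods(parents, methods, calls, scriptable_roots):
    """Hypothetical sketch of the method-level dead code pass.

    parents:          dict class name -> list of direct base class names
    methods:          dict (class name, method name) -> AST node count
    calls:            iterable of (class name, method name) mentions
    scriptable_roots: set of class names standing in for scriptable
                      XPIDL interfaces
    """
    # Invert the hierarchy so we can walk down to children as well.
    children = defaultdict(set)
    for cls, bases in parents.items():
        for base in bases:
            children[base].add(cls)

    def closure(start, edges):
        # Every class reachable from start, including start itself.
        seen, stack = set(), [start]
        while stack:
            cls = stack.pop()
            if cls in seen:
                continue
            seen.add(cls)
            stack.extend(edges.get(cls, ()))
        return seen

    # All methods are treated as virtual: a call to Class::method marks
    # the same method name as called in all ancestors and descendants.
    called = set()
    for cls, name in calls:
        related = closure(cls, parents) | closure(cls, children)
        called.update((c, name) for c in related)

    # Anything deriving from a scriptable interface may be called from
    # script, so it is dropped from the report.
    scriptable = set()
    for root in scriptable_roots:
        scriptable |= closure(root, children)

    dead = [
        (cls, name, count)
        for (cls, name), count in methods.items()
        if (cls, name) not in called and cls not in scriptable
    ]
    # Biggest function bodies first, then by name for a stable order.
    dead.sort(key=lambda item: (-item[2], item[0], item[1]))
    return dead


# Toy input, invented for this example.
parents = {"Derived": ["Base"], "Leaf": ["Derived"], "Scripted": ["nsIFoo"]}
methods = {
    ("Base", "Paint"): 40,
    ("Derived", "Paint"): 25,
    ("Leaf", "Paint"): 10,
    ("Base", "Unused"): 90,
    ("Leaf", "Helper"): 5,
    ("Scripted", "Run"): 70,
    ("Other", "Old"): 12,
}
calls = [("Derived", "Paint")]

dead = find_dead_methods(parents, methods, calls, {"nsIFoo"})
for cls, name, count in dead:
    print(f"{cls}::{name} ({count} AST nodes)")
```

A single call to Derived::Paint keeps Paint alive in Base and Leaf too, which is exactly why this approach is so conservative: it only reports methods whose name is never mentioned anywhere in the related part of the hierarchy.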
Language Nerd Trivia: Why OCaml? Because ADTs are no fun in other languages. Why is my OCaml so bad? Because I lost the hang of it from not writing anything in it for the past 2 years.
I’m sure most people reading this will go: “Wait a minute, this is unsound if you don’t see the virtual function pointers?”, to which I reply: “Once I nuke the method and Mozilla compiles successfully, it’s sound”. In reality someone could make a puny GCC patch to preserve more data in virtual function pointer assignments.
Since this is all in early stages, there are a bunch of things I don’t deal with: method overloads, constructors, destructors, overloaded operators and templates. All function bodies are scanned, but some function names are a pain to deal with or they aren’t straightforward function calls in GCC, so they don’t participate in the dead/alive contest.
Where To Go From Here
Well, once we run out of dead code detected by the primitive caveman approach above, we’ll have to investigate less conservative approaches possibly involving abstract interpretation and callgraphs.
How To Get Involved in Screwing With Software Cost Models By Contributing Negative Line Counts
This project has a lot of places to help out:
- Work through the dead method list, filing bugs accordingly and deleting any related code
- Extend the machinery to work on all non-static functions
- Try this on your favourite large codebase.
- Write the virtual function pointer annotation patch
Error Presentation
September 3rd, 2008
Certain other cool open source projects are doing cool static analysis work. In this case, here is an analysis of one of my favourite operating systems projects, DragonFly BSD.
I’m blown away by the clean UI. The error filter and the interleaving of static analysis results in the source code are drool-inducing. This is powered by the clang checker. Clang’s checker doesn’t yet do C++, doesn’t do application-specific checks and has a lot of false positives, but it’s an exciting preview of things to come.
Oh and I hope that DXR will have similar analysis awesomeness. In the future, I hope to see static analysis become almost as common as unit-testing.
This week in the Static Analysis Corner
August 13th, 2008
New Static Analysis Toys
I have been catching up on my backlog of little bugs, here are some of the most notable ones.
Benjamin has been pushing the limits of what Dehydra can do for his DXR prototype which resulted in a couple of cool new features with one new feature breaking backwards compatibility. Sorry about that, it is for the greater good.
Dehydra now processes more declarations.
Dehydra uses JavaScript prototypes to distinguish between types and declarations.
Treehydra is now built by default when building with a plugin-enabled compiler.
Treehydra now exposes the C++ frontend’s verbose and as-close-as-gcc-gets-to-written-code syntax tree via process_cp_pre_genericize. Access to the early C++ AST should make it easier to automatically translate a certain class of C++ functions into JavaScript.
Coming soon: buildbot setup for Dehydra along with autobuilt debian packages.
Also, Benjamin’s GSoC student, Bo Yang, has been doing some awesome work making our static analysis toolchain work on mingw. In my mind, Bo sealed his awesomeness not only by getting Mozilla to build under mingw yet again, but also by fixing a couple of exciting compiler bugs on Win32.
Path to 1.0
For more information on these and other developments see the Dehydra 1.0 tracking bug.
I am not yet sure what the next release of Dehydra will be. My giant GTY patch to GCC is still awaiting review in a GCC developer’s inbox. Depending on whether that gets accepted I’ll continue releasing Dehydra 0.9.x with the current GCC patchset or delay a 1.0 release to work on getting the GCC plugin API reviewed and more or less finalized.
Plans for Near Future
I think I figured out the missing pieces needed to make outparamdel’s deCOMtamination patches acceptable, so I will work on that next. I’ll be continuing to clean up pork to be more developer-friendly. After the recent unhappiness involving bisecting 10 separate repositories at once, I’ve decided to merge pork into one giant repository, and if someone just wants a couple of smaller pieces, those should be broken up at the package management level.
Additionally, I would like to start landing the SpiderMonkey analyses soon.
Oink testsuite within pork passes
August 11th, 2008
I have never enjoyed the theory behind software engineering. It seems particularly depressing as it can be summarized as: “What can we learn from past software development experience in order to not repeat old mistakes such that we can come up with newer and shinier mistakes?”.
For that reason I haven’t been able to stick to any particular software development doctrine (paired, test-driven, OO, SOA, etc) and instead taken shortcuts to whatever is practical at the time.
One unfortunate result of such neglect is that the oink test suite ended up not being utilized. I tried it a couple of times while starting out with oink and it failed in many cumbersome ways. However, as pork evolved out of oink and I learned more about the “architecture” behind it, I fixed a couple of the issues that were causing funny make errors.
However one giant bug remained. Turned out other people were able to run the original oink testsuite, but not the equivalent one in pork. Fearing that I somehow screwed up Elsa, I spent way too long investigating the failure only to learn it wasn’t my fault. Pork users: rejoice, the testsuite should run as expected now.
PS. I may not be a SENG believer, but I do think that open source + good version control + testsuites result in better software.
Summit
August 4th, 2008
The past week rocked. I was especially impressed with the localizers. The guys who bear through translating an entire browser with associated websites to expose their country to an awesome browsing experience are simply electrifying.
It was great to see the South American guys again and to meet hordes of Europeans.
The most exciting outcome of the summit in my neck of the static analysis woods is that DXR (a semantically aware successor to MXR) will be rewritten from scratch in Python. Another reason to rejoice is that a tracing spidermonkey should make Treehydra ridiculously fast without much (or any) effort on my part.
Pull pork with care
July 25th, 2008
I just committed the giant change to bring Elsa’s namespace pollution down to reasonable levels. Elsa code is now using std::foo style, or using namespace std. As I mentioned before, Elsa’s string is now sm::string; a summary of how to perform similar renames is here. The good news is that Pork will soon work out of the box with a modern toolchain.
For the handful of porkers out there, you need to hg pull & hg up all of the pork repositories. This has been a case study in why splitting up a codebase into a billion repositories is a bad idea:
a) Lovely, I have to do many commits instead of one
b) To top it off, now my users will curse my name while updating whatever pork repository that interests them most.
I feel like I’m going to throw up if I see any more C++ code diffs in the next 10 minutes.
In contrast, while rewriting things on the Mozilla-scale is a lot less feasible manually, it is very rewarding to automate. Gotta love big C++ codebases.
Dogfooding pork & OSCON
July 22nd, 2008
I wrote a class renamer and used it to fix my pork pet peeve #1: a class named string that isn’t std::string. This has been a low priority goal for as long as I’ve been using Elsa. It’s pretty cool to apply a tool to fix itself.
The renamer is 3x simpler than the next simplest tool. I plan to extend it to also rename class members. Renaming is the most trivial use-case for rewriting code, so I plan to post a tutorial on using the renamer in the near future.
OSCON
If you are at OSCON, you do not want to miss our static analysis session on Wednesday.
Static Analysis and Refactoring Tooling Updates
July 9th, 2008
Hydras
I am close to landing a flow check. Turns out, it is super-easy to introduce new analyses into Mozilla thanks to some very nice build system hooks set up by bsmedberg.
Since coming back from the GCC summit I have forward-ported our GCC patches to GCC trunk. The FSF legal paperwork came through today, so I posted the first and biggest patch to GCC for review.
I am not sure if I mentioned this before, but the C port of Dehydra is somewhat operational. It doesn’t yet have access to function bodies, but type traversal should work. Unfortunately, the C frontend has fewer features (pretty printing sucks, locations are even less reliable, etc.) and thus is less awesome to work with than the C++ frontend.
jst was awesome enough to list some interfaces that need some outparamdelling. The list is here (in the content/ section). This led me to spend some time making outparamdel’s output prettier. There are still some improvements to be made, and I will be making them in the near future. However, if someone is interested in seeing refactoring of this kind land in the near future, they could easily complete outparamdel’s work with some clever scripting and a bit of manual labour. Sure beats doing the entire thing manually. From outparamdel’s perspective, the last 10% appears to be slightly painful and might take some time.
Here is a patch that takes about 30 seconds to produce.
Another exciting aspect of this is that a certain emacs wizard has confirmed that it would be possible to feed emacs such a patch file and have it correct indentation for the affected areas only.
I am also very excited that a certain volunteer came forward and decided to start improving some of the stomach-turning areas of Pork. Hopefully in the near future we’ll modernize the C++ a little bit and a user’s first reaction won’t be: “What the hell, why can’t I do ‘using namespace std;’”.
To this end I have filed a bug to write a renamer tool so we can dogfood renaming of unfortunately named pieces of code.
OSCON
The plan is to have some sort of a minisession on our static analysis efforts at Mozilla. So if you are attending OSCON and are interested in doing exciting things to depressingly large amounts of code, drop me a line.