Living Dead Code

September 26th, 2008

The coolest part about my job is that I get to work on tasks that are cool, but typically are forever laid to rest in the would-be-cool-if-we-could-but-we-don’t-have-time-or-resources category. However as project code sizes increases, the usefulness/coolness ratio of static analysis grows and moves from would-be-cool, to nice-to-have to this-is-the-only-way. Now, I’m not sure where Mozilla lies on this scale, but I do know for sure that the giant codebase is an analysis treasure trove.

Dead Code Motivation

I have been talking about dead code detection in Mozilla for ages. As software changes, some pieces of code unintentially get left behind, but often there is no way to tell. It results in unnessary maintanance burden and increased footprint. In roc’s case randomly spotting dead methods may also involve a little IRC griping at me about not having any tools for it, even rudimentary ones. Between that and blizzard’s cheering over reduced code size, I had no choice, but to give it a try once I had enough outparamdelling being reviewed.

Dead Code Results

First of, the approach described below seems to work well: see bug with initial results. Now I get a few thousand reported methods to inspect and refine the results.

Dead Code Approach

I’ve been pondering dead code detection since I’ve started working on this stuff. There are lots of ways to do it, but I finally settled on the dead-simplest one: method-level granularity. Ingredients are:

  • Dehydra/Treehydra extraction JavaScript: Dehydra makes it easy to enumerate methods and class hierarchies. Treehydra makes it possible to extract every mention of a method from the code(except for pointers to virtual functions that were casted…in that case we are left with vtable index and a type that the pointer was cast to). Additionally, Treehydra counts the number of AST nodes visited in every function body.
  • Shell script to aggregate the result of processing Mozilla source. Gotta admit, perl’s hashtables give GNU sort -u a run for its money.
  • Ocaml program to do the super-dumb algorithm. Classes with the same names, are assumed to be the same class. Method overloads are assumed to be a single method. All methods are assumed to be virtual. Every ClassType::FUNCTION_CALL call walks down to all children & up through all the parents and marks the derived methods as called. Then all derivatives of scriptable XPIDL are filtered out and the uncalled methods are printed out. In order to find most exciting functions first, methods in the results are sorted by their AST count =D.
    Language Nerd Trivia: Why OCaml – because ADTs are no fun in other languages. Why is my OCaml so bad – because I lost the hang out if due to not writing anything it for the past 2 years.

I’m sure most people reading this will go: “Wait a minute, this is unsound if you don’t see the virtual function pointers?”, to which I reply: “Once I nuke the method and mozilla compiles successfully, it’s sound”. In reality someone could make a puny GCC patch to preserve more data in virtual function pointer assignments.

Since this is all in early stages there a bunch of things I don’t deal with: method overloads, constructors, destructors and overloaded operators….and templates. All function bodies are scanned, but some function names are a pain to deal with or they aren’t straightforward function calls in GCC so they don’t participate in the dead/alive contest.

Where To Go From Here

Well, once we run out of dead code detected by the primitive caveman approach above, we’ll have to investigate less conservative approaches possibly involving abstract interpretation and callgraphs.

How To Get Involved in Screwing With Software Cost Models By Contributing Negative Line Counts

This project has a lot of places to help out:

  • work through the dead method list, filing bugs accordingly and deleting any related code
  • Extend the machinery to work on all non-static function
  • Try this on your favourite large codebase.
  • Write the virtual function pointer annotation patch :)

6 Responses to “Living Dead Code”

  1. Daniel Says:

    I’ve started to search for dead code as well, but I’m not very efficient yet, though.

    Bug 455043 for example, it only needs a checkin for some dead code that is executed every page view. Or is this kind of code called something else?

    It seems that your focus is smaller than mine, but dead, unused and redundant code is everywhere.

    Anyway, I’d like to mention Bugs 413141, 433065 and 457208 as well. For what it’s worth..

  2. Joshua Cranmer Says:

    You forgot a wonderful extension:
    * Make it work for JS (also able to spot unused scriptable methods!)

    Also, I’m having a nice spot of removing code… before de-forking and adding documentation, my current WIP patch is doing a -2326/+632. Previous record: -2732/+1871 (after documentation).

    In any case, I guess it’s time for me to run that stuff on mailnews to see what myriads of unused code I haven’t yet discovered.

  3. s Says:

    You are downright dangerous I’m glad you are not fucking around with any codebase I have to deal with

  4. markus Says:

    Do you guys realize that we need a language that attempts to strive to have VERY FEW AMOUNT OF LINES OF CODES?

    Growing lines-of-code count is really the number 1 ocmplexity problem in every programming language i know … in some it is worse than in others

  5. bubak Says:

    Thats why I like Java… :-)

  6. Anders Says:

    Are you going to collaborate with Caolan McNamara from Redhat doing similar work on OpenOffice?

    http://www.skynet.ie/~caolan/Packages/callcatcher.html

    http://blogs.linux.ie/caolan/2008/04/14/goooocon08/

    OpenOffice would probably be a great place for automatic rewriting of C++.

    Best,
    Anders

Leave a Reply