Are you are a developer who has been frustrated by the pathetic state of the art in code search engines and code browsing experience on the web? Have you been longing for being able to view code with more aid than coloured words?

David Humphrey just released his DXR bombshell. The “basic” concept behind DXR is extraction of the rich semantic information gathered by tools like GCC, Spidermonkey and xpidl. This data is then coherently linked together into a pretty UI in order to provide cleverness during code browsing sessions.

DXR will be happy to answer seemingly trivial queries:

  • List implementations of interfaces in C++ (and soon JS).
  • Provide relevant search results by searching semantic data first. No, grep is no longer state of the art for searching code.
  • Switch between definition and declaration.
  • Walk up/down class hierarchies.
  • Lookup typedefs, types, etc.

I’ve been wanting to see this sort of tool built on top information exposed by Dehydra since I got it working. Words can not express how pumped I am about DXR and the magic powers that we will be granting it.

I moved dehydra to a more official location, please update your scripts and hg settings.
New dehydra url:
http://hg.mozilla.org/rewriting-and-analysis/dehydra/

Pork got reshuffled during the move, it’s now 2 repositories. oink is dead. It now depends on current versions of flex (as opposed to flex-old) and features a cleaned up buildsystem.

New way to checkout pork:

hg clone http://hg.mozilla.org/rewriting-and-analysis/pork
hg clone http://hg.mozilla.org/rewriting-and-analysis/elsa pork/elsa

Dehydra Updates

May 11th, 2009

How well are you packing your structs?
Arpad asked that question with an awesome Dehydra script and came up with an interesting list.

GCC 4.3.3 Is supported
GCC 4.3.3(4.3.[210] worked) broke C++ compatibility in the headers used by Dehydra.
Zach pointed out that passing -fpermissive to g++ solves the problem. Sorry to all the people who had issues building the hydras with GCC 4.3.3, that’s fixed now.

As I mentioned before, we are skipping GCC 4.4 support in Dehydra and aiming for supporting unpatched GCC 4.5. I wish that the small GCC patches were as quick to land as that big one I landed a couple of weeks ago :(.

GCC Rant + Progress

April 30th, 2009

I feel strange working on GCC-specific stuff and then discussing it on planet mozilla as mozilla work. However, without GCC, Dehydra and Treehydra would not be half as awesome (much less feasible even). The power of open source is that it allows us to leverage the entire open source ecosystem to achieve specific goals. When open source projects combine their efforts, not even the biggest software companies can compete as cross-project goals would be incredibly expensive and unpleasant otherwise.

Occasionally, it is very frustrating to see people treat open source software as immutable and independent black boxes. In my personal experience, the browser and the compiler are viewed as finished products and therefore it is OK to bitch and complain about them. That’s frustrating because the same users could be channeling that energy in a more positive way by reporting bugs, contributing code/documentation, etc.

Sometimes these rants result in rather comical conclusions: Ingo’s rant is priceless. My perspective on this:

  • what have Linux kernel devs done to help GCC help them?
  • <flame>Sparse is a deadend. Writing compiler code in C is silly, writing analysis code in C is sillier (and frustrating and limiting). Taking a crappy parser and bolting a crappy compiler backend onto it will result in bigger pile of crap :) Given how smart kernel devs are, they sure like wasting their time on crappy solutions in crappy languages.</flame>
  • Wouldn’t it be cool if instead of complaining these talented people wrote a GCC plugin to do what they want?

GCC Plugin Progress

I finally landed the massively boring and annoying GTY patch. I can barely believe that the patch went in so smoothly without excess complaining from GCC devs. From GCC perspective it’s merely a cosmetic cleanup that affects a large number of headers. For us it enables Treehydra to be generated via Dehydra with little manual effort. It basically makes Treehydra possible without patching GCC. I have another 3-4 patches that need to land before trunk GCC can run the hydras out of the box. Those are mainly localized bugfixes and cleanups so I fully expect them to go in and for GCC 4.5 to rock my world.
Once GCC 4.5 ships. analyzing code will depend on a trivial matter of apt-getting(or equivalent) the hydras and specifying the analysis flags on the GCC commandline!

GCC Plugins are a Go!

January 27th, 2009

The nice folks at FSF allowed GCC have plugins. In a couple of GCC releases, Dehydra(4.5 if we are lucky) will work with distribution GCCs. Of course the API is yet to be decided on, but we have been coordinating with authors of other GCC plugin efforts to ensure that the final API meets reasonable needs.

In the future enabling static analysis checks will involve little more than specifying –with-static-checking in your Mozilla build!

JSHydra

The other breakthrough news is that Joshua Cranmer has been working on hooking up a *hydra style API to the Spidermonkey parser. This resulted in JSHydra. Ability to look into JavaScript has been sorely missing from our stack, so this is extremely exciting.

MUST_FLOW_THROUGH(”label”)

September 8th, 2008

Some time ago, Igor mentioned that there is code in SpiderMonkey that pleads to the programmer that from a certain point in a function code must flow through a label(ie a finalizer block). Treehydra made it to possible to turn that weak plea into an error message when static checking is enabled. See the bug for more details. My favourite static analyses are all about turning informal “gurantees” into angry compiler complaints.

This is my first static analysis that landed in the mozilla-central tree. It’s also the simplest one and may be a decent starting point for solving similar problems. I’d be cool to see this particular feature utilized outside of SpiderMonkey. Unlike human-powered code-inspection, it excels at finding accidental early returns covered up by macros.

Pork

I planned to release Pork 1.0 for a while now. The tools work great, even if all the love is going to the GCC-based toolchain. However, after hearing grumpy comments from a certain coworker about the uglyness of the oink build system it dawned on me that it’s rather mean to release such a mess and call it 1.0.

So I think I’ll release Pork 0.9 in the current state, so I can focus on near term GCC toolchain work. Pork in the current form means oink stack + my refactoring tools + changes to elsa and other libs to support C/C++ refactoring needs.

This will be followed up by Pork 1.0. 1.0 will involve changes to the build system to get rid of oink(we only use the oink build system and rarely use oink API). To put this another way: I don’t expect any functionality changes between 0.9 and 1.0 other than an improved build system to make it easier to get started with writing new tools.

Pork - Future

I am pretty happy with Pork as it is. I think we’ve taken Elsa as far as it’ll let us go. The only realistic improvement on the Pork side may be to have Dehydra generate a JS binding to Elsa’s extensive AST to make rewriting stuff easier. However, I’m not sure if that’s worth the effort nor that a C++ AST will reflect into JavaScript as well as GCC GIMPLE.

Preprocessing

On the other hand, something needs to be done about the main ingradient that makes Pork tick: MCPP. MCPP does a lovely job of annotating what the C preprocessor is doing, but configuring GCC to use a foreign preprocessor is a giant hassle and making sure it works correctly is troublesome. At the GCC summit, Tom gave me an idea on how similar functionality can be added to GCC directly by extending the include backtrace with macro expansions. Not only would such integration simplify Pork setup and increase Pork’s operating speed, but it is also a clean way to expose preprocessor constructs to the AST presented in De/Treehydra. It should allow for more preprocessor awareness directly in analysis stage of refactoring instead of only in the final rewriting stage as is currently done. As a side-effect, GCC would gain better error messages too.

So while this isn’t going to affect Pork directly, it will simplify the lives of Pork users while opening new analysis frontiers. Even though I hate working on preprocessor stuff, I think this work will need to happen sometime in the near future.

The Hydras

Dehydra 0.9 has been out for a while, I planned to release 1.0 soon after unless there are major flaws discovered in the API. The situation changed at the GCC summit. The fact that FSF reversed their stance on GCC plugins means that we should be concentrating on getting the plugin stuff reviewed.

So in the near term I’m forward porting the plugin stuff to trunk GCC, then I’ll be generalize the plugin API to suit at least one other GCC plugin user that we met with at the summit. The downside is that I don’t want to release Dehydra 1.0 and immediately break the plugin API. The upside is that the new API should be more general and more minimalistic and will likely be close to what will eventually become an official plugin API.

Summary: In my mind Dehydra and Pork are 1.0 quality, but I want to future-proof them a little bit before calling them 1.0.

GCC Summit

June 19th, 2008

Our presentation on Treehydra and Dehydra GCC plugins was received well at the summit.

The big news is that FSF is working on license changes to allow GPL-only GCC plugins. I’m looking forward to having our work be compatible with future GCC without any patching.

In a few minutes we’ll be having a meeting with users of other plugin frameworks to have an initial discussion on a common API. I’m working on forward porting my patches, so they can start getting reviewed ahead of license changes.

I am finally happy enough with Dehydra API and functionality to release 0.9. Dehydra is basically feature complete, the main reason I’m not calling it 1.0 is in case there are outstanding API bugs.

I believe Dehydra is the first useful open source static analysis tool. I hope to see projects outside of Mozilla benefitting from it too.

I would love to see someone package this up for various Linux distributions. You can grab there release here.

Note, this release also features as a preview release of Treehydra. Most of the development lately has been focused on improving Treehydra and building analyses on top of it.

After writing a ton of docs and working through other Dehydra 0.9 blockers, I decided to cool off by doing some actual analyses. Before I get to that, I’d like to say that the last big task is to setup a buildbot for Dehydra on Linux/OSX. Thanks to yet another awesome contribution from Vlad, that’s mostly done.

So I got working on GC-safety static analysis. Originally we tried to define a complete spec before writing a single line of code. That turned to be a bad idea and resulted in a spec full of bugs. This time we are defining the analysis incrementally and as a surprise reward, it already caught a bug.

Pushing and Popping Our Way

SpiderMonkey has a lot of complex code doing applying Push/Pop-like operations on variables in a function-local manner. Examples of functions that this analysis would look at are: JS_PUSH_TEMP_ROOT/JS_POP_TEMP_ROOT and JS_LOCK/JS_UNLOCK. See bug for more. Essentially, this will help with “code must flow through here” comments on “out:” goto labels that inhabit the SpiderMonkey source.

This is an example of control-flow-sensitive analysis. It impossible without a compiler-like view of the code that Treehydra provides. It also helps to have a scalable algorithm to iterate the CFG. Luckily, David Mandelin wrote such a beast by implementing ESP for his outparam analysis. David factored-out the ESP analysis and made it available for reuse. See esp_lock.js in the test suite for an example of how to write control-flow sensitive analyses. locks_valid*.cc and locks_bad*.cc illustrate the code patterns that can be scanned for.

So if you know of any further push/pop patterns in the rest of Moz that can be checked in this manner, leave a comment.

PS. This is yet another account of Treehydra rocking the static analysis world. Exposing the slightly scary, but awesome GCC gutts via JavaScript allows one to perform precise static analyses in a civilized manner. What could be more fun?