I have never enjoyed the theory behind software engineering. It seems particularly depressing as it can be summarized as: “What can we learn from past software development experience in order to not repeat old mistakes such that we can come up with newer and shinier mistakes?”.

For that reason I haven’t been able to stick to any particular software development doctrine (paired, test-driven, OO, SOA, etc) and have instead taken shortcuts to whatever is practical at the time.

One unfortunate result of such neglect is that the oink test suite ended up not being utilized. I tried it a couple of times while starting out with oink and it failed in many cumbersome ways. However, as pork evolved out of oink and I learned more about the “architecture” behind it, I fixed a couple of the issues that were causing funny make errors.

However one giant bug remained. Turned out other people were able to run the original oink testsuite, but not the equivalent one in pork. Fearing that I somehow screwed up Elsa, I spent way too long investigating the failure only to learn it wasn’t my fault. Pork users: rejoice, the testsuite should run as expected now.

PS. I may not be a SENG believer, but I do think that open source + good version control + testsuites result in better software.

Summit

August 4th, 2008

The past week rocked. I was especially impressed with the localizers. The guys who bear through translating an entire browser with associated websites to expose their country to an awesome browsing experience are simply electrifying.

It was great to see the South American guys again and to meet hordes of Europeans.

The most exciting outcome of the summit in my neck of the static analysis woods is that DXR (a semantically aware successor to MXR) will be rewritten from scratch in Python. Another reason to rejoice is that a tracing spidermonkey should make Treehydra ridiculously fast without much (or any) effort on my part.

Pull pork with care

July 25th, 2008

I just committed the giant change that brings Elsa’s namespace pollution down to reasonable levels. Elsa code is now using std::foo style, or using namespace std. As I mentioned before, Elsa’s string is now sm::string; a summary of how to perform similar renames is here. The good news is that Pork will soon work out of the box with a modern toolchain.
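To illustrate why the rename matters, here is a minimal sketch. The sm::string below is a made-up stand-in, not Elsa’s actual class, and total_length is a hypothetical helper. The point is only that a class named string in the global namespace makes an unqualified string ambiguous after using namespace std, while one tucked into its own namespace does not.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for Elsa's string class; the real sm::string has
// its own implementation and a much larger interface.
namespace sm {
class string {
public:
    explicit string(const char *s) : text_(s) {}
    std::size_t length() const { return text_.size(); }
    const char *c_str() const { return text_.c_str(); }

private:
    std::string text_;  // borrowed storage, only to keep the sketch short
};
}  // namespace sm

// With the class inside namespace sm, this directive no longer makes the
// unqualified name "string" ambiguous.
using namespace std;

// Both string types can be named side by side in one translation unit.
// Here the unqualified "string" resolves to std::string.
std::size_t total_length(const sm::string &a, const string &b) {
    return a.length() + b.length();
}
```

Calling total_length(sm::string("elsa"), std::string("pork")) adds up the lengths of one string of each kind.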

For the handful of porkers out there, you need to hg pull & hg up all of the pork repositories. This has been a case study in why splitting up a codebase into a billion repositories is a bad idea:

a) Lovely, I have to do many commits instead of one

b) To top it off, now my users will curse my name while updating whatever pork repository that interests them most.

I feel like I’m going to throw up if I see any more C++ code diffs in the next 10 minutes.

In contrast, while rewriting things on the Mozilla-scale is a lot less feasible manually, it is very rewarding to automate. Gotta love big C++ codebases.

Dogfooding pork & OSCON

July 22nd, 2008

I wrote a class renamer and used it to fix my pork pet peeve #1: a class named string that isn’t std::string. This has been a low priority goal for as long as I’ve been using Elsa. It’s pretty cool to apply a tool to fix itself.

The renamer is 3x simpler than the next simplest tool. I plan to extend it to also rename class members. Renaming is the most trivial use-case for rewriting code, so I plan to post a tutorial on using the renamer in the near future.
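To give a flavour of what a rename involves, here is a deliberately naive, text-level sketch. It is not how the renamer works (the real tool uses Elsa’s parse of the code). The names rename_identifier and is_ident_char are hypothetical, and the approach ignores comments, string literals and macros, which is exactly the sort of thing a parser-based tool is there to handle.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True for characters that can appear inside a C++ identifier.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Replace whole-word occurrences of `from` with `to`, leaving alone any
// occurrence that is already qualified with "::" (e.g. std::string).
std::string rename_identifier(const std::string &src,
                              const std::string &from,
                              const std::string &to) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (!is_ident_char(src[i])) {
            out += src[i++];  // punctuation and whitespace pass through
            continue;
        }
        // Collect one maximal run of identifier characters.
        std::size_t start = i;
        while (i < src.size() && is_ident_char(src[i])) ++i;
        std::string word = src.substr(start, i - start);
        bool qualified = start >= 2 && src[start - 1] == ':' &&
                         src[start - 2] == ':';
        out += (word == from && !qualified) ? to : word;
    }
    return out;
}
```

For example, renaming string to sm::string in the text "string s; std::string t; substring(x);" changes only the first declaration, because the second is already qualified and the third is a different identifier.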

OSCON

If you are at OSCON, you do not want to miss our static analysis session on Wednesday.

It seems that there is some confusion as to what pork is and how it’s related to oink and elsa. So here is my view of it.

Pork is my set of tools that use Elsa to rewrite source code (mainly Mozilla code). We use Pork solely for rewriting, as it is not as well suited to convenient, hardcore analysis as the GCC-based tools are.

MCPP is the secret sauce C preprocessor that makes C++ rewriting with Elsa possible by annotating preprocessed files with information to undo the lexical braindamage resulting from macro expansion.

Elsa is an awesome C++ parser. Awesome in that it can preserve more information regarding parsed code than any other C/C++ parser and it is easy to extend.

We maintain our own version of Elsa within pork.

I think our version of Elsa is the most up to date and most compatible with newer C++ features and headers used by newer GCC releases. We encourage other projects with C++ parsing/rewriting needs to collaborate with us. We will be parsing code with Elsa for a few years to come and it’s a lot of work to maintain a C++ parser by a single entity. I think elsa is a much better backend to build refactoring support onto than any other C++ parsing project out there right now.

The Messy Details

Now let’s move on to the more confusing parts: oink, oink-stack, and the oink mailing list.

oink consists of some static analysis tools and was meant to be the central place for all Elsa and Elsa-related development. When people refer to oink, they usually mean the oink-stack, which is a subversion meta repository that pulls in a dozen subrepositories (smbase, elkhound, elsa, oink (where the static analysis tools live), etc).

So when I started working on refactoring tools I was told that I should aim to have my tools added to oink, but there were some legal hassles to work out in the meantime so I cloned the oink-stack and developed my tools with minimal changes to oink-stack. This included various elsa extensions, bugfixes, etc.

However, the little momentum that oink had has fizzled out due to various personality conflicts and various academics losing interest. The code has been bitrotting for as long as I’ve been working at Mozilla.

So the end result of oink is that we have pork which is a superset of oink. I’m not even sure if I mention the name pork anywhere in the sources. So pork at the moment means “Taras’ continuation and extension of oink”. I am using the oink mailing list for any discussion on changes to Elsa/etc in hopes that at least some of the genius lurkers there will regain their interest in elsa.

Where do We Go From Here?

Onward! Due to the original author’s vision of what C++ is and the state of C++ at the time Elsa was conceived, current pork code causes people to have many WTF moments (followed by banging their heads against the keyboard) when they first start using it.

The short version of my plan is:

  • Allow one to do “using namespace std” when using elsa
  • Restructure pork repositories such that there are only 3 of them rather than 11 (elsa, elkhound, pork)
  • Get rid of the oink repository (those tools do not work for us)
  • Make pork consist of just my tools (with a sane build system) rather than be mixed into unmaintained oink stuff
  • Make pork compile with new compilers (GCC 4.3 and recent MSVC++)
  • Keep track of this in a bug
  • Clean up various misc things

Some of you might ask “But Taras, why now, why not just keep doing what you’ve been doing?”. I was doing what I was doing because I had an overwhelming goal of devising a way to automate static analysis and refactoring of Mozilla on my shoulders and I wasn’t convinced that it was feasible. I had to learn to split my time between tool development and actually using the tools. Naturally I cut corners on tool development :)

Since then, slowly but surely, various awesome hackers have started doing rewrites and analyses themselves, freeing me up to focus more on development. To make matters sweeter, various hackers have started submitting bug reports, fixes, and ports to my tools. This gives me more time to focus on the big picture.

Finally, I believe that automation of the sort we are doing at Mozilla is something that has been missing from open source development practices and it will catch on once people realize what they’ve been missing. Reducing those WTF moments will help people think positively.

Hydras

I am close to landing a flow check. Turns out, it is super-easy to introduce new analyses into Mozilla thanks to some very nice build system hooks set up by bsmedberg.

Since coming back from the GCC summit I have forward-ported our GCC patches to GCC trunk. The FSF legal paperwork came through today so I posted the first and biggest patch to GCC for review.

I am not sure if I mentioned this before, but the C port of Dehydra is somewhat operational. It doesn’t yet have access to function bodies, but type traversal should work. Unfortunately, the C frontend has fewer features (pretty printing sucks, locations are even less reliable, etc) and thus is less awesome to work with than the C++ frontend.

Pork

jst was awesome enough to list some interfaces that need some outparamdelling. The list is here (in the content/ section). This led me to spend some time making outparamdel’s output prettier. There are still some improvements to be made, and I will be making them in the near future. However, if someone is interested in seeing refactoring of this kind land in the near future, they could easily complete outparamdel’s work with some clever scripting and a bit of manual labour. Sure beats doing the entire thing manually. From outparamdel’s perspective the last 10% appears to be slightly painful and might take some time.
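For readers who have not run into the pattern, this is roughly the kind of change that getting rid of an out-parameter implies. The Widget type and its methods are invented for illustration. They are neither Mozilla interfaces nor actual outparamdel output, and real code has failure cases that this sketch assumes away.

```cpp
// Hypothetical type, used only to show the shape of the rewrite.
struct Widget {
    int width = 42;

    // Before: a status code is returned and the result is delivered
    // through an out-parameter.
    bool get_width_outparam(int *out) const {
        if (!out) return false;
        *out = width;
        return true;
    }

    // After: the getter cannot meaningfully fail, so it returns the value.
    int get_width() const { return width; }
};

// Caller written against the out-parameter version.
int doubled_width_before(const Widget &w) {
    int value = 0;
    if (!w.get_width_outparam(&value)) return -1;
    return value * 2;
}

// The same caller after the rewrite: no temporary, no status check.
int doubled_width_after(const Widget &w) {
    return w.get_width() * 2;
}
```

Both callers compute the same result. The second is shorter, and every call site has to change along with the declaration, which is why doing this by hand across a large codebase is so tedious.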

Here is a patch that takes about 30 seconds to produce.

Another exciting aspect of this is that a certain emacs wizard has confirmed that it would be possible to feed emacs such a patch file and have it correct indentation for the affected areas only.

I am also very excited that a certain volunteer came forward and decided to start improving some of the stomach-turning areas of Pork. Hopefully in the near future we’ll modernize the C++ a little bit and a user’s first reaction won’t be: “What the hell, why can’t I do ‘using namespace std;’”.

To this end I have filed a bug to write a renamer tool so we can dogfood renaming of unfortunately named pieces of code.

OSCON

The plan is to have some sort of a minisession on our static analysis efforts at Mozilla. So if you are attending OSCON and are interested in doing exciting things to depressingly large amounts of code, drop me a line.

Dear lazyweb,

Please explain to me why the following code works the way it does. Having looked at the code and at the docs for stringstream::str() and stringstream::str(string), its behavior does not make sense to me.

#include <sstream>
#include <iostream>

using namespace std;

int main(int argc, char**) {
  stringstream ss("foo");
  cout << ss.str() << endl;
  ss << "bar";
  cout << ss.str() << endl;
  ss << "more";
  cout << ss.str() << endl;
}
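For what it is worth, one explanation that fits the standard library’s documented behavior: a stringstream constructed from a string starts with its write position at the beginning of the buffer, so the first insertion overwrites the initial contents instead of appending to them. Opening the stream with std::ios_base::ate moves the write position to the end. The two helper functions below are hypothetical names used only to contrast the two cases.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Default open mode (in | out): writing starts at offset 0, so "bar"
// overwrites "foo" and "more" is then written after it.
std::string overwrite_from_start() {
    std::stringstream ss("foo");
    ss << "bar";
    ss << "more";
    return ss.str();  // "barmore"
}

// Adding ios_base::ate positions the write pointer at the end of the
// initial contents, so insertions append instead of overwriting.
std::string append_at_end() {
    std::stringstream ss("foo", std::ios_base::in | std::ios_base::out |
                                    std::ios_base::ate);
    ss << "bar";
    ss << "more";
    return ss.str();  // "foobarmore"
}
```

An alternative that avoids the open mode entirely is to default-construct the stream and insert the initial text with operator<<.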

Pork 0.9 in the wild

June 30th, 2008

Those who would like to play with Pork, but are allergic to pulling sources from version control can now download an actual pork release. Now someone needs to hook this into a GUI to provide easy Eclipse-style refactoring for C++.

Pork

I have been planning to release Pork 1.0 for a while now. The tools work great, even if all the love is going to the GCC-based toolchain. However, after hearing grumpy comments from a certain coworker about the ugliness of the oink build system it dawned on me that it’s rather mean to release such a mess and call it 1.0.

So I think I’ll release Pork 0.9 in the current state, so I can focus on near term GCC toolchain work. Pork in the current form means oink stack + my refactoring tools + changes to elsa and other libs to support C/C++ refactoring needs.

This will be followed up by Pork 1.0. 1.0 will involve changes to the build system to get rid of oink (we only use the oink build system and rarely use the oink API). To put this another way: I don’t expect any functionality changes between 0.9 and 1.0 other than an improved build system to make it easier to get started with writing new tools.

Pork - Future

I am pretty happy with Pork as it is. I think we’ve taken Elsa as far as it’ll let us go. The only realistic improvement on the Pork side may be to have Dehydra generate a JS binding to Elsa’s extensive AST to make rewriting stuff easier. However, I’m not sure whether that’s worth the effort, or whether a C++ AST will reflect into JavaScript as well as GCC GIMPLE does.

Preprocessing

On the other hand, something needs to be done about the main ingredient that makes Pork tick: MCPP. MCPP does a lovely job of annotating what the C preprocessor is doing, but configuring GCC to use a foreign preprocessor is a giant hassle and making sure it works correctly is troublesome. At the GCC summit, Tom gave me an idea on how similar functionality can be added to GCC directly by extending the include backtrace with macro expansions. Not only would such integration simplify Pork setup and increase Pork’s operating speed, but it is also a clean way to expose preprocessor constructs to the AST presented in De/Treehydra. It should allow for more preprocessor awareness directly in the analysis stage of refactoring instead of only in the final rewriting stage as is currently done. As a side-effect, GCC would gain better error messages too.

So while this isn’t going to affect Pork directly, it will simplify the lives of Pork users while opening new analysis frontiers. Even though I hate working on preprocessor stuff, I think this work will need to happen sometime in the near future.

The Hydras

Dehydra 0.9 has been out for a while, and I had planned to release 1.0 soon after unless major flaws were discovered in the API. The situation changed at the GCC summit. The fact that the FSF reversed their stance on GCC plugins means that we should be concentrating on getting the plugin stuff reviewed.

So in the near term I’m forward porting the plugin stuff to trunk GCC, then I’ll generalize the plugin API to suit at least one other GCC plugin user that we met with at the summit. The downside is that I don’t want to release Dehydra 1.0 and immediately break the plugin API. The upside is that the new API should be more general and more minimalistic and will likely be close to what will eventually become an official plugin API.

Summary: In my mind Dehydra and Pork are 1.0 quality, but I want to future-proof them a little bit before calling them 1.0.

GCC Summit

June 19th, 2008

Our presentation on Treehydra and Dehydra GCC plugins was received well at the summit.

The big news is that FSF is working on license changes to allow GPL-only GCC plugins. I’m looking forward to having our work be compatible with future GCC without any patching.

In a few minutes we’ll be having a meeting with users of other plugin frameworks to have an initial discussion on a common API. I’m working on forward porting my patches, so they can start getting reviewed ahead of license changes.