Pork Rewrite Guide

July 2nd, 2009

Big Picture

A pork rewrite consists of the following steps:

  1. Produce annotated .i/.ii files for pork consumption. This is easiest to achieve by hooking mcpp into gcc and passing "-Wp,-K,-W0 -save-temps" to GCC while compiling your project.
  2. Write a pork application to do the rewrite that outputs a patch.
  3. Run the pork app via pork-barrel to make use of multiple CPUs and to combine the resulting patches.

Why use preprocessed files, why use MCPP?

Pork works on preprocessed files (.ii for C++, .i for C) rather than on the C++ source directly because it is convenient. Elsa uses the preprocessor directives in the .ii files to map AST node locations back to the original source. However, because of macro expansion, the .ii files produced by a typical preprocessor make it impossible to deduce the original positions wherever macros are used. For example:

function_call(NULL, "second_arg");

In the above example NULL expands to 0, which moves "second_arg" three characters to the left.

Vertical displacement happens due to something like the following:

#define FUNC_LIKE(x,y) x;y;

FUNC_LIKE(function_call(NULL, "second_arg"),
another_call()); yet_another_func_call();

In the above example another_call() and yet_another_func_call() will be moved vertically and horizontally.
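The horizontal shift can be reproduced with a few lines of C++. This is only a sketch: expandMacro is a hypothetical textual stand-in for a preprocessor (a real one works on tokens), and leftShift is a helper invented here to measure how far a token moved.

```cpp
#include <cstddef>
#include <string>

// Toy stand-in for a preprocessor: replace every occurrence of `name`
// with `body`. Assumes `name` is non-empty. A real preprocessor works on
// tokens, not raw text.
std::string expandMacro(std::string line, const std::string& name,
                        const std::string& body) {
    std::size_t pos = 0;
    while ((pos = line.find(name, pos)) != std::string::npos) {
        line.replace(pos, name.size(), body);
        pos += body.size();  // skip the inserted text so we never re-expand it
    }
    return line;
}

// Number of columns `token` moved to the left after expansion.
// Assumes `token` occurs in both strings.
long leftShift(const std::string& original, const std::string& expanded,
               const std::string& token) {
    return static_cast<long>(original.find(token)) -
           static_cast<long>(expanded.find(token));
}
```

Calling expandMacro on the function_call line with NULL mapped to 0, and then leftShift for the "second_arg" literal, gives the three-column shift described above.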

MCPP's -K flag helps address the problems caused by macro expansion: it annotates every expansion with its pre-expansion coordinates so that Elsa can reconstruct the original positions during parsing.
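To see why pre-expansion coordinates are enough to undo the displacement, here is a minimal sketch for a single line. The Expansion record and toOriginalColumn are hypothetical and do not reflect MCPP's actual annotation format or Elsa's data structures; they only show the bookkeeping involved.

```cpp
#include <vector>

// Hypothetical record of one macro expansion on a single line:
// columns [expBegin, expEnd) in the .ii file came from
// columns [origBegin, origEnd) in the original source.
struct Expansion {
    int expBegin, expEnd;
    int origBegin, origEnd;
};

// Map a column in the expanded line back to the original line.
// Returns -1 when the column lies inside expanded text, i.e. there is no
// exact original position. `expansions` must be sorted by expBegin.
int toOriginalColumn(int expCol, const std::vector<Expansion>& expansions) {
    int shift = 0;  // accumulated (original length - expanded length)
    for (const Expansion& e : expansions) {
        if (expCol < e.expBegin) break;    // this expansion is further right
        if (expCol < e.expEnd) return -1;  // inside the expanded text
        shift += (e.origEnd - e.origBegin) - (e.expEnd - e.expBegin);
    }
    return expCol + shift;
}
```

For the NULL example, one record saying that column 14 of the expanded line came from columns 14 to 18 of the original is enough to map the string literal at expanded column 17 back to column 20.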

Why Patches as Output

One way to do source transformations is via pretty-printing: read the whole source in and produce a new file by printing out the modified AST. Unfortunately that does not work in practice: it discards all code comments and indentation, and the complexity of the C++ language makes it hard to round-trip ASTs.

Thus the only practical way to automatically rewrite C++ files is to modify as little code as possible. Another question is why one should produce a patch instead of modifying files directly. The reason is a software engineering one and is left as an exercise for the reader.
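As a rough illustration of "modifying as little as possible", the sketch below emits a unified-diff hunk that replaces a single line and touches nothing else. singleLinePatch is a hypothetical helper written for this guide, not part of Pork; Pork's Patcher is more general.

```cpp
#include <sstream>
#include <string>

// Toy illustration only: emit a unified-diff hunk that replaces one line.
// Everything outside that line (comments, indentation, other code) is
// simply never mentioned in the patch, so it cannot be disturbed.
std::string singleLinePatch(const std::string& path, int lineNo,
                            const std::string& oldLine,
                            const std::string& newLine) {
    std::ostringstream out;
    out << "--- " << path << "\n"
        << "+++ " << path << "\n"
        << "@@ -" << lineNo << ",1 +" << lineNo << ",1 @@\n"
        << "-" << oldLine << "\n"
        << "+" << newLine << "\n";
    return out.str();
}
```

A patch in this form can be reviewed before it is applied, which is not possible once a tool has overwritten the files in place.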

Writing a Pork Rewriter

There are a few key concepts in the source code.

  1. Patcher: class that turns instructions to rewrite a particular region into a patch
  2. SourceLoc vs CPPSourceLoc: SourceLoc is Elsa's representation of locations. All AST nodes have a SourceLoc loc member and most have an endloc one. These correspond to the beginning and end of an AST node as presented by the .ii file BEFORE taking MCPP annotations into consideration. They are meant to be translated by constructing a CPPSourceLoc object, which maps them to the exact positions in the original source (unless the AST node lies within a macro expansion, in which case it cannot be rewritten automatically).
  3. Elsa is structured in an object-oriented style, so the best way to find an interesting AST node to rewrite is via the visitor interface. ExpressionVisitor is a frequently used visitor because it visits all statements and expressions.
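For readers unfamiliar with the visitor style, the following self-contained sketch shows the general shape. It uses a toy AST defined on the spot; Visitor, Node, Function, Return and ReturnCounter are illustrative names and are not Elsa's actual classes.

```cpp
#include <memory>
#include <vector>

struct Function;
struct Return;

// Toy visitor: one callback per node type.
// Returning true means "descend into children", false means "stop here".
struct Visitor {
    virtual ~Visitor() {}
    virtual bool visitFunction(Function&) { return true; }
    virtual bool visitReturn(Return&) { return true; }
};

struct Node {
    virtual ~Node() {}
    virtual void traverse(Visitor& v) = 0;
};

// Leaf node: nothing to descend into.
struct Return : Node {
    void traverse(Visitor& v) override { v.visitReturn(*this); }
};

struct Function : Node {
    std::vector<std::unique_ptr<Node>> body;
    void traverse(Visitor& v) override {
        if (!v.visitFunction(*this)) return;  // visitor asked not to recurse
        for (auto& child : body) child->traverse(v);
    }
};

// A visitor that only cares about return statements.
struct ReturnCounter : Visitor {
    int count = 0;
    bool visitReturn(Return&) override { ++count; return false; }
};

// Build a function containing two return statements and count them.
int countReturnsInSample() {
    Function f;
    f.body.push_back(std::unique_ptr<Node>(new Return));
    f.body.push_back(std::unique_ptr<Node>(new Return));
    ReturnCounter counter;
    f.traverse(counter);
    return counter.count;
}
```

A rewriter overrides only the callbacks for the node types it wants to change and leaves the traversal itself to the AST.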

Use stopwatch.cc in the pork/ directory for reference. The first step is to obtain an AST:

TranslationUnit *unit = parser.getASTNoExc("/path/to/preprocessed.ii");

All of Elsa's AST nodes have a ->traverse() method, which is used to find interesting nodes. Stopwatch looks for function bodies, so it overrides the visitFunction method in the visitor. There is a variety of Elsa AST nodes to choose from; cc.ast.gen.h lists them all. To determine how a particular pattern in the source is represented in the AST, it is often useful to invoke one of the print methods (print, debugPrint, etc.) and examine the structure. Every AST node has those print methods, among other common methods.

As mentioned before, the actual rewriting is carried out by the Patcher class. To rewrite code, obtain the SourceLocs needed, translate them with CPPSourceLoc and hand the result to the Patcher.

For example, to rewrite all return statements do:

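bool SomeVisitor::visitS_return(S_return *s) {
  // Translate the raw .ii locations into original-source positions.
  CPPSourceLoc csl(s->loc);
  CPPSourceLoc endcsl(s->endloc);
  PairLoc pair(csl, endcsl);
  if (pair.hasExactPosition()) {
    // Replace the whole return statement with a comment.
    patcher.printPatch("/* a return was here */", pair);
  } else if (MacroUndoEntry *macro = pair.getMacro()) {
    // The statement lies inside a macro expansion; report it instead.
    cerr << toString(macro->preStartLoc) << ": " << macro->name << " macro got in the way\n";
  }
  return false; // do not recurse further
}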

The stopwatch.cc example uses Patcher::insertBefore, which is convenient for inserting code.
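The idea behind inserting code without disturbing the rest of the file can be sketched with a toy class. ToyPatcher below is hypothetical and works on an in-memory string; it is not Pork's Patcher and does not share its signatures. It records insertions against offsets in the original text and applies them from the end of the text backwards, so that earlier offsets stay valid.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in, not Pork's Patcher: collects insertions against offsets
// in the ORIGINAL text and applies them in one pass.
class ToyPatcher {
public:
    typedef std::pair<std::size_t, std::string> Edit;

    // Record that `text` should appear before position `offset`.
    void insertBefore(std::size_t offset, const std::string& text) {
        edits_.push_back(Edit(offset, text));
    }

    // Return a patched copy of `source`. Every recorded offset must be
    // <= source.size(), otherwise std::string::insert throws.
    std::string apply(std::string source) const {
        std::vector<Edit> sorted = edits_;
        // Highest offset first, so earlier offsets are not shifted.
        std::stable_sort(sorted.begin(), sorted.end(),
                         [](const Edit& a, const Edit& b) {
                             return a.first > b.first;
                         });
        for (const Edit& e : sorted) source.insert(e.first, e.second);
        return source;
    }

private:
    std::vector<Edit> edits_;
};
```

Recording edits against original offsets and applying them later is one simple way to let several visitor callbacks contribute changes to the same file without interfering with each other.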
