Porky.py (pronounced “porky pie”) is a simple C++ rewriting tool built on top of pork. Porky.py aims to make pork usable for a larger class of code rewriting problems by lowering pork’s high learning curve and making it easier to code up rewrite passes.
If you want to skip the exposition and play with the code, it’s available here. Just follow the pork install instructions to get started.
Background
Porky.py started back in April when I wanted to rewrite a bunch of code. I was (well, am still, sigh) replacing a C API with a new C++ API. Basically, code that looked like this
PRLock* lock = PR_NewLock();
PR_Lock(lock);
PR_Unlock(lock);
PR_DestroyLock(lock);
was going to be changed into this
Mutex* lock = new Mutex();
lock->Lock();
lock->Unlock();
delete lock;
I quickly estimated that these old APIs were used in O(1000) places in our code, which was way more than I wanted to edit by hand. (I’m lazy, so sue me.) So I wanted an automated tool. However, this rewrite task is just beyond the reach of regular-expression-based tools like sed; existing code could do something like PR_Lock(GetStruct().GetPRLock()), which causes sed to barf. Of course there’s the pork tool, which can eat this kind of rewrite for breakfast, but looking at existing pork tools convinced me that, for this relatively simple rewrite, it was going to be a waste of work to (i) learn pork’s AST classes; (ii) learn its AST visitor idioms; (iii) learn the non-standard utility libraries pork depends on (sm::string, FakeList, …); (iv) learn pork’s patch generation library; (v) code the tool in C++ … sigh.
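To make the sed problem concrete, here is a minimal sketch in Python (standing in for sed; the pattern and the `naive_rewrite` helper are my own illustration, not anything from pork or porky.py). A regular expression has no notion of balanced parentheses, so the obvious rule works on the simple call and mangles the nested one.

```python
import re

# Hypothetical sed-style rule: capture everything up to the first ')'.
NAIVE_RULE = re.compile(r"PR_Lock\(([^)]*)\)")


def naive_rewrite(line):
    """Rewrite PR_Lock(x) into x->Lock() using only a regular expression."""
    return NAIVE_RULE.sub(r"\1->Lock()", line)


simple = naive_rewrite("PR_Lock(lock);")
nested = naive_rewrite("PR_Lock(GetStruct().GetPRLock());")

print(simple)  # lock->Lock();
print(nested)  # GetStruct(->Lock().GetPRLock());
```

The second result is not valid C++: the capture stopped at the first closing parenthesis, which belongs to GetStruct() rather than to PR_Lock.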
So I was stuck with either doing a half-assed rewrite with sed and spending a few days fixing up its mistakes, or wasting a week or so coding a pork tool. Half-assed rewrites suck. But in the bug report I filed, I realized that I was naturally describing the rewrite in a way that an automated tool could understand. I have some background in programming languages, so in the spirit of Terence Parr
“Why program by hand in five days what you can spend five ~~years~~ days of your life automating?”
I decided to write my own tool on top of pork.
Porky.py’s specification language
Porky.py’s domain-specific language (DSL) for rewrites was designed with this use case in mind:
- a C++ developer not familiar with porky.py wants to do something like my API rewrite example above
- doesn’t want to spend several days learning pork
- and doesn’t want to learn an obscure DSL
- (it’d be nice if the tool were fast, too)
To me, these requirements implied that, first, the porky.py DSL should be “minimal” in the sense of adding as little syntax as possible beyond C++’s (less to learn). And second, porky.py should target expression-level rewrites (as API changes usually are) rather than statement-level ones. Statement-level rewrites complicate things.
Below is a working porky.py solution to the rewrite problem posed above. You can decide for yourself whether it meets my criteria.
rewrite SyncPrimitiveUpgrade {
type PRLock* => Mutex*
call PR_NewLock() => new Mutex()
call PR_Lock(lock) => lock->Lock()
call PR_Unlock(lock) => lock->Unlock()
call PR_DestroyLock(lock) => delete lock
}
The rewrite rule type PRLock* => Mutex* means: everywhere the type “PRLock*” appears, change it into “Mutex*”. The second kind of rule here, call PR_Lock(lock) => lock->Lock(), is more interesting; it means that, at any callsite matching
PR_Lock($lock$)
where $lock$ is any expression, change this line into
$lock$->Lock()
These kinds of rules are porky.py’s big advantage over sed et al.: because porky.py has access to a C++ AST through pork, it can match patterns that require strictly more power than regular expressions provide. One can write rules like call Foo(a, b, c) => c.Method(b, a), and the rule will rewrite call sites like Foo(x().y().z(), r(s(t)), u.v.w()) into u.v.w().Method(r(s(t)), x().y().z()).
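The idea behind the Foo(a, b, c) example can be sketched in a few lines of Python. This is a toy model under loud assumptions: the `Call` and `Var` classes and the `match`, `render`, and `rewrite` helpers are hypothetical names of mine, arguments that are not calls are kept as opaque source text, and pork's real AST is far richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """A call expression: callee name plus a tuple of argument expressions."""
    name: str
    args: tuple


@dataclass(frozen=True)
class Var:
    """A wildcard slot in a pattern; it binds to any expression."""
    name: str


def match(pattern, expr, bindings):
    """Match `pattern` against `expr`, recording wildcard bindings."""
    if isinstance(pattern, Var):
        bindings[pattern.name] = expr
        return True
    if isinstance(pattern, Call):
        return (isinstance(expr, Call)
                and pattern.name == expr.name
                and len(pattern.args) == len(expr.args)
                and all(match(p, e, bindings)
                        for p, e in zip(pattern.args, expr.args)))
    return pattern == expr  # literal source text must match exactly


def render(expr):
    """Turn an expression tree back into source text."""
    if isinstance(expr, Call):
        return f"{expr.name}({', '.join(render(a) for a in expr.args)})"
    return expr  # opaque source text


def rewrite(pattern, template, expr):
    """Return the rewritten text, or None if the pattern does not match."""
    bindings = {}
    if not match(pattern, expr, bindings):
        return None
    return template.format(**{k: render(v) for k, v in bindings.items()})


# call Foo(a, b, c) => c.Method(b, a)
pattern = Call("Foo", (Var("a"), Var("b"), Var("c")))
template = "{c}.Method({b}, {a})"
site = Call("Foo", ("x().y().z()", Call("r", (Call("s", ("t",)),)), "u.v.w()"))

print(rewrite(pattern, template, site))
# u.v.w().Method(r(s(t)), x().y().z())
```

Because matching happens on a tree rather than on characters, each wildcard binds to a whole argument expression no matter how many nested parentheses it contains.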
And finally, porky.py provides the creature comfort of one-liner shell invocations for really simple rewrites
porkyc -e 'call SomeFun(a, b) => OtherFun(b, a)'
After which the compiled pork tool can be invoked. (Docs forthcoming on MDC.)
Code rewriting workflow when using porky.py
After writing porky.py, I used it to edit a large quantity of code in a couple of hours. These patches haven’t all made it into mozilla-central yet (for a variety of reasons), but I wanted to show the steps I took to generate them. This will eventually find its way into an MDC guide.
$ porkyc -m sync_primitive_upgrade.porky
  (outputs and compiles code in |SyncPrimitiveUpgrade.code/|)
$ SyncPrimitiveUpgrade.code/dorewrite ~/mozilla-code/*.ii -x *nspr* > mozilla-code.patch
  (does n-way parallel rewrite on matching files; n depends on your system)
  (doesn't include files matching the pattern *nspr* in the patch)
  (writes patch to stdout)
Next, I would apply this patch and compile, fixing up problems by hand (hey, porky.py is a prototype). Then it was hg qdiff and the patch was up for review. I could generate these patches much faster than they could be reviewed; review ended up being the bottleneck.
Eventual goal for porky.py
I’d like it to support this rewrite
rewrite FooToBar {
class Foo => Bar {
member mMember => member_
method Method(a1, a2) => method(a2, a1)
}
}
which would entail
- rename class Foo into class Bar
- rename type “Foo” into type “Bar” (including Foo*, Foo&, …)
- change calls to Foo constructors into Bar constructors
- rename declaration of Foo.mMember into Bar.member_
- convert accesses of ((Foo)inst).mMember into ((Bar)inst).member_ (and similarly for inst->mMember, …)
- rename declaration of Foo::Method into Bar::method
- rename implementation of Foo::Method into Bar::method
- convert calls to ((Foo)inst).Method(a1, a2) into ((Bar)inst).method(a2, a1) (and similarly for inst->Method(a1, a2), … and similarly for subclasses of Foo …)
I should note that having porky.py rewrite declarations and definitions is not so important (though it would be nice!): there is only one declaration/definition. Rewriting uses is much more important, since there are any number of uses.
I won’t implement this kind of rewrite until I need it. Sorry! But please feel free to dive into the porky.py code and do it yourself!
How porky.py fits into the “rewrite tool space”
Rewrite tools have to trade off several factors. It’s good to have a small, familiar DSL, as these are easier to learn and remember. But it’s also good to have a large and expressive DSL, for raw rewrite power. The table below is my attempt to fit porky.py into the space of relevant rewrite approaches I’m aware of. It compares porky.py with
- pork. Rewrite specs are written in C++, which is not obscure to C++ programmers. Rewrite specs are very verbose. Any possible rewrite of C++ code can be expressed in pork. Pork is very fast.
- XML+XSLT. Rewrite specs are written in XML/XSLT modeled on a particular C++ AST; very obscure. Rewrite specs are relatively concise. Any possible rewrite can be expressed. Very slow.
- Tree transformation (e.g. in ANTLR). Very obscure DSL. Rewrite specs are relatively concise. Can express (usually) any possible rewrite. Usually relatively slow.
- Coccinelle/SmPL. DSL relatively familiar. Rewrite specs concise. Can express most statement-level rewrites. Can be fast.
- porky.py. DSL familiar. Rewrite specs concise. Express some expression-level rewrites. Fast.
- sed. DSL familiar. Specs concise. Extremely limited power. Very fast.
                      <---- More LoC (bad) ----- Fewer LoC (good) ---->
                      <-- Less obscure (good) -- More obscure (bad) -->
                      | No DSL | Some DSL | More DSL    | All DSL
                      +--------+----------+-------------+----------------------
All possible          | Pork   | ???      | ???         | XML+XSLT/tree trans.
~Statement            |        |          | coccinelle* |
~Expression           |        | porky.py |             |
Names                 |        |          |             |
Crappy rename hacks   | sed**  |          |             |
-------------
*  only works for C code
** assumption: regular expressions are well-known enough that
   negative "DSL" connotations don't apply.
To be honest, the biggest lesson I learned from this project is that pork can be a good lower-level “engine” for higher-level tools. I didn’t know about Coccinelle when I wrote porky.py; if I had, I might have tried retargeting it to pork instead of starting from scratch. I prefer some of porky’s syntax/semantics to Coccinelle’s, but I think the additional complexity of SmPL adds a compelling amount of power over porky.py’s simpler DSL. Upgrading SmPL to parse C++ and retargeting it to pork might be a good project for someone else.
But of course, since Coccinelle doesn’t understand C++, we’re “stuck” with porky.py for the foreseeable future.
Porky.py for language nerds
Porky.py is implemented as a relatively simple source-to-source translator written in Python. It converts porky.py specifications into a C++ header containing rewrite rules defined in a sort of “bytecode.” This header is included by a general porky.py C++ tool that uses pork. This tool “interprets” the rules, and if a rule matches part of the C++ AST, the tool generates a patch hunk according to the porky.py spec. This is fairly similar to how a regular expression engine would implement a “replace” function, although the matching is obviously fairly different.
There were a few interesting problems that arose while I designed the porky.py language. The first was what the semantics of rule matching should be. The issue is that multiple rules can match the same program text. For example, in the spec
call Foo => Bar
  (means "rewrite all calls to Foo into Bar, regardless of arguments")
call Foo(a, b) => Bar(b, a)
  (means "rewrite only the two-argument version of Foo into Bar, reverse the args")
both rules will match Foo(1, 2). Which should be used? My solution was to use the most “specific” rule. I defined “specific” by the following rules. First, a call pattern with arguments is “more specific than” a call pattern without arguments. And second, a “literal” pattern is “more specific than” a wildcard pattern. For example, Foo::kSomething is more specific than f. Porky.py implements these heuristics by ordering all the rewrite rules at compile time by decreasing “specificity” and then attempting matches in that order. The first rule to match is chosen for the rewrite.
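Here is a rough Python sketch of that ordering. It is not porky.py's actual code: the `CallRule` record, the `specificity` key, and the `order_rules`, `matches`, and `choose` helpers are hypothetical, and counting literal slots as a tie-breaker is my own reading of the second heuristic.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CallRule:
    """Hypothetical rule record; `params` of None means "any arguments"."""
    label: str
    callee: str
    params: Optional[Tuple[str, ...]] = None
    literals: frozenset = frozenset()  # params that must match exact text


def specificity(rule):
    """Rules with an argument list beat rules without; more literals win ties."""
    return (rule.params is not None, len(rule.literals))


def order_rules(rules):
    """Done once, at "compile time": most specific rules first."""
    return sorted(rules, key=specificity, reverse=True)


def matches(rule, callee, args):
    if rule.callee != callee:
        return False
    if rule.params is None:
        return True
    if len(rule.params) != len(args):
        return False
    # Wildcard slots accept anything; literal slots need the exact text.
    return all(p not in rule.literals or p == a
               for p, a in zip(rule.params, args))


def choose(ordered_rules, callee, args):
    """Return the first (most specific) rule that matches, or None."""
    for rule in ordered_rules:
        if matches(rule, callee, args):
            return rule
    return None


rules = order_rules([
    CallRule("any-args", "Foo"),
    CallRule("swap", "Foo", ("a", "b")),
    CallRule("literal", "Foo", ("Foo::kSomething", "b"),
             frozenset({"Foo::kSomething"})),
])

print(choose(rules, "Foo", ("1", "2")).label)                # swap
print(choose(rules, "Foo", ("Foo::kSomething", "2")).label)  # literal
print(choose(rules, "Foo", ("1",)).label)                    # any-args
```

Sorting once up front keeps the matching loop trivial: it never has to compare two candidate rules, it just takes the first hit.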
A second issue that arose was how rewrite specifications should express “literal” patterns (i.e., match this exact text) vs. “wildcard” patterns (i.e., match any expression in this syntactic slot). The problem arises in this rule: call Foo(Bar, a) => Baz(a, Bar). Are “Foo”, “a”, and “Bar” literals or wildcards? I didn’t want to add special syntax for wildcards because of the “minimal DSL” design goal. So, my solution was to be as “greedy” as possible about choosing wildcard variables; I think this is likely to be least surprising. In the example above, “Foo” shouldn’t be a wildcard because if it were, it would match any function call with two arguments. That would be silly. But both “a” and “Bar” are wildcards; in fact, any identifier used as a C++ “expression variable” (other than a function name) is treated as a wildcard. The “escape hatch” is C++ namespace qualification; if in the example above “Bar” were meant to match only some global symbol “Bar”, then the pattern could have been written as call Foo(::Bar, a). There are numerous other heuristics possible here, and I may change porky.py’s depending on feedback.
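As a small illustration of the greedy-wildcard heuristic, here is a Python sketch. The `classify_pattern` function is hypothetical, `str.isidentifier()` is only an approximation of what counts as a C++ identifier, and treating every non-identifier argument (such as a number) as a literal is my assumption rather than documented porky.py behavior.

```python
def classify_pattern(callee, params):
    """Label each name in a call pattern as 'literal' or 'wildcard'.

    The callee is always a literal. A bare identifier in an argument
    slot is a wildcard; anything else, including a namespace-qualified
    name such as ::Bar, is a literal.
    """
    labels = [(callee, "literal")]
    for param in params:
        kind = "wildcard" if param.isidentifier() else "literal"
        labels.append((param, kind))
    return labels


# call Foo(Bar, a): both arguments are wildcards.
print(classify_pattern("Foo", ["Bar", "a"]))
# [('Foo', 'literal'), ('Bar', 'wildcard'), ('a', 'wildcard')]

# call Foo(::Bar, a): the qualified name is the escape hatch.
print(classify_pattern("Foo", ["::Bar", "a"]))
# [('Foo', 'literal'), ('::Bar', 'literal'), ('a', 'wildcard')]
```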
A third issue was how porky.py expressions should be typed, i.e., what their C++ type should be. For example, the porky.py rule call foo->Bar() => foo->Baz() seems innocent enough when one is thinking “foo is a Foo instance,” but how can porky.py glean that information? (Note that this is not a problem for the RHS “foo”, since its type is already known by the time porky.py is ready to generate a patch hunk.) My solution was to have porky.py programmers write these LHS method calls in desugared form; in the example above, call Foo::Bar(foo) => foo->Baz(). Eventually, though, I would like to use the class syntax I introduced above to resolve this ambiguity. For example: class Foo { call Bar => call Baz } (note, though, that this isn’t implemented yet). C++ inheritance also complicates things here. When implementing porky.py I punted on inheritance (hey, it was five days’ work!), but I think the problems it presents have reasonable solutions.
Hacking porky.py
If you’re interested in taking up any of the porky.py extensions I suggested here, or just want some porky.py technical support, send me an e-mail at cjones@mozilla.com or drop by the #static channel at irc.mozilla.org.
