Main menu:

Site search

Categories

Archive

ESP: MSR’s little helper

The Javascript/Treehydra version of the outparam usage checker is finally nearing completion: all that’s left is packaging it as a patch that can go into mozilla-central (plus the inevitable future debugging). In my last post, I mentioned that the checker is based on ESP, an program analysis technique invented at Microsoft Research. A few people have asked for a post about ESP (the paper is good, but very dense if you don’t have a PL research background), so here it is.

Why ESP?
First I should explain why I bothered implementing a new outparam checker design given that I had a working version based on theorem proving. The problem was that that the theorem-proving version worked by analyzing “every” path in each method. Or it would have worked if it could analyze every path. But a method with N if statements can have 2^N paths, and N gets big enough that Mozilla has a method with 8 million paths. Worse, methods with loops have an infinite number of paths. In practice, path-based analyses have to give up after about 1000 paths, leaving the rest unanalyzed.

In short, path-based analysis is very precise, but lacks coverage of all the code paths. Conversely, the abstract interpretation approach I showed in my previous post does cover all code paths, but it mixes them up so much that it ends up being too imprecise to work at all.

When I saw this problem, I remembered ESP right away, because the whole point of ESP is to get the precision of path-based analysis with the speed and coverage of abstract interpretation. But after reviewing the paper, I couldn’t really see how to make ESP solve the problems I described before, so I went the theorem proving route. But once I got stuck on the path explosion problem, I went back to it, and eventually it hit me. Now it seems kind of obvious. So, it seems like I should be able to explain ESP and its application to outparams in a way that makes it sound simple, but that turned out to be hard. Hopefully it’s at least comprehensible.

Abstract Interpetation Redux.
Previously, I tried out abstract interpretation with pen and paper and found that it didn’t even come close to working for outparams. (Reminder: abstract interpretation means running the code in a special interpreter that (a) tracks finite(-ish) abstract states instead of the standard program state, (b) goes both ways at branches and (c) merges state when control rejoins. This has the effect of running the method on every possible input value and every path in finite time. The price is that the output is abstract states instead of full detail.) Here are the results again (the table on the right shows the abstract state after abstractly interpreting each statement):

 1   nsresult SomeMethod(nsIX **out) {      out       rv   tmp   if.temp
 2     nsresult rv = doSomething();      not-written   ?
 3     tmp = rv;                         not-written   ?    ?
 4     if.temp = NS_SUCCEEDED(tmp)       not-written   ?    ?      ?
 5     if (if.temp) {                    not-written   ?    ?    true
 6       out = mValue;                       written   ?    ?    true
 7       return NS_OK;                       written   ?    ?    true
 8     } else {                          not-written   ?    ?    false
 9       return rv;                      not-written   ?    ?    false
10     }
11   }

These analysis results are too imprecise to check the return on line 9: rv is unknown, so the analysis has to assume that the return value could be success, which is an error because out has not been written at this point. Note that the abstract interpretation never had any information about rv. Clearly, total ingorance about rv just won’t work, and any algorithm that works must track the relationship between out and rv that is created by line 2.

A Smarter Abstract State Space.
Abstract interpetation can track that relationship, but it needs to use a more complicated abstract state than the one I implicitly used above. The abstract state in my table above is a mapping of variables to abstract values. (Compare with the real program state, which is a mapping of variable to C++ values.) That’s the simplest and most common abstract state, but there’s really nothing special about it. An abstract state can be any representation of a set of program states: the game is to choose an abstract state space that is “fine” enough to represent the information we need, but no finer, so the abstract states stay small and simple.

We need a state space that can represent facts like “if.temp is true iff tmp is a success code”. I can write that more explictly as, “if.temp is true and tmp is a success code, or if.temp is false and tmp is a failure code.” And that looks just like the “or” of two mappings of variables to abstract values. So, it looks like we can use an abstract state that’s just like our original state, except allowing multiple “table rows”. If we code the abstract interpreter to use multiple rows when it can, the results of abstract interpretation will come out like this (showing the states between the statements so it’s easier to separate the rows):

 1   nsresult SomeMethod(nsIX **out) {      out         rv    tmp   if.temp
                                         not-written
 2     nsresult rv = doSomething();
                                         not-written   succ
                                         not-written   fail
 3     tmp = rv;
                                         not-written   succ  succ
                                         not-written   fail  fail
 4     if.temp = NS_SUCCEEDED(tmp)
                                         not-written   succ  succ    true
                                         not-written   fail  fail    false
 5     if (if.temp) {
                                         not-written   succ  succ    true
 6       out = mValue;
                                             written   succ  succ    true
 7       return NS_OK;
 8     } else {
                                         not-written   fail  fail    false
 9       return rv;
10     }
11   }

These results are detailed enough to check outparams perfectly!

A few things to note: In abstractly interpreting line 2, we don’t know the results exactly, but instead of generating a lot of “unknown” abstract values, we generate multiple rows, establishing the correlation among results. Now on lines 3 and 4, we have a multiple-row state, so we abstractly interpret the statements on each row independently. Finally, line 5 is a conditional guard, so at that point, we filter out all the rows that don’t match the guard (because the program wouldn’t execute this path in those states). Each of these features is another detail that has to be noticed and coded up in the analysis, but they all fit naturally into the framework of interpreting statements on abstract states.

Path Sensitivity.
This version of the analysis is actually path-sensitive, because if different paths generate different states, those states will be kept as separate rows. Here’s an example:

nsresult OtherMethod(nsIX **out1, nsIX **out2) {
                                        out1          out2         rv    if.temp
                                    not-written   not-written
  nsresult rv = doSomething();
                                    not-written   not-written   success
                                    not-written   not-written   failure
  if.temp = NS_SUCCEEDED(rv);
                                    not-written   not-written   success   true
                                    not-written   not-written   failure   false
  if (if.temp) {
                                    not-written   not-written   success   true
    out1 = mFoo;
                                A:      written   not-written   success   true
  } else {
                                B:  not-written   not-written   failure   false
  }
                                C:  // Join point -- state is union of A and B.
                                        written   not-written   success   true
                                    not-written   not-written   failure   false
  doMoreStuff();
                                        written   not-written   success   true
                                    not-written   not-written   failure   false
  if (if.temp) {
                                        written   not-written   success   true
    out2 = mBar;
                                        written       written   success   true
  } else {
                                    not-written   not-written   failure   false
  }
                                     // Join point
                                        written       written   success   true
                                    not-written   not-written   failure   false
  return rv;
}

It’s kind of hard to read, but the key point is that there are two ifs with the same guard, and to analyze the method correctly, we need to know that of the 4 possible paths, only 2 can actually be taken. State C is the important one: after finishing the first if, at the join point we merge the states by simply collecting all the rows. Each path has a different row, and the rows stay separate, so on the second if, the analysis executes the then branch only in the states generated by the first then branch.

This is actually the kind of thing the ESP authors were most concerned with in their paper. It’s pretty neat but the problems I had look very different, which is why it took me so long to see the connection.

A nice thing about this kind of path sensitivity is that if the state is the same along two branches, the rows will “rejoin” at the join point, essentially forgetting that there was a branch (because it didn’t really matter anyway). It also works with loops.

The problem is that although we don’t exactly get path explosion anymore, we can get “row explosion”: if there are M variables, and each has 2 possible abstract values, we can get 2^M rows in the state. And M can easily get big enough in Mozilla to run out of memory.

ESP.
This is where ESP comes into play. The insight of ESP is that there are some variables you care about a lot (which the ESP authors call property variables), and others you care about only as far as they relate to the property variables (which the ESP authors call execution variables). (For example, in outparams, the property variables are the outparams and any variables that whose values can reach a return statement.) So, if there are only a few property variables, then if we had a way to track only the property values path-sensitively, we can be precise on the things we care about without row explosion.

ESP does this very simply: it just takes our multiple-row states and adds a primary key, namely the set of property variables. Thus, property value combinations and relations are always tracked precisely. Execution variables are tracked as one mapping per property value combination, just as in the basic abstract interpretation. Because of primary key uniqueness, if there are K property variables, there can be no more than 2^K rows in a state, so if K is smaller than 10 or so, the states are small enough to analyze in reasonable time.

An ESP analysis looks a lot like our path-sensitive abstract intepretation, except that after each operation, it “collects” rows together to maintain the primary key uniqueness property. For example, if P is a property variable and E is an execution variable, and we need to merge this state:

    P = true,    E = false
    P = false,   E = false

with this state:

    P = true,    E = true

we take the union of rows as before to get this:

    P = true,    E = false
    P = false,   E = false
    P = true,    E = true

but then we merge together rows with the same primary key, yielding:

    P = true,    E = anything
    P = false,   E = false

The significance of ESP is for outparams is that all Mozilla methods have only a few outparams and return value variables, so the analysis runs fast no matter how many other “unimportant” variables are in the method.

A small tweak.
Actually, that’s not quite true. GCC generates a temporary variable for each return statement, so if there are 30 return statements, there are 30 temporary variables, and the state can grow to 2^30 rows. That does happen, and it does make the analysis run out of memory.
Fortunately, I was able to fix this with a just a small tweak to ESP. The temporary variables are only “live” between the point where they are created and where they are copied to another return variable, and their values don’t matter at all outside that live range. At any given point in the method, only a few temporaries are live. So I can keep the number of property values small by “demoting” return values to execution values once they are dead. And demotion is trivial to implement: just set the abstract value to any one value, because we’ll never read it anyway.

The whole outparam analysis came out to about 2500 lines of Javascript, but a lot of that was adapter code to simplify the Treehydra API, plus subsidiary analyses to find return value variables and their live ranges. The ESP framework was 450 lines, and the outparam abstract interpreter was another 800 lines. It runs in reasonable time too, without any effort optimizing it yet. I haven’t measured it exactly, but I think it’s less than 20 minutes on 1970 C++ files of Mozilla on a 4-processor machine. I guess you wouldn’t want to run it on every build, but if you’re only changing a few .cpp files, it shouldn’t be too bad.

Making Treehydra do useful tricks

Taras’ last blog post ended with a comment about “making [Treehydra] do useful tricks”, which oddly enough, is exactly what I’ve been working on, and I’ve finally made enough progress to blog about it. I’ve been alternating between implementing a Treehydra Javascript analysis library and adding needed features to Treehydra.

Just today, I managed to do an intraprocedural live variable analysis, which is one of the simplest program analyses, on every file in mozilla-central. (Live variable analysis determines the set of variables that may be read in the future at every point in a function. It’s commonly used in optimization to save storage for unused variables, but I use it to make checkers “forget” information about unused variables.) Here’s a visualization of the results for Firefox’s main() function in a Linux build: the set of live variables is listed at the bottom of each basic block.

It took 25-30 minutes to run on all of Mozilla (as preprocessed C++), but I know a lot of that is simply GCC compile time, and I think a fair fraction of the rest was spent generating the visualizations, which most analyses won’t do. I guess I need to investigate how to time JS execution internally.

My Javascript analysis library is about 900 lines of code, with modules for Treehydra utilities, GCC data access, GCC value printing, data structures needed for analysis, backward data-flow analysis. I hope these will be reused for other analyses–there are fewer than 100 lines of code specific to liveness analysis. The code is available here.

The next step will be to finish the outparam analysis. Hopefully, it won’t be too hard. The big pieces are:

  • An analysis to determine which variables may reach the return statement of the function (the technique is similar to the liveness analysis).
  • Port over my ESP analysis framework from Python.
  • Implement the outparam checker in the ESP framework.

I prototyped all of it in Python, so I know the algorithms work, and I’ve ported much of it over to Treehydra/JS for the liveness demo, so I know it codes up nicely there as well. I’m sure there will be glitches to fix, and I’m sure I made some mistakes in designing my Javascript framework, but I’ll just have to see how it goes.

Finally, I have to mention that I’ve upgraded my Javascript skills quite a bit in the process of doing this (it’s the most complex JS program I’ve written, and I’ve also been using JSAPI), and it’s all thanks to the MDC Javascript Guide, which has been an excellent resource.

I need a theorem prover!

Time for a hardcore static analysis post. By the way, if anyone reading this knows about theorem provers, I could really use your help: what’s a solid off-the-shelf prover that will solve my formulas (see below)?

I’m working on bug 420933, which is a request for a static checker for XPCOM outparam usage. In short, XPCOM methods with outparams must (a) not set outparams when they return a failure code, and (b) set all outparams when they return a success code. At first, checking this sounded easy, because I was thinking of code like this:

nsresult SomeMethod(nsIWhatever **out) {
  nsresult rv = doSomething();
  if (NS_SUCCEEDED(rv)) {
    out = mValue;
    return NS_OK;
  } else {
    return rv;
  }
}

Attempt #1. We can check this method with a simple static analysis algorithm. The first step is to trace through all the code paths, recording (i) any outparam that gets assigned to, and (ii) (an abstract approximation of) values of variables that can be returned from the method. In this case, the results are:

nsresult SomeMethod(nsIWhatever **out) {
  nsresult rv = doSomething();   // rv = ?,       out = NOT WRITTEN
  if (NS_SUCCEEDED(rv)) {
                                 // rv = SUCCESS, out = NOT_WRITTEN
    out = mValue;                // rv = SUCCESS, out = WRITTEN
    return NS_OK;                // rv = SUCCESS, out = WRITTEN
  } else {
                                 // rv = FAILURE, out = NOT_WRITTEN
    return rv;                   // rv = FAILURE, out = NOT_WRITTEN
  }
}

Note that on each branch of the if statement the analysis sets a property according to whether the true or false branch was taken. Some analyses don’t do this, but in our case we know that return rv returns false only by taking account of the branch condition.

The second step is to check the requirement at each return statement using the results from step 1. So, for return rv, first we read off rv = FAILURE, which tells us to check property (b), that all outparams are set. Then we see out = NOT_WRITTEN, so the check succeeds.

I liked this design because it was fairly simple, and also because it can be implemented as a standard flow-sensitive abstract interpretation, which is pretty efficient. (When I say “standard flow-sensitive abstract interpretation” I mean the design kind of looks like a combination of two standard compiler passes: the constant propagation optimization (for the return values) and the unassigned variables check (for the outparams).)

“For any complex problem, there is always a solution that is
simple, clear, and wrong.”
Unfortunately, attempt #1 doesn’t work, for two reasons. First, GCC adds stuff to the code before the analysis sees it, so the method above looks more like this:

nsresult SomeMethod(nsIWhatever **out) {
  nsresult rv;
  rv = doSomething();            // rv = ?
  tmp = rv;                      // rv = ?, tmp = ?
  if.temp = NS_SUCCEEDED(tmp);   // rv = ?, tmp = ?, if.temp = ?
  if (if.temp) {
                                 // rv = ?, tmp = ?, if.temp = SUCCESS
    out = mValue;                // rv = ?, tmp = ?, if.temp = SUCCESS
    return NS_OK;                // rv = ?, tmp = ?, if.temp = SUCCESS
  } else {
                                 // rv = ?, tmp = ?, if.temp = SUCCESS
    return rv;                   // rv = ?, tmp = ?, if.temp = FAILURE
  }
}

The analysis tracks if.temp through the if branches, but fails to pick up any information about rv. So when we reach return rv, we don’t know whether to check (a) or (b), and we can’t check the method—we can only issue a (spurious) warning. In order to make this work, we need to track not only so much values of return variables as their relationships. We can represent the relationships as logical formulas:

nsresult SomeMethod(nsIWhatever **out) {
  nsresult rv;
  rv = doSomething();            // (empty formula)
  tmp = rv;                      // tmp == rv
  if.temp = NS_SUCCEEDED(rv);    // tmp == rv, if.temp  SUCCEEDED(rv)
  if (if.temp) {
                      // tmp == rv, if.temp  SUCCEEDED(tmp), if.temp
    out = mValue;     // tmp == rv, if.temp  SUCCEEDED(tmp), if.temp
    return NS_OK;     // tmp == rv, if.temp  SUCCEEDED(tmp), if.temp
  } else {
                      // tmp == rv, if.temp  SUCCEEDED(tmp), not if.temp
    return rv;        // tmp == rv, if.temp  SUCCEEDED(tmp), not if.temp
  }
}

Now, when we reach return rv, we have enough information to figure out that rv is a failure code, assuming we have a theorem prover that can reason like this:

if.temp  SUCCEEDED(tmp)               ===>  not if.temp  FAILED(tmp)
not if.temp, not if.temp => FAILED(tmp)  ===>  FAILED(tmp)
FAILED(tmp), tmp == rv                   ===>  FAILED(rv)

I said there was a second reason the simple analysis doesn’t work, which is code like this:

nsresult AnotherMethod(nsIFoo **out) {
  nsresult rv = mBar.DelegateMethod(out); // rv = ?, out = ?
  return rv;
}

The analysis should not consider this an error: it should assume that DelegateMethod follows the outparam protocol correctly, which means that that rv is a success code iff out was written, and the requirement is satisifed. But the basic analysis gets no information about both rv and out. Again, we need to track relationships. This time, we want to model the method call using the formula SUCCEEDED(rv) WRITTEN(out). When we reach return rv, we will see that we don’t have a definite value for rv. So we check both properties (a) and (b). First, we ask the theorem prover which outparams are written under the assumption SUCCEEDED(rv) and check (a). Then we do it again with FAILED(rv) and check (b).

Dumbest Theorem Prover Evar. At this point, our problem is pretty much solved (minus inordinate effort parsing XPIDL dumps and method signatures to guess what’s an outparam and fixing bugs related to receiver argument representation differences), assuming I have a theorem prover handy. Which I don’t. Writing one is out of the question, it would take way too long. I didn’t even want to try to reuse an existing one at this prototyping stage, because before figuring out how to use the prover, I’d have to search around for one that fits my requirements, test its scalability for my application, and so on.

So instead I set out to design the dumbest theorem prover that could solve my formulas. I realized that my formula components (some may call them “conjuncts”) are generated by assignment statements, and when we solve the formulas, in each step we learn something about the left-hand side of an assignment and use it to infer something about the right-hand side. If we represent formulas accordingly, solving is easy. Here’s an example: Above, we needed to solve this formula:

tmp == rv, if.temp  SUCCEEDED(tmp), not if.temp

We’ll represent the first two conjuncts like this:

tmp:          [ANY](tmp)     => [ANY](rv)
if.temp:      TRUE(if.temp)  => SUCCEEDED(tmp)
if.temp:      FALSE(if.temp) => FAILED(tmp)

The name before the colon is the left-hand side of an assignment, which tells us that any time we learn a fact about that variable, we should apply this rule. After the colon is a standard implication. On the tmp line, we have the special notation [ANY], which just means that any predicate true of tmp is also true of rv (because they are
equal).

Now, to solve, we assume FALSE(if.temp) and see what happens. We have a new fact about if.temp, so we look up its rules and find two. The first has TRUE(if.temp), so it doesn’t match. The second has FALSE(if.temp), a match, so we infer a second fact, FAILED(tmp). We look for tmp rules and find one. Because it has [ANY], it automatically matches and tells us to infer FAILED(rv), which is the answer we need. We also look for rv rules, but don’t find any, so we’re done.

The best part is, the solver algorithm is 14 lines of Python.

Finishing up. It’s not obvious how to implement this algorithm as a standard-ish flow-sensitive analysis (although I’m sure it’s possible). This is because you need to know how to merge two states when control flow rejoins after an if statement, and I don’t know how to merge two of these formulas. So I went back to the Dehydra Classic idea of running a path-sensitive analysis and pruning paths if we have previously analyzed the same program point in the exact same state. I also try to make the analysis forget formulas that it doesn’t need any more, so that it’s more likely the states match and we get to skip a path. The analysis runs in a reasonable length of time, although it’s not exactly fast at this point. It’s not necessarily because of the
analysis algorithm: it could be just the time to parse the JSON method CFGs or some (avoidable) overhead like that.

I was happy to discover that some fairly tricky analysis reasoning could be carried out by such a simple prover. It took a lot of hard thinking to get there. I didn’t figure it out in two clean steps like I explained above: really, I went back and forth between how to express the properties logically and how to solve them, with frequent stops just to be confused, trying to make each part simpler in a way that also made the rest simpler. The final result looks like a tiny version of Prolog, if I have any memory of what that language is.

The code is all in my Mercurial repository. The analysis itself is in the Python file checkOutparamProto.py.

Next I have to debug the analysis so it runs correctly on Firefox. After that, it’s supposed to be turned into Javascript that runs under Treehydra, which sounds pretty hard right, but we’ll see.

If any brave person is interested in this sort of thing, we could definitely use help with creating/integrating provers, optimizing analysis/proof engine performance, improving path exploration algorithms, reengineering all of this to run in Javascript over Treehydra CFGs, and anything else that helps.

MXR your Dehydra

MXR is cool. I use it all the time. But those of us that have been using or creating Dehydra know that Dehydra could be used to make MXR a lot cooler. I don’t seem to have time to work on it, though, so I thought I’d throw some ideas out here, to see if someone thinks it sounds fun and wants to run with it, or (cue ominous chords) if popular demand requires me to do it.

As an example of why Dehydra is needed, look at some random line of source code, like nsHTMLImageAccessible.cpp:238. To know what this does, I want to navigate to the code for SetParent. I can click on it, but then I get a page with links to a dozen or so different SetParent functions, and I don’t know which ones can actually be called on line 238.

To get the right answer, first I have to look up the type variable, privatePrevAccessible, which turns out to be nsPIAccessible. I used MXR to do that, but I had to scan the page for my file name and then pick one of two possible variable definitions (one of which was actually not a definition). So then I can use MXR on SetParent again and search for nsPIAccessible, but I don’t find anything. I try nsPIAccessible in MXR, but I get nothing for that. It would be way too boring to read everything I did after that, but another 10-15 minutes of find, grep, and MXR to find and verify (tracing the ancestry of nsListBulletAccessible was especially tedious) that the SetParent methods that can get called are at

accessible/src/base/nsAccessible.cpp:457
accessible/src/html/nsHTMLTextAccessible.cpp:338

(I know you wouldn’t have been able to sleep tonight until you got the answer.)

Dehydra can do all this automatically. Dehydra knows the type of every variable. Dehydra knows the class hiearchy, so it can find all the concrete classes that that variable can be, and all the SetParent methods those classes have. Dehydra can also look up local variables, parameters, and fields.

So what would I like to see? How about, in MXR, when you click on a reference to a variable or class member, you navigate to the definition(s). And when you click the name in the definition, you get a page listing all the uses. As a starting point, if MXR integration is hard, it would be fine to give, say an MXR URL (which has file name and line number) to some web page, which would then give links for the names on that line of code. One copy/paste is a lot better than 15 minutes of grep. Some AJAX goodness could make the UI easier, too, I imagine. And the identifier search could get an option for whether the input is a function, local variable, etc. (A crazier idea is to get the information in a form Emacs understands to make a smarter tags function.)

(Or does something like this already exist? A quick Google found a program for C, with no plans to produce a C++ version. Has anyone run CScope on Firefox?)

A nice bonus is that because Dehydra is a gcc plugin, the cross-reference builder can run whenever you recompile, so you potentially have your own code immediately added to a local cross-reference database.

A few notes on string APIs

1. For my own comprehension, I used Dehydra GCC + Graphviz to print a class hierarchy diagram for the Mozilla string classes (i.e., everything defined in xpcom/string). The results are here.

I was a little surprised that nsSubstring_base is the root, because in nsTAString.h, the root is nsTAString_CharT. Turns out that is #defined to nsSubtring_base in string-template-def-unichar.h. That particular bit of confusion should go away in Mozilla 2.0 because it was apparently created for a legacy API which is already disabled.

2. All my staring at Mozilla string code has created in me the crazy idea of replacing the current string classes with templates as part of Mozilla 2.0 codebase modernization. Here’s a proposal for your consideration and comments:

As shown in the diagram, there are two parallel string class hierarchies: the nsAString versions for (16-bit) PRUnichar strings, and nsACString versions for (8-bit) char strings. This is the kind of thing a C++ programmer would expect to be done in templates. Current trunk does something completely different, for good reasons it turns out, but I think C++ and it’s compilers are now good enough to replace it.

The History. The something different works like this: There is a single implementation of the hierarchy in nsTSubstring{.h,.cpp}. (T is for “template”, or “template-like-thing”, I think.) Then, nsSubstring{.h,.cpp} have patterns like this:

// declare nsSubstring
#include "string-template-def-unichar.h"
#include "nsTSubstring.h"
#include "string-template-undef.h"

// declare nsCSubstring
#include "string-template-def-char.h"
#include "nsTSubstring.h"
#include "string-template-undef.h"

string-template-def-unichar.h #defines the “type names” (such as charT) in nsTSubstring.h to the PRUnichar/nsAString versions, while string-template-def-char.h #defines the same names to the char/nsACString versions. This renaming provides the effect of templates.

The reason for doing it this way instead of using templates, I’m told, is that string-template-def*.h also #defines either CharT_is_PRUnichar or CharT_is_char, and then nsTSubstring selects code accordingly using #ifdef. For example, nsTSubstring_CharT::AssignASCII (usually known as nsAString::AssignASCII) has completely different implementations. And that’s not the kind of thing you get with the commonly-known template container pattern, which just substitutes different types in the same implementation.

A Brave New World. But the C++ people have come up with a new way to do this: C++ traits. The concept seems to be very general, but for Mozilla strings, I can explain it this way: Traits allow a template class C<T> to select variant code and data for different types T. And it’s done by creating a traits template class Traits<T> with partial specialization to create the variants. The article linked above explains the details pretty well, so I’m not going to duplicate the author’s work.

To test this out, I wrote a simple template string class with the key feature of the Mozilla strings, namely an AppendASCII method that is just completely different for the variants. And it works perfectly on MacOS 10.5 g++ 4.0.1. It even optimizes nicely: the compiler was able to inline everything, so I’m pretty sure there is no performance cost for the method delegation.

I’m sure a template guru could improve my code. And I know there are other options. One interesting idea is for the traits class to have a boolean-valued (probably implemented as an enum) CharT_is_char. The code in String<CharT> could then use if statements to branch on that variable. In each instantiation of the template, the branch is constant, and any decent compiler, including all the ones we use, should inline it. The resulting code would have the same form as the current version, without all the cpp aftertaste.

The only potential gotcha I see is whether all of our compilers accept traits code and the partial specialization it requires. I’d bet they do; this stuff has been around for a few years at this point.

DTrace C++ Mysteries Solved

I’ve been using DTrace on Leopard in my recent work, and while it’s a great tool, the C++ support is confusing and I couldn’t find proper documentation. But eventually I found sketchy documentation that gave me the answers, so in the interest of saving others from pain:

Basic Call Profiling. One of the most basic profiling tasks is recording function call entries and returns. And DTrace is very good at this, using the pid provider. For example, if you have a simple C program ‘prog’ that contains a function ‘foo’, you can get DTrace to trace calls to ‘foo’ like this:

sudo dtrace -c './prog' -n 'pid$target:prog:foo:entry'

Unpacking this, the -c option starts a program and the process id in a special DTrace variable-type-thing called $target. The -n option specifies a probe to trace using DTrace’s standard 4-part syntax. In this case, the 4 parts are: (1) the provider, pid$target, which means user function call tracing in process $target, (2) prog, the module to trace functions in, (3) the function to trace, foo, and (4) entry, meaning we want function entries (as opposed to returns or other instructions). (Thanks to Vlad for giving me the verbal tutorial on this part.)

This time, with C++. So far, so good, but when you try to do this for C++ programs, some weirdness sets in. Let’s say our C++ program has a class C that contains a method bar that takes an int argument. This means the C-style symbol that the linker operates on is the mangled name, some junk like __ZN1C1barEi. According to Sun’s article on DTrace with C++, which is all I could find by Googling “dtrace c++” you use the mangled name in the provider specification, as in

sudo dtrace -c './prog' -n 'pid$target:prog:__ZN1C1barEi:entry'

But that didn’t work at all for me on Leopard. DTrace said “invalid probe specifier, blah, blah, blah”. I figured out part of the answer by trial and error, but I didn’t get the full answer until I remembered that Vlad suggested the support for demangled names might have been added as part of Objective C support, Googled “dtrace objective c”, found a blog post, followed a link from there to the Apple dtrace man page, saw the “-xmangled” option, Googled “xmangled”, and found an Apple mailing list thread. Ugh. Is there no better way to find documentation? (I guess I should be comparing with the effort of searching a wall full of manuals, and instead of complaining, I should be grateful that once again, Teh Mighty Interwebs have proven infinitely wise.)

Anyway, the answer is that on Leopard you can specify probes for C++ functions using either mangled names or unmangled names, with appropriate obscure incantations either way.

Unmangled:

sudo dtrace -c './prog' -n 'pid$target:prog:C??f(int):entry'

The key is that you have to give the whole signature for the method, but you can’t have colons in there because the probe specification parser gets confused, so you use the ? wildcard instead. You can also abbreviate using other wildcards, as in C??f*.

Mangled:

sudo dtrace -c './prog' -n 'pid$target:prog:__ZN1C1barEi:entry' -xmangled

-xmangled tells DTrace to use the mangled names. If you do it this way, you’ll also see the mangled names in the output.

WTF-16

I’ve been doing some speed and memory profiling of Firefox string usage to help in figuring out what to do with string encodings Mozilla 2: should we go all UTF-8, all UTF-16, or keep the current mishmash of ASCII, ISO-8859-1, UTF-8, and the reviled UCS-2? The full profiling results are in the string analysis bug, but I wanted to mention the big things and a few side notes here.

Basically, it turns out that encodings don’t affect browser performance much and we might as well keep what we have, at least for the near future. Switching Firefox to all UTF-8 would reduce memory consumption for string data by about 25%, which sounds exciting until you discover that string data represents about 1-2% of Firefox memory consumption (I got 9 MB out of 400 MB in a torture test of opening 80 pages in tabs). On the speed side, there are a lot of calls to UTF-8/UTF-16 conversion functions, about 30,000 calls in a test of loading 20 pages, but the time taken by conversions seems to be around 0.05% of total time, although it’s hard to measure accurately.

Interesting fact: in my tests, even CJK web pages (a set of 20 pages from Alexa and a set of 24 Wikipedia articles) use a lot less memory in UTF-8. This is surprising to many (including me) because CJK characters all need at least 2 bytes in UTF-8. But it turns out that they usually don’t need more than 2 bytes, and more importantly, a lot of web content on CJK sites is ASCII style sheets, tags, and scripts.

[Update: This isn't quite right. CJK characters take 3 bytes in UTF-8. Memory usage is lower in UTF-8 because my CJK test pages (which may or may not be representative) generate even more ASCII string content than I thought. I left details in a comment, but WordPress keeps eating it, so I'll put it here:

On the Wikipedia test, nsTextFragments contained about 1/2 ASCII characters (HTML elements, scripts, small amounts of Latin-1 content, etc.) and 1/2 CJK characters, so it averages out to about 2 bytes per character in UTF-8 and uses only 1% more space than UTF-16. The current FF code uses 18% less memory than UTF-16 because it encodes strings that are entirely ISO-8859-1 in 1 byte per character, and 27% of strings were able to use that encoding.

nsStringBuffers contained 95% ASCII characters, so they are about 1/2 the size in UTF-8 compared to UTF-16. Current FF takes up 90% as much space as UTF-16 because it encodes 80% of the buffers in UCS-2.

nsStringBuffers took up 4x as much space as nsTextFragments, so they dominate the total and you still see a total space reduction of 26% by going from current FF to all UTF-8.]

Despite the evident non-need to change encodings for performance reasons, in my opinion, it would be nice if someday we could go all UTF-8. It’s not worth the huge effort today to reduce memory by 1%, but reducing memory by 1% is still a good thing, especially for mobile devices. And it would be nice if programmers never had to think about encoding decisions or conversions because everything was UTF-8. There is one problem, though: Windows and MacOS platform APIs are UTF-16, so some conversions would still be necessary. So going all UTF-16 has some benefits too.

Elsa parsing a complete Firefox build on gcc 4.3

Background: Taras and I are using the Elsa C++ for automatic rewriting of Firefox code. One problem we had is that Elsa wasn’t compatible with header files from recent gcc, so in order to use Elsa, you had to configure and build Firefox with gcc 3.4. And that’s an old enough gcc that you probably had to build gcc yourself, too.

Elsa didn’t work with gcc 4 mainly because of bugs in Elsa’s type checker, especially in handling templates, and because Elsa’s grammar for gcc extensions was not keeping up with gcc. g++ seems to be evoloving rapidly, picking up new C++ features, using templates more extensively, and introducing new builtin operators in order to implement TR1 features. This makes it pretty hard for Elsa to keep up.

Back to the reason for this post: we’ve made a lot of progress , and just reached an important Elsa milestone for Firefox: With my latest patches, Pork Elsa parses all 1845 files in a Linux build from a recent trunk checkout using gcc 4.3 headers. I also merged in all the upstream fixes from the Elsa authors, which will hopefully let us get our changes merged back upstream.

There’s one fly in the ointment: in its default mode, Elsa still throws out a whole bunch of error messages on gfxFontUtils.cpp. But all of those errors are from Elsa trying to check templates that aren’t being used. Such checking is optional, not done by standard compilers, and can be turned off in Elsa. So it’s OK, but it still bugs me a bit, though it’s not at all worth working on for now.

I thought I’d share one of the more intricate bugs I fixed, in case anyone is interested in how tiny corners of C++ make your head explode. If you’re not interested in that kind of thing, this post ends here. kthxbai!

The Error: Elsa failed to parse this construct used in Firefox (the comments explain what’s going on in C++ here):

// "nsDerivedSafe" -- a restricted wrapper for T.
//     Should be usable anywhere a T is required.
template<typename T>class D : public T {
};

// "nsCOMPtr" -- a smart pointer-to-T.
template<typename T>
class Ptr {
  T *mPtr;
public:
  // This implicit conversion operator allows Ptr<T> to be used
  // wherever a T* is required.
  operator D<T>*() { return reinterpret_cast<D<T>*>(mPtr); }
};

class Foo; // Declared (but not defined) COM class

int someFunction() {
  Ptr<Foo> p;
  // The next line uses p as a boolean. This is OK because we can implicitly
  // convert p to a D<T>*, and any pointer can be used as a boolean
  // implicitly by testing for null. Thus, the smart pointer acts just like a
  // regular pointer here, which is exactly the kind of thing smart pointers
  // are supposed to do.
  if (1 || p) { }
  }

Elsa wouldn’t accept this–Elsa said that the line with if (1 || p) was trying to use an incomplete (declaration only, no definition) class Foo as a base class. This really confused me, because the class D (nsDerivedSafe in real life) doesn’t appear anywhere in the code near the point of the error, so it was really hard to figure out what was going on.

The Cause. After reading Elsa code, nsCOMPtr and nsDerivedSafe code, and the C++ spec, and feeding various small examples to g++ and Elsa, I came up with the reduced test case above and a basic understanding of the problem: When a C++ compiler is trying to make implicit conversions and it gets a template pointer type (such as here), sometimes the compiler needs to instantiate the template in order to complete the conversion. Elsa handled the issue by always instantiating, but in the test case above, instantiation is not needed, and also not possible, so it causes a spurious error message. There was even a comment in the Elsa coding indicating that it might be instantiating too much.

The Ill-Fated 1LF. My first try at a fix was a one-line fix that simply made Elsa  instantiate the template only for reference types, not pointer types. The comments in Elsa seemed to indicate that instantiation was needed for references, but said nothing about pointers, so I figured they were included for no reason. I made the fix and the Firefox file parsed perfectly. Yay! So I ran the regression test suite included with Elsa and got a bunch of errors. Disappointment. Turns out the example above can be extended with stuff like this:

// Some function that needs a pointer to Foo.
void whatever(Foo *p);

// define class Foo: now templates using it can be instantiated.
class Foo { };

void myFunction() {
  Ptr<Foo> p;
  whatever(p);
}

For the call to whatever to work, the compiler needs to use the same implicit conversion as before to convert p to a D<Foo>*, and then notice that D<Foo> is derived from Foo, so a D<Foo>* can be passed in for a Foo*. The derived class relationship exists only if Foo is defined and D<Foo> can be instantiated. This is also an important way in which smart pointers are supposed to work like regular pointers, so it has to be fixed.

The Specification. Apparently, the rules of C++ are that templates are required to be instantiated only if used in a context that truly requires instantiation in order to compile. The compiler could try to instantiate them when they aren’t needed, but it had better not report any errors in that case, or else it will fail to compile good programs.

The Elsa people evidently figured most of this out, and had a code comment pointing to the rule above, but they didn’t realize they really shouldn’t be doing any unnecessary instantiation at all, probably because they had never come across any test cases where it mattered.

The Fix. At this point, I knew what I had to do: teach Elsa to instantiate templates only when required. But that would have been hard: I would have had to disable the existing instantiation calls, and inserted calls to try instantiation at every other point in the code that might need it. And I don’t really understand Elsa’s template code.

So instead, I decided it would be almost as good if I just taught Elsa to ignore error messages when trying to instantiate a template that may or may not need to be instantiated. If the template does need to be instantiated, there will still be error messages later, at the points where the instantiation is needed, so it should work. This fix didn’t look easy either: the template instantiation process is distributed through several functions, all of which can produce errors. And the errors are sent straight to the global error list, so it seemed like there was no way to retract them.

I was afraid I was going to have to modify the error reporting of every function reachable during template instantiation, but after reading around in Elsa in a state of vague unease for a while I found a funny little mechanism that Elsa had for suppressing error messages in uninstantiated templates. (Which goes way back to the beginning of the post–this is the mechanism that has to be turned on for gfxFontUtils.cpp.) This mechanism removed all the previous error messages at the beginning of an uninstantiated template, then at the end, filtered out most of the template errors and put the original errors back. I find this…baroque, but it does do exactly what I needed, and it turns out the code it filters errors from is completely shared with the template instantiation code, so it was easy to adapt it to suppress errors during optional template instantiation.

So I added extra parameters to the top couple of template instantiation functions, tweaked the error suppression class, and it was done. And it worked for Firefox, and for the regression suite. The top-level change amounts to inserting “, true” into a function call argument list.

ZOMG JSON CFGs!

One thing at a time:

CFGs

I’ve been working on a CFG extractor for C++ using Taras’ gcc plugin system, and I’ve got a basic prototype working. CFG stands for “control flow graph”, which is a low-level representation of a function suitable for automatic analysis. A CFG has two main features: it shows the control structures of the function in flow-chart format, and instead of having C++ code it has a sort of bytecode. (In my case, GIMPLE expressions, which is how gcc represents CFGs.)

Here is a graphical view of the CFG for the method nsServerSocket::TryAttach.

JSON

As suggested in the title, what my plugin really does is output CFGs in JSON format. I don’t know if Taras entirely approves, but I think it’s a great idea, because (a) then I don’t have to mess with JSAPI (the C++ API for the Spidermonkey JS interpreter), (b) in my experience it really helps to have textual intermediate formats for complex analysis processes, and (c) you can write code to explore the CFG in your favorite language, in my case Python.

ZOMG

The immediate practical goal I have in mind is analyzing how error codes (nsresults) are used in Firefox, so I can figure out how to replace them with C++ exceptions. As a test example, I started writing a script to find the propagate pattern, which is anything that looks like:

    nsresult rv = callSomeMethod();
    if (NS_FAILED(rv)) return rv;

The significance of propagate is that refactoring it to use exceptions is easy: just remove rv and all the code that uses it.

My idea for the script is to find method calls, then for each call do a DeHydra-like exploration of all forward paths that can be reached when rv is a failure code (i.e., rv != 0). If all the failure paths return rv before doing anything else, then we have the propagate pattern.

I’ve coded up a crude version of this idea that’s just good enough to find the two instances of propagate in TryAttach. And it runs in about .007 seconds on TryAttach, which is a rate of 500,000 methods per hour. Over all of FF, it will almost certainly not run that fast, but at least I haven’t already proven it’s too slow.

Next steps will be trying the analysis on more FF methods, or cleaning up the CFG extractor and adding it to the Dehydra-gcc repository.

NS_OK-always analysis update

First note: I forgot to mention earlier that the results of these analyses are being posted in Bugzilla as bug #407444.

And as I wrote there, the analysis worked, finding about 10,000 methods that always return NS_OK out of 30,000 methods that return nsresult. The constraint system took about 6 seconds to solve using my basic Python solver–my worries about solving time were unfounded. It took about an hour to generate the constraint system, because of my horribly inefficient database query representation of the call graph.

Now the question becomes, what to do with these methods.