<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Mandelin&#039;s blog &#187; treehydra</title>
	<atom:link href="http://blog.mozilla.com/dmandelin/category/treehydra/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.com/dmandelin</link>
	<description>Just another Blog.mozilla.com weblog</description>
	<lastBuildDate>Tue, 03 Nov 2009 03:09:26 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Static analysis newslets</title>
		<link>http://blog.mozilla.com/dmandelin/2008/09/05/static-analysis-newslets/</link>
		<comments>http://blog.mozilla.com/dmandelin/2008/09/05/static-analysis-newslets/#comments</comments>
		<pubDate>Fri, 05 Sep 2008 18:41:14 +0000</pubDate>
		<dc:creator>dmandelin</dc:creator>
				<category><![CDATA[treehydra]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/?p=26</guid>
		<description><![CDATA[I&#8217;ve been in interpreter-land lately, but I do help out a bit with static analysis projects when I get the chance. So I&#8217;d better post an update here on some interesting developments that haven&#8217;t been publicized yet.
First, Keith Schwartz (one of our interns) is making great progress on automatic const-correctification for Mozilla. The basic idea [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been in interpreter-land lately, but I do help out a bit with static analysis projects when I get the chance. So I&#8217;d better post an update here on some interesting developments that haven&#8217;t been publicized yet.</p>
<p>First, Keith Schwartz (one of our interns) is making great progress on automatic <a href="http://en.wikipedia.org/wiki/Const_correctness">const-correct</a>ification for Mozilla. The basic idea is to put <strong>const</strong> on as many declarations as possible without breaking the code or introducing casts. Keith has devised an algorithm based on <a href="http://en.wikipedia.org/wiki/Type_inference">type inference</a>. Currently, he&#8217;s working on the Treehydra code to extract the type constraints from code. Because C++ is so complicated, there are a ton of details, and he&#8217;s had to master the insanity of GCC intermediate representations of pointers to member functions and calls through them. (If by chance, any readers know of a non-insane way to access them in GCC, let us know.)</p>
<p>Second, the <a href="http://cairographics.org/">Cairo</a> folks were kind enough to give me an hour to talk about static analysis with the Hydras at their recent meetup. They already had 2 static analysis applications in mind. One was ensuring that internal-only return codes aren&#8217;t returned from the public API. The other was checking that integer and fixed-point values, both represented by C ints, aren&#8217;t mixed together. I think both of these can be formulated as type checking or inference problems.</p>
<p>I&#8217;m hoping we can extract a generic Treehydra type inference library from Keith&#8217;s code for the Cairo problems and others. One issue here is that Cairo is C, while Keith&#8217;s been using the C++ ASTs for his work. I don&#8217;t even know if Treehydra can read C ASTs at this point, but I think Treehydra&#8217;s design makes extending to C not too hard.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/dmandelin/2008/09/05/static-analysis-newslets/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>ESP: MSR&#8217;s little helper</title>
		<link>http://blog.mozilla.com/dmandelin/2008/04/18/esp-msrs-little-helper/</link>
		<comments>http://blog.mozilla.com/dmandelin/2008/04/18/esp-msrs-little-helper/#comments</comments>
		<pubDate>Fri, 18 Apr 2008 22:45:12 +0000</pubDate>
		<dc:creator>dmandelin</dc:creator>
				<category><![CDATA[esp]]></category>
		<category><![CDATA[outparams]]></category>
		<category><![CDATA[treehydra]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/2008/04/18/esp-msrs-little-helper/</guid>
		<description><![CDATA[The Javascript/Treehydra version of the outparam usage checker is finally nearing completion: all that&#8217;s left is packaging it as a patch that can go into mozilla-central (plus the inevitable future debugging). In my last post, I mentioned that the checker is based on ESP, a program analysis technique invented at Microsoft Research. A few people [...]]]></description>
			<content:encoded><![CDATA[<p>The Javascript/<a href="http://wiki.mozilla.org/Treehydra">Treehydra</a> version of the <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=420933">outparam usage checker</a> is finally nearing completion: all that&#8217;s left is packaging it as a patch that can go into mozilla-central (plus the inevitable future debugging). In my last post, I mentioned that the checker is based on <a href="http://www.google.com/search?q=ESP%3A+path-sensitive+program+verification+in+polynomial+time&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">ESP</a>, a program analysis technique invented at Microsoft Research. A few people have asked for a post about ESP (the paper is good, but very dense if you don&#8217;t have a PL research background), so here it is.</p>
<p><strong>Why ESP? </strong><br />
First I should explain why I bothered implementing a new outparam checker design given that I had a working version based on theorem proving. The problem was that the theorem-proving version worked by analyzing &#8220;every&#8221; path in each method. Or it would have worked if it could analyze every path. But a method with N <code>if</code> statements can have 2^N paths, and N gets big enough that Mozilla has a method with 8 million paths. Worse, methods with loops have an infinite number of paths. In practice, path-based analyses have to give up after about 1000 paths, leaving the rest unanalyzed.</p>
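<p>To make the arithmetic concrete, here is a minimal Python sketch. The names <code>count_paths</code>, <code>covered_fraction</code> and <code>PATH_BUDGET</code> are made up for illustration and are not part of the checker. It assumes a method built only from sequential, independent <code>if</code> statements, and it uses the rough 1000-path cutoff mentioned above to show how little of such a method a path-based analysis would cover.</p>
<pre>
```python
# Illustrative only: path counts for a method of sequential if statements.
PATH_BUDGET = 1000  # rough give-up point for a path-based analysis


def count_paths(num_ifs):
    """Each independent if statement doubles the number of paths."""
    return 2 ** num_ifs


def covered_fraction(num_ifs, budget=PATH_BUDGET):
    """Fraction of paths analyzed before the budget runs out."""
    total = count_paths(num_ifs)
    return min(total, budget) / total


for n in (5, 10, 23):
    print(n, count_paths(n), covered_fraction(n))
```
</pre>
<p>Under that assumption, 23 sequential <code>if</code> statements already give 8,388,608 paths, so a 1000-path budget covers roughly 0.01% of them.</p>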
<p>In short, path-based analysis is very precise, but lacks coverage of all the code paths. Conversely, the abstract interpretation approach I showed in my previous post does cover all code paths, but it mixes them up so much that it ends up being too imprecise to work at all.</p>
<p>When I saw this problem, I remembered ESP right away, because the whole point of ESP is to get the precision of path-based analysis with the speed and coverage of abstract interpretation. But after reviewing the paper, I couldn&#8217;t really see how to make ESP solve the problems I described before, so I went the theorem proving route. But once I got stuck on the path explosion problem, I went back to it, and eventually it hit me. Now it seems kind of obvious. So, it seems like I should be able to explain ESP and its application to outparams in a way that makes it sound simple, but that turned out to be hard. Hopefully it&#8217;s at least comprehensible.</p>
<p><strong>Abstract Interpretation Redux.</strong><br />
Previously, I tried out abstract interpretation with pen and paper and found that it didn&#8217;t even come close to working for outparams. (Reminder: abstract interpretation means running the code in a special interpreter that (a) tracks finite(-ish) <em>abstract states</em> instead of the standard program state, (b) goes both ways at branches and (c) merges state when control rejoins. This has the effect of running the method on every possible input value and every path in finite time. The price is that the output is abstract states instead of full detail.) Here are the results again (the table on the right shows the abstract state after abstractly interpreting each statement):</p>
<pre>
 1   nsresult SomeMethod(nsIX **out) {      out       rv   tmp   if.temp
 2     nsresult rv = doSomething();      not-written   ?
 3     tmp = rv;                         not-written   ?    ?
 4     if.temp = NS_SUCCEEDED(tmp);      not-written   ?    ?      ?
 5     if (if.temp) {                    not-written   ?    ?    true
 6       out = mValue;                       written   ?    ?    true
 7       return NS_OK;                       written   ?    ?    true
 8     } else {                          not-written   ?    ?    false
 9       return rv;                      not-written   ?    ?    false
10     }
11   }</pre>
<p>These analysis results are too imprecise to check the return on line 9: <code>rv</code> is unknown, so the analysis has to assume that the return value could be success, which is an error because <code>out</code> has not been written at this point. Note that the abstract interpretation <em>never</em> had any information about <code>rv</code>. Clearly, total ignorance about <code>rv</code> just won&#8217;t work, and any algorithm that works <em>must</em> track the relationship between <code>out</code> and <code>rv</code> that is created by line 2.</p>
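<p>The loss of precision can be reproduced in a few lines. The Python sketch below is not the checker&#8217;s code: the names <code>join_values</code>, <code>join_states</code> and <code>may_succeed_without_writing</code> are hypothetical, and the model is deliberately tiny. It represents the abstract state as one mapping of variables to abstract values, and merges two different values into unknown, which is the behavior shown in the table.</p>
<pre>
```python
# Sketch of the simple abstract state: one mapping of variables to values.
UNKNOWN = "?"


def join_values(a, b):
    """Two different abstract values merge into unknown."""
    return a if a == b else UNKNOWN


def join_states(s1, s2):
    """Merge two single-mapping states variable by variable."""
    return {var: join_values(s1[var], s2[var]) for var in s1}


def may_succeed_without_writing(state):
    """The check for 'return rv': could rv be success while out is unwritten?"""
    return state["out"] == "not-written" and state["rv"] in ("succ", UNKNOWN)


# Line 2 of SomeMethod: doSomething() may succeed or fail.
after_success = {"out": "not-written", "rv": "succ"}
after_failure = {"out": "not-written", "rv": "fail"}
state = join_states(after_success, after_failure)
print(state)
print(may_succeed_without_writing(state))
```
</pre>
<p>The merged state is <code>{'out': 'not-written', 'rv': '?'}</code>, so the check reports a possible error even though the failing row alone would pass it.</p>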
<p><strong>A Smarter Abstract State Space.</strong><br />
Abstract interpretation can track that relationship, but it needs to use a more complicated abstract state than the one I implicitly used above. The abstract state in my table above is a mapping of variables to abstract values. (Compare with the real program state, which is a mapping of variables to C++ values.) That&#8217;s the simplest and most common abstract state, but there&#8217;s really nothing special about it. An abstract state can be any representation of a set of program states: the game is to choose an abstract state space that is &#8220;fine&#8221; enough to represent the information we need, but no finer, so the abstract states stay small and simple.</p>
<p>We need a state space that can represent facts like &#8220;<code>if.temp</code> is true iff <code>tmp</code> is a success code&#8221;. I can write that more explicitly as, &#8220;<code>if.temp</code> is true and <code>tmp</code> is a success code, <em>or</em> <code>if.temp</code> is false and <code>tmp</code> is a failure code.&#8221; And that looks just like the &#8220;or&#8221; of two mappings of variables to abstract values. So, it looks like we can use an abstract state that&#8217;s just like our original state, except allowing <strong>multiple &#8220;table rows&#8221;</strong>. If we code the abstract interpreter to use multiple rows when it can, the results of abstract interpretation will come out like this (showing the states between the statements so it&#8217;s easier to separate the rows):</p>
<pre>
 1   nsresult SomeMethod(nsIX **out) {      out         rv    tmp   if.temp
                                         not-written
 2     nsresult rv = doSomething();
                                         not-written   succ
                                         not-written   fail
 3     tmp = rv;
                                         not-written   succ  succ
                                         not-written   fail  fail
 4     if.temp = NS_SUCCEEDED(tmp);
                                         not-written   succ  succ    true
                                         not-written   fail  fail    false
 5     if (if.temp) {
                                         not-written   succ  succ    true
 6       out = mValue;
                                             written   succ  succ    true
 7       return NS_OK;
 8     } else {
                                         not-written   fail  fail    false
 9       return rv;
10     }
11   }</pre>
<p>These results are detailed enough to check outparams perfectly!</p>
<p>A few things to note: In abstractly interpreting line 2, we don&#8217;t know the results exactly, but instead of generating a lot of &#8220;unknown&#8221; abstract values, we generate multiple rows, establishing the correlation among results. Now on lines 3 and 4, we have a multiple-row state, so we abstractly interpret the statements on each row independently. Finally, line 5 is a conditional guard, so at that point, we filter out all the rows that don&#8217;t match the guard (because the program wouldn&#8217;t execute this path in those states). Each of these features is another detail that has to be noticed and coded up in the analysis, but they all fit naturally into the framework of interpreting statements on abstract states.</p>
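<p>The three features just described (forking rows on an unknown result, interpreting each row independently, and filtering rows at a guard) can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the Treehydra implementation: a state is a plain list of dictionaries, and every function name here is invented for the example.</p>
<pre>
```python
# Sketch of a multiple-row abstract state: a list of variable mappings.
def fork(rows, var, values):
    """Unknown result: one new row per possible abstract value."""
    return [{**row, var: value} for row in rows for value in values]


def copy_var(rows, dst, src):
    """Interpret dst = src on each row independently."""
    return [{**row, dst: row[src]} for row in rows]


def succeeded(rows, dst, src):
    """Interpret dst = NS_SUCCEEDED(src) on each row independently."""
    return [{**row, dst: row[src] == "succ"} for row in rows]


def guard(rows, var, expected):
    """Keep only the rows in which the branch condition holds."""
    return [row for row in rows if row[var] == expected]


def write(rows, var):
    """Interpret an assignment through the outparam."""
    return [{**row, var: "written"} for row in rows]


def bad_returns(rows, returned):
    """Rows that return success without having written the outparam."""
    return [row for row in rows
            if row[returned] == "succ" and row["out"] == "not-written"]


rows = [{"out": "not-written"}]
rows = fork(rows, "rv", ["succ", "fail"])                # line 2
rows = copy_var(rows, "tmp", "rv")                       # line 3
rows = succeeded(rows, "if.temp", "tmp")                 # line 4
then_rows = write(guard(rows, "if.temp", True), "out")   # lines 5-6
else_rows = guard(rows, "if.temp", False)                # line 8
print(bad_returns(else_rows, "rv"))                      # line 9
```
</pre>
<p>The final call prints an empty list: the only row that reaches line 9 has <code>rv</code> equal to <code>fail</code>, so the return there is fine.</p>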
<p><strong>Path Sensitivity.</strong><br />
This version of the analysis is actually path-sensitive, because if different paths generate different states, those states will be kept as separate rows. Here&#8217;s an example:</p>
<pre>
nsresult OtherMethod(nsIX **out1, nsIX **out2) {
                                        out1          out2         rv    if.temp
                                    not-written   not-written
  nsresult rv = doSomething();
                                    not-written   not-written   success
                                    not-written   not-written   failure
  if.temp = NS_SUCCEEDED(rv);
                                    not-written   not-written   success   true
                                    not-written   not-written   failure   false
  if (if.temp) {
                                    not-written   not-written   success   true
    out1 = mFoo;
                                A:      written   not-written   success   true
  } else {
                                B:  not-written   not-written   failure   false
  }
                                C:  // Join point -- state is union of A and B.
                                        written   not-written   success   true
                                    not-written   not-written   failure   false
  doMoreStuff();
                                        written   not-written   success   true
                                    not-written   not-written   failure   false
  if (if.temp) {
                                        written   not-written   success   true
    out2 = mBar;
                                        written       written   success   true
  } else {
                                    not-written   not-written   failure   false
  }
                                     // Join point
                                        written       written   success   true
                                    not-written   not-written   failure   false
  return rv;
}</pre>
<p>It&#8217;s kind of hard to read, but the key point is that there are two <code>ifs</code> with the same guard, and to analyze the method correctly, we need to know that of the 4 possible paths, only 2 can actually be taken. State C is the important one: after finishing the first <code>if</code>, at the join point we merge the states by simply collecting all the rows. Each path has a different row, and the rows stay separate, so on the second <code>if</code>, the analysis executes the then branch only in the states generated by the first then branch.</p>
<p>This is actually the kind of thing the ESP authors were most concerned with in their paper. It&#8217;s pretty neat but the problems I had look very different, which is why it took me so long to see the connection.</p>
<p>A nice thing about this kind of path sensitivity is that if the state is the same along two branches, the rows will &#8220;rejoin&#8221; at the join point, essentially forgetting that there was a branch (because it didn&#8217;t really matter anyway). It also works with loops.</p>
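<p>Here is a small Python sketch of the join used in the <code>OtherMethod</code> example, again with invented helper names and a list-of-dictionaries state as a simplifying assumption. The join is a union of rows that drops exact duplicates, which is what lets identical rows &#8220;rejoin&#8221; while different rows stay separate.</p>
<pre>
```python
# Sketch of a path-sensitive join: union of rows, duplicates dropped.
def guard(rows, var, expected):
    """Keep only the rows in which the branch condition holds."""
    return [row for row in rows if row[var] == expected]


def write(rows, var):
    """Interpret an assignment through an outparam."""
    return [{**row, var: "written"} for row in rows]


def join(rows_a, rows_b):
    """Collect all rows from both branches; identical rows rejoin."""
    merged = []
    for row in rows_a + rows_b:
        if row not in merged:
            merged.append(row)
    return merged


def if_then_write(rows, cond, out_var):
    """Interpret: if (cond) { out_var = something; } else { }"""
    then_rows = write(guard(rows, cond, True), out_var)
    else_rows = guard(rows, cond, False)
    return join(then_rows, else_rows)


start = [
    {"out1": "not-written", "out2": "not-written",
     "rv": "success", "if.temp": True},
    {"out1": "not-written", "out2": "not-written",
     "rv": "failure", "if.temp": False},
]
state_c = if_then_write(start, "if.temp", "out1")
final = if_then_write(state_c, "if.temp", "out2")
for row in final:
    print(row)
```
</pre>
<p>Only two rows survive to the end, matching the two feasible paths: both outparams written on success, neither written on failure.</p>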
<p>The problem is that although we don&#8217;t exactly get path explosion anymore, we can get &#8220;row explosion&#8221;: if there are M variables, and each has 2 possible abstract values, we can get 2^M rows in the state. And M can easily get big enough in Mozilla to run out of memory.</p>
<p><strong>ESP.</strong><br />
This is where ESP comes into play. The insight of ESP is that there are some variables you care about a lot (which the ESP authors call <em>property variables</em>), and others you care about only as far as they relate to the property variables (which the ESP authors call <em>execution variables</em>). (For example, in outparams, the property variables are the outparams and any variables whose values can reach a return statement.) So, if there are only a few property variables and we have a way to track only those variables path-sensitively, we can be precise about the things we care about without row explosion.</p>
<p>ESP does this very simply: it just takes our multiple-row states and adds a <strong>primary key</strong>, namely the set of property variables. Thus, property value combinations and relations are always tracked precisely. Execution variables are tracked as one mapping per property value combination, just as in the basic abstract interpretation. Because of primary key uniqueness, if there are K property variables, there can be no more than 2^K rows in a state, so if K is smaller than 10 or so, the states are small enough to analyze in reasonable time.</p>
<p>An ESP analysis looks a lot like our path-sensitive abstract interpretation, except that after each operation, it &#8220;collects&#8221; rows together to maintain the primary key uniqueness property. For example, if P is a property variable and E is an execution variable, and we need to merge this state:</p>
<pre>
    P = true,    E = false
    P = false,   E = false</pre>
<p>with this state:</p>
<pre>
    P = true,    E = true</pre>
<p>we take the union of rows as before to get this:</p>
<pre>
    P = true,    E = false
    P = false,   E = false
    P = true,    E = true</pre>
<p>but then we merge together rows with the same primary key, yielding:</p>
<pre>
    P = true,    E = anything
    P = false,   E = false</pre>
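<p>The merge step above can be sketched in Python as follows. This is a hedged illustration rather than the actual ESP framework code: <code>esp_merge</code> and <code>ANYTHING</code> are hypothetical names, rows are plain dictionaries, and every row is assumed to mention the same variables.</p>
<pre>
```python
# Sketch of an ESP-style merge: one row per property-value combination.
ANYTHING = "anything"


def esp_merge(rows, property_vars):
    """Union the rows, then collapse rows that share a primary key."""
    merged = {}
    for row in rows:
        key = tuple(row[p] for p in property_vars)
        if key not in merged:
            merged[key] = dict(row)
            continue
        kept = merged[key]
        for var, value in row.items():
            # Property variables agree by construction, so only
            # execution variables can differ and lose precision here.
            if kept[var] != value:
                kept[var] = ANYTHING
    return list(merged.values())


state_1 = [{"P": True, "E": False}, {"P": False, "E": False}]
state_2 = [{"P": True, "E": True}]
result = esp_merge(state_1 + state_2, ["P"])
print(result)
```
</pre>
<p>This prints <code>[{'P': True, 'E': 'anything'}, {'P': False, 'E': False}]</code>, the same two rows as in the example above.</p>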
<p>The significance of ESP for outparams is that all Mozilla methods have only a few outparams and return value variables, so the analysis runs fast no matter how many other &#8220;unimportant&#8221; variables are in the method.</p>
<p><strong>A small tweak.</strong><br />
Actually, that&#8217;s not quite true. GCC generates a temporary variable for each return statement, so if there are 30 return statements, there are 30 temporary variables, and the state can grow to 2^30 rows. That does happen, and it does make the analysis run out of memory.<br />
Fortunately, I was able to fix this with just a small tweak to ESP. The temporary variables are only &#8220;live&#8221; between the point where they are created and where they are copied to another return variable, and their values don&#8217;t matter at all outside that live range. At any given point in the method, only a few temporaries are live. So I can keep the number of property values small by &#8220;demoting&#8221; return values to execution values once they are dead. And demotion is trivial to implement: just set the abstract value to any one value, because we&#8217;ll never read it anyway.</p>
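<p>One way to picture demotion is the Python sketch below. It is an assumption-laden illustration, not the checker&#8217;s code: <code>demote</code>, <code>esp_merge</code>, the variable name <code>tmp1</code> and the placeholder value are all invented. The dead temporary is overwritten with one fixed value in every row, so rows that differed only in that temporary collapse when the primary key is enforced again.</p>
<pre>
```python
# Sketch of demoting a dead return-value temporary.
ANYTHING = "anything"


def esp_merge(rows, property_vars):
    """Collapse rows that share the same property-variable values."""
    merged = {}
    for row in rows:
        key = tuple(row[p] for p in property_vars)
        if key not in merged:
            merged[key] = dict(row)
            continue
        kept = merged[key]
        for var, value in row.items():
            if kept[var] != value:
                kept[var] = ANYTHING
    return list(merged.values())


def demote(rows, dead_var, property_vars, placeholder="fail"):
    """Pin a dead temporary to one value, then re-collect the rows."""
    flattened = [{**row, dead_var: placeholder} for row in rows]
    return esp_merge(flattened, property_vars)


props = ["out", "tmp1"]
rows = [
    {"out": "written", "tmp1": "succ"},
    {"out": "written", "tmp1": "fail"},
    {"out": "not-written", "tmp1": "fail"},
]
print(len(rows), len(demote(rows, "tmp1", props)))
```
</pre>
<p>In this example the three rows shrink to two once <code>tmp1</code> is dead, because only <code>out</code> still distinguishes them.</p>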
<p>The whole outparam analysis came out to about 2500 lines of Javascript, but a lot of that was adapter code to simplify the Treehydra API, plus subsidiary analyses to find return value variables and their live ranges. The ESP framework was 450 lines, and the outparam abstract interpreter was another 800 lines. It runs in reasonable time too, without any effort optimizing it yet. I haven&#8217;t measured it exactly, but I think it&#8217;s less than 20 minutes on 1970 C++ files of Mozilla on a 4-processor machine. I guess you wouldn&#8217;t want to run it on every build, but if you&#8217;re only changing a few .cpp files, it shouldn&#8217;t be too bad.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/dmandelin/2008/04/18/esp-msrs-little-helper/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Making Treehydra do useful tricks</title>
		<link>http://blog.mozilla.com/dmandelin/2008/04/01/making-treehydra-do-useful-tricks/</link>
		<comments>http://blog.mozilla.com/dmandelin/2008/04/01/making-treehydra-do-useful-tricks/#comments</comments>
		<pubDate>Wed, 02 Apr 2008 02:41:51 +0000</pubDate>
		<dc:creator>dmandelin</dc:creator>
				<category><![CDATA[outparams]]></category>
		<category><![CDATA[treehydra]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/2008/04/01/making-treehydra-do-useful-tricks/</guid>
		<description><![CDATA[Taras&#8217; last blog post ended with a comment about &#8220;making [Treehydra] do useful tricks&#8221;, which oddly enough, is exactly what I&#8217;ve been working on, and I&#8217;ve finally made enough progress to blog about it. I&#8217;ve been alternating between implementing a Treehydra Javascript analysis library and adding needed features to Treehydra.
Just today, I managed to do [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.mozilla.com/tglek/2008/03/17/dehydra-world-tour/">Taras&#8217; last blog post</a> ended with a comment about &#8220;making [Treehydra] do useful tricks&#8221;, which oddly enough, is exactly what I&#8217;ve been working on, and I&#8217;ve finally made enough progress to blog about it. I&#8217;ve been alternating between implementing a <a href="http://wiki.mozilla.org/Treehydra">Treehydra</a> Javascript analysis library and <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=423896">adding</a> <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=425034">needed</a> <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=425794">feat</a><a href="https://bugzilla.mozilla.org/show_bug.cgi?id=425846">ures</a> to Treehydra.</p>
<p>Just today, I managed to do an intraprocedural live variable analysis, which is one of the simplest program analyses, on every file in mozilla-central. (Live variable analysis determines the set of variables that may be read in the future at every point in a function. It&#8217;s commonly used in optimization to save storage for unused variables, but I use it to make checkers &#8220;forget&#8221; information about unused variables.) <a href="http://people.mozilla.com/~dmandelin/live_main.svg">Here&#8217;s a visualization</a> of the results for Firefox&#8217;s main() function in a Linux build: the set of live variables is listed at the bottom of each basic block.</p>
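<p>For readers who have not seen it, a textbook backward live variable analysis fits in a few lines. The Python sketch below is a generic version under simplifying assumptions, not the Treehydra library: the function name <code>liveness</code>, the statement encoding as (defined variable, used variables) pairs, and the tiny control-flow graph are all made up for illustration. It computes the variables live at the top of each basic block by iterating to a fixed point.</p>
<pre>
```python
# Sketch of intraprocedural live variable analysis (backward data flow).
def liveness(blocks, successors):
    """Return the set of variables live on entry to each basic block.

    blocks maps a block name to a list of (defined, used) statements,
    where defined is a variable name or None and used is a list of names.
    """
    live_in = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name, stmts in blocks.items():
            # Live at the bottom: anything live on entry to a successor.
            live = set()
            for succ in successors.get(name, []):
                live |= live_in[succ]
            # Walk the statements backward: kill the definition, add the uses.
            for defined, used in reversed(stmts):
                live.discard(defined)
                live |= set(used)
            if live != live_in[name]:
                live_in[name] = live
                changed = True
    return live_in


blocks = {
    "entry": [("rv", []),                # rv = doSomething()
              ("tmp", ["rv"]),           # tmp = rv
              ("if.temp", ["tmp"]),      # if.temp = NS_SUCCEEDED(tmp)
              (None, ["if.temp"])],      # if (if.temp)
    "then": [("out", ["mValue"]),        # out = mValue
             (None, [])],                # return NS_OK
    "else": [(None, ["rv"])],            # return rv
}
successors = {"entry": ["then", "else"]}
result = liveness(blocks, successors)
for name in blocks:
    print(name, sorted(result[name]))
```
</pre>
<p>In this toy graph, <code>rv</code> is live on entry to the else block because it is returned there, while <code>tmp</code> and <code>if.temp</code> are dead after the branch, which is the kind of fact a checker can use to forget information.</p>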
<p>It took 25-30 minutes to run on all of Mozilla (as preprocessed C++), but I know a lot of that is simply GCC compile time, and I think a fair fraction of the rest was spent generating the visualizations, which most analyses won&#8217;t do. I guess I need to investigate how to time JS execution internally.</p>
<p>My Javascript analysis library is about 900 lines of code, with modules for Treehydra utilities, GCC data access, GCC value printing, data structures needed for analysis, and backward data-flow analysis. I hope these will be reused for other analyses&#8211;there are fewer than 100 lines of code specific to liveness analysis. <a href="http://hg.mozilla.org/users/dmandelin_mozilla.com/treehydra-analysis/">The code is available here.</a></p>
<p>The next step will be to finish the <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=420933">outparam analysis</a>. Hopefully, it won&#8217;t be too hard. The big pieces are:</p>
<ul>
<li>An analysis to determine which variables may reach the return statement of the function (the technique is similar to the liveness analysis).</li>
<li>Port over my <a href="http://www.google.com/search?q=ESP%3A+path-sensitive+program+verification+in+polynomial+time&amp;ie=utf-8&amp;oe=utf-8&amp;aq=t&amp;rls=org.mozilla:en-US:official&amp;client=firefox-a">ESP</a> analysis framework from Python.</li>
<li>Implement the outparam checker in the ESP framework.</li>
</ul>
<p>I prototyped all of it in Python, so I know the algorithms work, and I&#8217;ve ported much of it over to Treehydra/JS for the liveness demo, so I know it codes up nicely there as well. I&#8217;m sure there will be glitches to fix, and I&#8217;m sure I made some mistakes in designing my Javascript framework, but I&#8217;ll just have to see how it goes.</p>
<p>Finally, I have to mention that I&#8217;ve upgraded my Javascript skills quite a bit in the process of doing this (it&#8217;s the most complex JS program I&#8217;ve written, and I&#8217;ve also been using <a href="http://developer.mozilla.org/en/docs/JSAPI_Reference">JSAPI</a>), and it&#8217;s all thanks to the <a href="http://developer.mozilla.org/en/docs/Core_JavaScript_1.5_Guide">MDC Javascript Guide</a>, which has been an excellent resource.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/dmandelin/2008/04/01/making-treehydra-do-useful-tricks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
