<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Taras' Blog &#187; squash</title>
	<atom:link href="http://blog.mozilla.com/tglek/category/squash/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.com/tglek</link>
	<description>Just another Blog.mozilla.com weblog</description>
	<lastBuildDate>Thu, 09 Feb 2012 23:01:54 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Dehydra, prcheck, squash &#8211; in mercurial</title>
		<link>http://blog.mozilla.com/tglek/2007/07/13/dehydra-prcheck-squash-in-mercurial/</link>
		<comments>http://blog.mozilla.com/tglek/2007/07/13/dehydra-prcheck-squash-in-mercurial/#comments</comments>
		<pubDate>Fri, 13 Jul 2007 16:54:08 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[dehydra]]></category>
		<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/07/13/dehydra-prcheck-squash-in-mercurial/</guid>
		<description><![CDATA[New Repository Since I do not yet have write access to oink svn, I have been doing all of my development in ad-hoc repositories within the svn checkout. This made it rather hard to collaborate with others. I finally got sick of the situation (and stumbled upon hgsvn) and converted all 11 svn repositories to [...]]]></description>
			<content:encoded><![CDATA[<p><strong>New Repository</strong></p>
<p>Since I do not yet have write access to oink svn, I have been doing all of my development in ad-hoc repositories within the svn checkout. This made it rather hard to collaborate with others. I finally got sick of the situation (and stumbled upon <a href="http://cheeseshop.python.org/pypi/hgsvn/">hgsvn</a>) and converted all 11 svn repositories to mercurial. To my surprise, mercurial even let me merge my repositories while preserving history (hg has yet to fail me!).</p>
<p>oink uses svn-externals to aggregate the repositories into a single checkout. hg doesn&#8217;t have anything similar, so to check out all 11 repositories, use a script:</p>
<p><code><a href="http://people.mozilla.org/~tglek/checkout.sh">checkout.sh</a> http://hg.mozilla.org<br />
</code><br />
<strong>Released Differences from Oink Mainline </strong></p>
<ul>
<li>New oink tool &#8211; <a href="http://blog.mozilla.com/tglek/2007/06/26/status-report-recent-work/">prcheck</a>: ensures that bool-like integer typedefs behave like bools</li>
<li>New oink tool &#8211; <a href="http://wiki.mozilla.org/DeHydra">dehydra</a>: source query tool with queries specified in JavaScript</li>
<li>New oink tool &#8211; <a href="http://wiki.mozilla.org/Squash">squash</a>: source refactoring tool. This is now deprecated since most of the code in it dealt with working around elsa limitations to do with macro expansion &amp; lack of precise locations. The patching engine used in squash lives on to provide a simple refactoring API for use in other tools (like prcheck).</li>
<li>Minor grammar changes to parse more of Mozilla</li>
<li>Compilation fixes for OSX</li>
<li>Elsa fixes to parse OSX headers</li>
<li>make -j support for elsa</li>
<li>end-of-ast-node location support for elkhound &amp; elsa</li>
<li>preprocessor expansion markup support for elsa</li>
</ul>
<p><strong>Coming Soon</strong></p>
<ul>
<li>Amazing new version of <a href="http://mcpp.sourceforge.net/">MCPP</a> capable of preprocessing mozilla while outputting refactoring-friendly annotations.</li>
<li>Web front-end for squash which will likely be refactored to be tool-agnostic.</li>
<li>Front-end to run patch-producing tools in parallel for multi-core machines</li>
</ul>
<p><strong>Near Future</strong></p>
<ul>
<li>squash will be split up into a library with each major feature ripped out into a standalone tool. Two tools coming soon: outparam rewriter &amp; class member renamer.</li>
<li>RAD for static analysis: oink tool templates to make it trivial to write custom new tools with minimal amount of boilerplate</li>
</ul>
<p><strong>Some time in the Future</strong></p>
<ul>
<li>Collaboration with the author of <a href="http://www.cs.ru.nl/~tews/olmar/">Olmar</a> to provide an OCaml API for Elsa. If everything goes as expected it will be possible to write analyses that are more powerful and more concise than DeHydra ones except they will perform at C/C++ speeds. Plus it should be possible to perform them from a native interactive OCaml toplevel. Most of this work already exists in bits and pieces. It&#8217;s a matter of adding some AST transformations, fixing a few issues and tying it all together.</li>
<li>MapReduce-inspired front-end: generic framework for executing transformations/analyses in parallel and Mozilla-wide without blowing the 32-bit address space (as is typical when static analysis tools meet Mozilla).</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/07/13/dehydra-prcheck-squash-in-mercurial/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Undoing CPP Expansion in 3 simple steps. Say &#8220;Hello&#8221; to easier C++ rewriting.</title>
		<link>http://blog.mozilla.com/tglek/2007/06/12/undoing-cpp-expansion-in-3-simple-steps-say-hello-to-easier-c-rewriting/</link>
		<comments>http://blog.mozilla.com/tglek/2007/06/12/undoing-cpp-expansion-in-3-simple-steps-say-hello-to-easier-c-rewriting/#comments</comments>
		<pubDate>Tue, 12 Jun 2007 18:26:01 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/06/12/undoing-cpp-expansion-in-3-simple-steps.-say-hello-to-easier-c-rewriting./</guid>
		<description><![CDATA[This is incredibly exciting: I believe that I finally solved the messy and mind-numbingly boring CPP/C++ integration problem! Having code displaced or generated due to CPP-expansion should no longer be a fatal problem for Squash. I believe macro-expansion is (or was) the single biggest problem between me and large-scale automated refactoring of the Mozilla codebase. [...]]]></description>
			<content:encoded><![CDATA[<p>This is incredibly exciting: I believe that I finally solved the messy and mind-numbingly boring CPP/C++ integration problem! Having code displaced or generated due to CPP-expansion should no longer be a fatal problem for <a title="Source-to-source refactoring tool" href="http://wiki.mozilla.org/Squash">Squash</a>. I believe macro-expansion is (or was) the single biggest problem between me and large-scale automated refactoring of the Mozilla codebase.</p>
<p>What&#8217;s even more exciting is that I think my solution is both incredibly simple to implement and more general than prior work. Most other tools combine CPP expansion &amp; C parsing into a single step and then integrate (or should I say violently shove?) CPP constructs into the AST. This results in a complete lack of separation between preprocessing and program analysis. Because of this tight coupling, existing solutions were useless to me: the fancy CPP logic could not be separated from the C parser. I would also have a hard time submitting a more convoluted C++ parser upstream to the Elsa maintainer.</p>
<p><strong>Design</strong></p>
<p>There are three parts to my solution:</p>
<ol>
<li><em>Critical component</em>. A CPP expansion undo-log injected during CPP expansion by a modified C preprocessor (upcoming version of MCPP). The statements are wrapped in C comments so that the preprocessed result can be parsed by any C/C++/etc. parser or compiler. Implementation-wise this is the hardest part, since MCPP (like most other C preprocessors) was never designed to keep track of macro expansion info.</li>
<li>A small modification to the Elsa lexer to parse the undo-log and set it aside in a separate data structure.</li>
<li><em>Tricky</em>. A function that uses the CPP undo-log to map preprocessed source locations to unpreprocessed ones. This is a ridiculously simple solution to a tricky design problem: how to efficiently expose the fact that every AST node has at least 2 different source positions (pre-expansion, post-expansion &amp; a stack of positions resulting from expanding nested macros).</li>
</ol>
<p>The MCPP maintainer is almost done with 1. I have a prototype implementation of 2 &amp; 3 weighing in at fewer than 500 lines. Now that the design phase is complete, the changes needed in Elsa are trivial, so I should be done with those real soon now.</p>
<p><strong>Looking Ahead</strong></p>
<p>Now I need to modify Elsa to retain more precise source locations. This includes adding end-of-ast-node locations and adding positions to nodes (such as expressions) that don&#8217;t even have a start position at the moment. Combined with the precise positions recovered from the cpp-undo-log, this should allow code rewrites to retain as much original source code as possible. That reduces the amount of ugly machine-generated code and results in better correctness (existing code is likely to work).</p>
<p><strong>CPP Undo-log Example</strong></p>
<p>The undo-log took a couple of tries to get right. Now macro-parameters have a notion of scope and sensible names. The following example features macro-induced column displacement and macro-expansion causing line shrinkage.<br />
<code><br />
#define NULL 0L<br />
#define FOO(a, b) a + b<br />
int i = NULL; int j;<br />
int k = FOO(<br />
FOO(NULL , 1),<br />
2);<br />
</code><br />
Preprocessed version<br />
<code><br />
# 1 "testcase4.c"<br />
/*mNULL 1:8-1:15*/<br />
/*mFOO 2:8-2:23*/<br />
int i = /*&lt;NULL 3:8-3:12*/0L/*&gt;*/;<br />
# 3 "testcase4.c"<br />
int j;<br />
int k = /*&lt;FOO 4:8-6:3*//*!FOO#0-0 5:0-5:13*//*!FOO#0-1 6:1-6:2*//*&lt;FOO#0-0*//*&lt;FOO*//*!FOO#1-0*//*!FOO#1-1*//*&lt;FOO#1-0*//*&lt;NULL*/0L/*&gt;*//*&gt;*/ + /*&lt;FOO#1-1*/1/*&gt;*//*&gt;*//*&gt;*/ + /*&lt;FOO#0-1*/2/*&gt;*//*&gt;*/;<br />
</code></p>
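<p>To make the marker format concrete, here is a minimal Python sketch that reads the undo-log back out of one preprocessed line. It is not the Elsa lexer change described above. It assumes, from the example alone, that a comment starting with the opening marker introduces an expansion (macro name plus an optional original range) and that a comment holding only the closing marker ends it; every other comment is simply dropped. The name <code>read_undo_log</code> is hypothetical.</p>

```python
import html
import re

# The marker characters are written HTML-escaped and decoded here, so the
# sample line can be copied straight from the post's markup.
LINE = html.unescape("int i = /*&lt;NULL 3:8-3:12*/0L/*&gt;*/;")
OPEN, CLOSE = html.unescape("&lt;"), html.unescape("&gt;")

COMMENT = re.compile(r"/\*(.*?)\*/")


def read_undo_log(text):
    """Return (text without comments, [(macro, original range, expansion)])."""
    pieces, stack, found, pos = [], [], [], 0
    for match in COMMENT.finditer(text):
        pieces.append(text[pos:match.start()])
        pos = match.end()
        body = match.group(1)
        if body.startswith(OPEN):
            # Assumed form: opening marker, macro name, optional range.
            name, _, span = body[1:].partition(" ")
            stack.append((name, span, len("".join(pieces))))
        elif body == CLOSE and stack:
            name, span, start = stack.pop()
            found.append((name, span, "".join(pieces)[start:]))
        # Definition and argument markers are ignored in this sketch.
    pieces.append(text[pos:])
    return "".join(pieces), found


plain, expansions = read_undo_log(LINE)
print(plain)
print(expansions)
```

<p>The first value is what an ordinary C parser sees once comments are discarded; the second pairs each expansion with the range it occupied in the unpreprocessed file, which is the information a rewriting tool needs to edit the original text instead of the expanded text.</p>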
<p><strong>Conclusion</strong></p>
<p>It took a lot to arrive at such a simple solution. I expect that all of my work is likely to end up upstream in BSD-licensed projects: MCPP and Elsa/Oink. I sincerely hope that other people will be able to build on it for their CPP-infested analysis needs and avoid the unbearable mind-numbing discomfort associated with making CPP play along.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/06/12/undoing-cpp-expansion-in-3-simple-steps-say-hello-to-easier-c-rewriting/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CPP Strikes Back</title>
		<link>http://blog.mozilla.com/tglek/2007/05/11/cpp-strikes-back/</link>
		<comments>http://blog.mozilla.com/tglek/2007/05/11/cpp-strikes-back/#comments</comments>
		<pubDate>Sat, 12 May 2007 00:07:26 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/05/11/cpp-strikes-back/</guid>
		<description><![CDATA[I have gotten used to dodging CPP-expansion issues by fudging column &#38; line information until the position info in squash mostly matches the source positions in the original source code. That sufficed for rewriting declarations, but I have finally hit a brick wall. CPP Fun I got as far with call-site outparam rewriting as this [...]]]></description>
			<content:encoded><![CDATA[<p>I have gotten used to dodging CPP-expansion issues by fudging column &amp; line information until the position info in squash mostly matches the source positions in the original source code. That sufficed for rewriting declarations, but I have finally hit a brick wall.<span id="more-20"></span></p>
<p><strong>CPP Fun</strong></p>
<p>I got as far with call-site outparam rewriting as <a href="http://people.mozilla.org/~tglek/outparams.May11.diff">this patch</a>. It demonstrates an interesting flaw.<br />
<code>@@ -8297,1 +8297,1 @@<br />
-  GetInsertionPoint(parentFrame, nsnull, &amp;insertionPoint, &amp;multiple);<br />
+  insertionPoint = GetInsertionPoint(parentFrame, &amp;insertionPoint, &amp;multiple);<br />
@@ -8346,1 +8346,1 @@<br />
-          GetInsertionPoint(parentFrame, child, &amp;insertionPoint);<br />
+          insertionPoint = GetInsertionPoint(parentFrame, child);</code></p>
<p>Due to macro expansion, nsnull contracts to 0, so the position of &amp;insertionPoint in the .i file lands right in the middle of nsnull in the .cpp file. So when squash trims the param including the surrounding commas, it ends up removing the wrong parameter.</p>
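<p>The displacement can be reproduced with a small Python sketch. It is an illustration only, not squash code: it expands object-like macros by plain text substitution and records, for each output column, the input column it came from. The function name <code>expand_with_map</code> and the simplified call (without the address-of operators) are assumptions made for the example.</p>

```python
import re


def expand_with_map(line, macros):
    """Expand object-like macros; return (expanded, col_map).

    col_map[i] is the column in `line` that produced expanded[i].
    """
    names = "|".join(re.escape(name) for name in macros)
    pattern = re.compile(r"\b(" + names + r")\b")
    out, col_map, pos = [], [], 0
    for match in pattern.finditer(line):
        # Text outside macros is copied one-to-one.
        out.append(line[pos:match.start()])
        col_map.extend(range(pos, match.start()))
        # Every character of the replacement maps to the macro's first column.
        body = macros[match.group(1)]
        out.append(body)
        col_map.extend([match.start()] * len(body))
        pos = match.end()
    out.append(line[pos:])
    col_map.extend(range(pos, len(line)))
    return "".join(out), col_map


source = "GetInsertionPoint(parentFrame, nsnull, insertionPoint);"
expanded, col_map = expand_with_map(source, {"nsnull": "0"})

naive = expanded.index("insertionPoint")  # column as seen in the .i file
print(source[naive:naive + 3])            # using it directly lands inside nsnull
print(source[col_map[naive]:][:14])       # mapped column finds the real argument
```

<p>Using the preprocessed column directly points into the middle of <code>nsnull</code>; going through the map recovers the intended argument. A real tool also has to handle function-like macros and expansions that span lines, which this sketch does not attempt.</p>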
<p><strong>Elsa Limitation</strong></p>
<p>I have mentioned lack of end-of-ast-node position information in Elsa. It also lacks start-of-ast-node information for most expressions. This makes selectively rewriting source code rather difficult.</p>
<p><strong>Plan</strong></p>
<p>Instead of fighting an uphill fudging battle against CPP, I am going to have to suspend outparam rewriting yet again to work on better position information and integrating a preprocessor into elsa. This is unfortunate because I was looking forward to finally doing something more sophisticated than renames. Now my elsa fork is going to grow even bigger before I get commit access.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/05/11/cpp-strikes-back/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Nicely rewriting outparams</title>
		<link>http://blog.mozilla.com/tglek/2007/05/10/nicely-rewriting-outparams/</link>
		<comments>http://blog.mozilla.com/tglek/2007/05/10/nicely-rewriting-outparams/#comments</comments>
		<pubDate>Thu, 10 May 2007 22:52:07 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/05/10/nicely-rewriting-outparams/</guid>
		<description><![CDATA[Automatic code rewriting business can be a little depressing sometimes. I tend to run into funny issues caused by CPP, oink limitations or just unpleasant-to-rewrite parts of C++. After banging my head against the wall due to all these issues I finally arrived at a workable approach for the easy part of the outparam rewrite. [...]]]></description>
			<content:encoded><![CDATA[<p>Automatic code rewriting business can be a little depressing sometimes. I tend to run into funny issues caused by CPP, oink limitations or just unpleasant-to-rewrite parts of C++. After banging my head against the wall due to all these issues I finally arrived at a workable approach for the easy part of the outparam rewrite.</p>
<p><span id="more-19"></span></p>
<p>Currently I have a dehydra script that finds all non-virtual getters that return either NS_OK or *NS_SOMETHING_IS_WRONG*. The script then outputs data for squash to base the rewrites on. Then squash takes over.</p>
<p>In order to preserve sanity, pretty-printing is not used at all for rewriting the getter functions. This way one doesn&#8217;t have to worry about oink generating invalid C++ and the output is much more aesthetically pleasing. Instead, squash finds interesting expressions in the .i file. Then it extracts the corresponding strings from .h/.cpp files. The strings are used to fudge the position information obtained from the .i file to vaguely correspond to the original source files. After various C++ string-foo, squash produces a promising looking patch like <a href="http://people.mozilla.org/~tglek/outparams.May10.diff">this</a>.</p>
<p>This also relies on a fair amount of semantic information provided by elsa/oink. For example when removing a parameter, squash inserts a local variable with the same name and then removes all of the dereferences of the old parameter. Since there could be multiple variables with the same name, squash relies on elsa&#8217;s variable resolution.</p>
<p>I think squash is now 50% feature complete with respect to outparam stuff. The other 50% is the hard part of rewriting all of the call-sites. I&#8217;m not counting easy parts like wrapping the return type in already_AddRefed&lt;&gt;, eliminating redundant assignments in the getter or removing the error variable once it is no longer needed.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/05/10/nicely-rewriting-outparams/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Status Update: Outparam work</title>
		<link>http://blog.mozilla.com/tglek/2007/05/07/status-update-outparam-work/</link>
		<comments>http://blog.mozilla.com/tglek/2007/05/07/status-update-outparam-work/#comments</comments>
		<pubDate>Mon, 07 May 2007 17:58:43 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/05/07/status-update-outparam-work/</guid>
		<description><![CDATA[Squash Outparams The following took me a few days to achieve. ./squash -sq-rewrite-outparams out2.txt -sq-implementation nsBidiPresUtils -sq-no-squash -o-lang GNU_Cplusplus ~/work/ff-build/dom/src/base/nsFocusController.i where out2.txt contains instructions on which functions to modify nsFocusController::GetFocusedElement,0=mCurrentElement, produces --- /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h +++ /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h @@ -72,1 +72,1 @@ - NS_IMETHOD GetFocusedElement(nsIDOMElement** aResult); + nsIDOMElement* GetFocusedElement(); This still doesn&#8217;t add the already_AddRefed or other important [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Squash Outparams</strong></p>
<p>The following took me a few days to achieve.</p>
<p><code>./squash -sq-rewrite-outparams out2.txt -sq-implementation nsBidiPresUtils  -sq-no-squash -o-lang GNU_Cplusplus ~/work/ff-build/dom/src/base/nsFocusController.i</code></p>
<p>where out2.txt contains instructions on which functions to modify</p>
<p><code>nsFocusController::GetFocusedElement,0=mCurrentElement,</code></p>
<p>produces</p>
<p><code>--- /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h<br />
+++ /Users/tarasglek/work/mozilla/dom/src/base/nsFocusController.h<br />
@@ -72,1 +72,1 @@<br />
-  NS_IMETHOD GetFocusedElement(nsIDOMElement** aResult);<br />
+  nsIDOMElement* GetFocusedElement();<br />
</code></p>
<p>This still doesn&#8217;t add the already_AddRefed or other important attributes, but that should be easy. The result looks simple, but getting squash from working with a testcase to an actual source file was a little on the painful side.</p>
<p>After my experience with renaming I have realized that squash should avoid the C++ pretty printer for now. Thus the result is produced in a verbose AST-sensitive regexp-like way. However figuring out where things start and end is incredibly painful due to the presence of the preprocessor.</p>
<p>My plan is to get squash rewriting some basic Mozilla code the painful way and then use what I learned to integrate <a href="http://mcpp.sourceforge.net/">mcpp</a> along with the much coveted end-of-ast-node info into elsa.</p>
<p><strong>JavaScript is an AST&#8217;s Best Friend </strong></p>
<p><span id="more-18"></span></p>
<p>Until recently I have been doing a lot of work with dehydra. Now that it is feature-complete I am back to working on squash fulltime and I miss JavaScript already. JS is much better suited for messing about with Abstract Syntax Trees. It is so nice to be able to print out any data structure, create new ones without modifying a billion files and the lack of C++ compile/linking delay is nice too. It&#8217;s amazing how much simpler it is to analyze functions for out-param rewriting in JS compared to checking for simpler patterns in C++. I am seriously excited about Tamarin/ES4 and the productivity boost that it will provide.</p>
<p>I wonder whether a complete JS binding for Elsa would be a good idea.</p>
<p><strong>Emacs</strong></p>
<p>Switching to a Mac finally made me switch to Emacs. I could not find any other editors that would support the workflow I was used to with SciTE or Kate. Other than absolutely hating the majority of Emacs shortcuts (whose idea was Ctrl-^ and the crazy undo/redo?) I love the editor for the amazing term mode. It&#8217;s a little buggy in the current version, but the CVS version is good enough to comfortably run vim in it for quick edits <img src='http://blog.mozilla.com/tglek/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> . It&#8217;s so nice to keep all of my terminals and code in the same window. I am dreaming of the day when Emacs will undergo the Mozilla-&gt;Firefox-like modernization.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/05/07/status-update-outparam-work/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Status Report</title>
		<link>http://blog.mozilla.com/tglek/2007/05/01/status-report/</link>
		<comments>http://blog.mozilla.com/tglek/2007/05/01/status-report/#comments</comments>
		<pubDate>Tue, 01 May 2007 22:13:01 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[DeCOMtamination]]></category>
		<category><![CDATA[dehydra]]></category>
		<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/05/01/status-report/</guid>
		<description><![CDATA[Automated Analyses and Rewrites Dehydra and Squash are now mature enough to assist with mundane tasks like renames and various kinds of tedious code inspection. If you ever suspect that part of the Mozilla hacking you are doing could be done by a tool, contact me to see if I have a suitable tool for [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Automated Analyses and Rewrites</strong></p>
<p>Dehydra and Squash are now mature enough to assist with mundane tasks like renames and various kinds of tedious code inspection. If you ever suspect that part of the Mozilla hacking you are doing could be done by a tool, contact me to see if I have a suitable tool for you.</p>
<p>Also, these tools are in no way limited to working with Mozilla source code. I would be happy to see people use them for other projects too.</p>
<p><strong>Short-term Plans</strong></p>
<p>For the next week or two I plan to focus on out-parameter rewriting and the Mozilla-wide C++ callgraph.</p>
<p><strong>Mozilla-wide Callgraph</strong><br />
This is proving to be a little painful. Things work for basic test-cases, but I am running into scalability issues with Mozilla (as expected). My current approach of serializing everything into a giant JSON graph blows the 32-bit address space after a few hundred files. Even building a Mozilla-wide inheritance graph causes out-of-memory errors, though that runs almost to completion. The best solution to this will be to break up the graph into as many smaller JSON files as possible and only load the ones that are absolutely required into memory.</p>
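<p>One way to picture the split-up graph is the following Python sketch, which keeps one small JSON file per caller and loads a file only when that caller is queried. It is a sketch under stated assumptions, not the dehydra implementation: the class name <code>ShardedGraph</code>, the hashed file names and the function names in the usage lines are all hypothetical.</p>

```python
import hashlib
import json
import os
import tempfile


class ShardedGraph:
    """Callgraph stored as one small JSON file per caller, loaded on demand."""

    def __init__(self, directory):
        self.directory = directory
        self.loaded = {}  # caller name -> list of callees currently in memory

    def _path(self, caller):
        # Hypothetical naming scheme: hash the name so any C++ signature
        # becomes a short, filesystem-safe file name.
        digest = hashlib.sha1(caller.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, digest + ".json")

    def callees(self, caller):
        """Load the shard for `caller` if needed and return its callees."""
        if caller not in self.loaded:
            try:
                with open(self._path(caller)) as f:
                    self.loaded[caller] = json.load(f)
            except FileNotFoundError:
                self.loaded[caller] = []
        return self.loaded[caller]

    def add_edge(self, caller, callee):
        """Record caller -> callee and write only that caller's shard."""
        callees = self.callees(caller)
        if callee not in callees:
            callees.append(callee)
            with open(self._path(caller), "w") as f:
                json.dump(callees, f)

    def evict(self):
        """Drop everything held in memory; the shards stay on disk."""
        self.loaded.clear()


with tempfile.TemporaryDirectory() as directory:
    graph = ShardedGraph(directory)
    graph.add_edge("A::f", "B::g")
    graph.evict()
    print(graph.callees("A::f"))
```

<p>Memory use is then bounded by the shards an analysis actually touches between calls to <code>evict</code>, at the cost of more file I/O.</p>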
<p>The callgraph will be a useful starting point for many other useful analyses (dead code one is going to be lots of fun) and it&#8217;s a good test of dehydra&#8217;s scalability, but I have suspended work on it for a few days to focus on more productive tasks.</p>
<p><strong>Out-parameter Rewriting</strong></p>
<p>Due to XPCOM, Mozilla getters typically return an error code and a value via an out parameter. This requires checking the error code and likely propagating it at the callsite.</p>
<p>For many places in the code there are performance and aesthetic reasons to stop using error codes. Brendan discusses some reasons <a href="http://weblogs.mozillazine.org/roadmap/archives/2006/10/mozilla_2.html">here</a>. This would be cool stuff, but switching to exceptions isn&#8217;t going to happen right away. However, I can already start working on my tools to assist with simpler cases (like <a href="http://www.google.com/codesearch?hl=en&amp;q=+nsBidiPresUtils::GetBidiEngine+show:kW5gzhxHJvU:iRn_dfI9xJY:Y5Gdp6b4NgE&amp;sa=N&amp;cd=1&amp;ct=rc&amp;cs_p=http://ftp.mozilla.org/pub/mozilla.org/mozilla/releases/mozilla1.7a/src/mozilla-source-1.7a.tar.bz2&amp;cs_f=mozilla/layout/base/src/nsBidiPresUtils.cpp#a0">nsBidiPresUtils::GetBidiEngine</a>). I&#8217;m focusing on getters that return NS_OK/(some error) and a value and rewriting them to return NULL on error and non-NULL on success. This could be ready in time for Firefox 3. Once I&#8217;m done with the tool, I&#8217;ll just need someone to help me figure out which functions are ok to simplify like that.</p>
<p>I suspended work on out-param rewriting <a href="http://blog.mozilla.com/tglek/2007/01/11/squash-progress-and-plans/">some time ago</a>. It was proving to be too complicated to do within squash. Now that I can use dehydra to verify the control flow graph, things are a lot simpler. Current plan is to have the <a href="http://people.mozilla.org/~tglek/outparams.js">dehydra script</a> produce a list of candidates for out-param surgery and have squash consume that list and produce the appropriate patches. Currently, the script works for some very simple cases and I am working on the squash side.</p>
<p><strong>Smaller Tasks</strong></p>
<ul>
<li>Sayrer&#8217;s uninitialized member analysis: added more complete constructor support to dehydra, wrote a <a href="http://people.mozilla.org/~tglek/member-init.js">sample script</a> to get sayrer started. Fixed dehydra&#8217;s 64bit support. <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=378763">Bug 378763</a></li>
<li>Made some squash-generated patches for bz, which helped me find a bug in squash. <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=378780">Bug 378780</a></li>
<li>Pushing squash upstream into oink. This is time consuming because it is a combination of legal and many minor technical issues. Dehydra will follow later.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/05/01/status-report/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Automated Code Refactoring</title>
		<link>http://blog.mozilla.com/tglek/2007/04/02/automated-code-refactoring/</link>
		<comments>http://blog.mozilla.com/tglek/2007/04/02/automated-code-refactoring/#comments</comments>
		<pubDate>Mon, 02 Apr 2007 19:29:12 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[DeCOMtamination]]></category>
		<category><![CDATA[dehydra]]></category>
		<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/04/02/automated-code-refactoring/</guid>
		<description><![CDATA[Squash If you are working on any C++ refactoring, especially if it involves function calls, spans multiple files or feels like you need a compiler in your head to help you, drop me a note to see if squash can help. Squash provides a great deal of control over the refactoring process because it is [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Squash </strong></p>
<p>If you are working on any C++ refactoring, especially if it involves function calls, spans multiple files or feels like you need a compiler in your head to help you, drop me a note to see if <a href="http://wiki.mozilla.org/Squash">squash</a> can help. Squash provides a great deal of control over the refactoring process because it is not tied to a particular IDE and can be customized to accommodate special cases.</p>
<p>On Friday, two squash-produced patches landed:</p>
<ol>
<li>A <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=376042">212K patch</a> to rename nsIFrame::GetPresContext to PresContext. It took a couple of minutes to produce a patch for mac &amp; linux, and then some manual labour to complete it so it builds on Windows too. Unfortunately, Microsoft C++ is not yet supported by Oink. Windows-specific code will require orders of magnitude more human labour until such support is contributed.</li>
<li>A much simpler patch to <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=375878">remove uses of the deprecated ::Recycle()</a>. This took a few minutes once I added support for renaming global functions to squash.</li>
</ol>
<p><strong>Dehydra</strong></p>
<p>C++ support in dehydra is coming along splendidly. I started working on cross-function analysis support. Currently my goal is to allow the user to build callgraphs of Mozilla. The first application of that is going to be dead code detection.</p>
<p>In the meantime, contact me if you are looking for patterns in the code that grep won&#8217;t help with: control flow-sensitive code, type- &amp; syntax-aware matching, API misuse, etc. Dehydra can probably help.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/04/02/automated-code-refactoring/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>WebSquash</title>
		<link>http://blog.mozilla.com/tglek/2007/02/28/websquash/</link>
		<comments>http://blog.mozilla.com/tglek/2007/02/28/websquash/#comments</comments>
		<pubDate>Wed, 28 Feb 2007 21:20:15 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[DeCOMtamination]]></category>
		<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/02/28/websquash/</guid>
		<description><![CDATA[Looking for developers to test the web frontend for squash I got the web frontend to squash working. Right now I&#8217;m looking for people to test it on my test server before I open it to the wild web. It ended up in a further frontend script explosion, but all of the pieces seem to [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Looking for developers to test the web frontend for squash</strong><br />
I got the web frontend to squash working. Right now I&#8217;m looking for people to test it on my test server before I open it to the wild web. It ended up in a further frontend script explosion, but all of the pieces seem to make sense. As it stands right now there are 5 pieces:</p>
<ol>
<li>JavaScript client-side provides progress notification</li>
<li>A PHP frontend to communicate with the stateful server</li>
<li>Python server that handles command queuing, progress reporting and error handling</li>
<li>Python library to build a list of possible candidates for squashing, produce the necessary .i files and an invocation command for squash</li>
<li>Squash: the friendly neighborhood class member renamer</li>
</ol>
<p><strong>Passion of CPP: Macros are Considered Painful<br />
</strong></p>
<p>In the process of testing the web frontend I updated the Mozilla source code only to notice that Elsa can no longer parse files for tasks that worked before. At first I got a little discouraged thinking that I&#8217;ll have to teach Elkhound about yet another obscure C++ feature that wasn&#8217;t handled correctly before. However, it turned out that in one case I was feeding squash a file that didn&#8217;t even compile and in the other 2 cases CPP was messing with my head.</p>
<p>The first case was the magic of CPP leading to unintentional code duplication and squash confusion:  <code>PR_MAX(GetPresContext()-&gt;PointsToAppUnits(0.5f), onePixel)</code><br />
gets expanded and parsed as<br />
<code>GetPresContext()-&gt;PointsToAppUnits(0.5f) ? GetPresContext()-&gt;PointsToAppUnits(0.5f) : onePixel</code></p>
<p>I ended up putting in a special case teaching squash to not get upset if it can only find one of the two instances of class member to replace when PR_MAX is involved.</p>
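<p>The duplication is easy to reproduce outside the compiler. The Python sketch below expands a single PR_MAX call by text substitution, assuming the usual definition <code>((x)&gt;(y)?(x):(y))</code>; the helper names are hypothetical and the sketch handles only a string that is exactly one call.</p>

```python
def split_args(text):
    """Split a macro argument list on commas that are not inside parentheses."""
    args, current, depth = [], [], 0
    for ch in text:
        if ch == "," and depth == 0:
            args.append("".join(current).strip())
            current = []
            continue
        depth += (ch == "(") - (ch == ")")
        current.append(ch)
    args.append("".join(current).strip())
    return args


def expand_pr_max(call):
    """Expand a string of the exact form 'PR_MAX(x, y)' as the preprocessor would."""
    prefix = "PR_MAX("
    inner = call[len(prefix):-1]
    x, y = split_args(inner)
    # Assumed macro body: each argument is pasted in twice.
    return f"(({x})>({y})?({x}):({y}))"


call = "PR_MAX(GetPresContext()->PointsToAppUnits(0.5f), onePixel)"
expanded = expand_pr_max(call)
print(expanded)
print(call.count("GetPresContext"), expanded.count("GetPresContext"))
```

<p>The source contains one use of the member while the parsed code contains two, and both map back to the same source text. That mismatch is what the PR_MAX special case has to tolerate.</p>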
<p>The second case was exciting. In my innocent perception of CPP wonder I thought that running g++ on a .cpp or a .i file produced from the said .cpp would result in pretty similar behavior. Not so.</p>
<p><code>PR_LOG(gLog, PR_LOG_DEBUG,<br />
("xul: %.5d. %s    %s=%s",<br />
-1, // XXX pass in line number<br />
NS_ConvertUTF16toUTF8(extraWhiteSpace).get(),<br />
NS_ConvertUTF16toUTF8(qnameC).get(),<br />
NS_ConvertUTF16toUTF8(valueC).get()));<br />
</code><br />
yields</p>
<p><code>do { if (((gLog)-&gt;level &gt;= (PR_LOG_DEBUG))) { PR_LogPrint ("xul: %.5d. %s    %s=%s", -1, // XXX pass in line number NS_ConvertUTF16toUTF8(extraWhiteSpace).get(), NS_ConvertUTF16toUTF8(qnameC).get(), NS_ConvertUTF16toUTF8(valueC).get()); } } while (0);</code></p>
<p>Here the // comment ends up in the middle of a line because the PR_LOG expansion collapses its arguments onto a single line, and the resulting line won&#8217;t parse since everything after the comment marker is commented out.</p>
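<p>A toy illustration of why the order of operations matters here. This is only a sketch in Python, not how cpp or Elsa is implemented; the helper names <code>collapse_lines</code> and <code>strip_line_comment</code> are made up for the example, and string literals are not handled.</p>
<pre>
```python
def collapse_lines(lines):
    """Join macro-argument lines into one line, as the expanded text does."""
    return " ".join(line.strip() for line in lines)


def strip_line_comment(line):
    """Drop everything from the first // onward (string literals are ignored)."""
    return line.split("//", 1)[0].rstrip()


args = [
    '("xul: %.5d. %s    %s=%s",',
    "-1, // XXX pass in line number",
    "NS_ConvertUTF16toUTF8(extraWhiteSpace).get(),",
    "NS_ConvertUTF16toUTF8(valueC).get()));",
]

# Comment removed while each argument is still on its own line:
# only the comment text itself is lost.
per_line = collapse_lines(strip_line_comment(line) for line in args)

# Lines collapsed first, comment interpreted afterwards:
# the comment swallows every argument that followed it.
collapsed = strip_line_comment(collapse_lines(args))
```
</pre>
<p>In the second ordering the text stops right after <code>-1,</code>, which is the unparseable remainder described above.</p>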
<p>This kind of CPP mischief leads me to believe that something has got to give. If we are to embrace automated tools to aid in verification and development, either CPP use has to be reduced considerably or Elsa needs to get a built-in preprocessor. I suspect the solution will involve a mixture of the two approaches.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/02/28/websquash/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Will Rename Class Members for Food</title>
		<link>http://blog.mozilla.com/tglek/2007/01/24/will-rename-class-members-for-food/</link>
		<comments>http://blog.mozilla.com/tglek/2007/01/24/will-rename-class-members-for-food/#comments</comments>
		<pubDate>Thu, 25 Jan 2007 02:04:39 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[DeCOMtamination]]></category>
		<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/01/24/will-rename-class-members-for-food/</guid>
		<description><![CDATA[Squash may now be ready as a class member renaming tool for early adopters. I would like people to use me as a frontend to squash. Email me your requests for renames and I will reply with giant patches. This way squash can be immediately useful. Plus I can fix bugs in squash and figure [...]]]></description>
			<content:encoded><![CDATA[
<p>Squash may now be ready as a class member renaming tool for early adopters. I would like people to use me as a frontend to squash. Email me your requests for renames and I will reply with giant patches. This way squash can be immediately useful. Plus I can fix bugs in squash and figure out actual use cases while I get the frontend set up.</p>
<p><strong>Progress</strong></p>
<p>Squash can now produce a good looking <a href="http://glek.net:8080/~taras/nsiframe.diff">92K patch</a> for renaming nsIFrame::GetPresContext. This means that squash can now correctly traverse 167 files and produce a patch that affects 103 of them. I am going to work on the web frontend next.</p>
<p>Some issues below.</p>
<p><span id="more-8"></span><br />
<strong>Tool Explosion</strong></p>
<p>I wrote a creatively named frontend: run_squash. It prevents squash from running out of address space by running squash one unit at a time and combining the patch output from multiple runs. It runs squash in parallel, similar to make -j, which decreases runtime roughly in proportion to the number of cores.<br />
I would be curious to see how Sun&#8217;s Rock-based systems fare for this. For example, on the 4-way Opteron a 20-minute squash run takes around 5 minutes. Having lots of CPU cores will become important down the road once multiple users are running multiple analysis tasks on a single machine through a web frontend.</p>
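<p>The unit-at-a-time idea can be sketched roughly as follows. This is not the actual run_squash; the names <code>run_unit</code> and <code>run_all</code> and the <code>command_prefix</code> parameter are assumptions made for illustration. Each unit gets its own process, so its memory is returned to the system when it exits, and the outputs are simply concatenated in input order.</p>
<pre>
```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_unit(command_prefix, unit):
    """Run one process for a single .i file and return its stdout (the patch)."""
    result = subprocess.run(
        list(command_prefix) + [unit],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def run_all(command_prefix, units, jobs=None):
    """Process units in parallel, then join the patches in input order."""
    # Hypothetical default: one worker per CPU, similar in spirit to make -j.
    jobs = jobs or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        patches = pool.map(lambda unit: run_unit(command_prefix, unit), units)
        return "".join(patches)
```
</pre>
<p>Plain concatenation is the simplest possible way to combine output. A real tool would probably also have to drop duplicate hunks for headers that several units include, which this sketch does not attempt.</p>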
<p>There is another temporarily named tool, prepare.py, which greps .cpp files looking for candidates for renaming, produces matching .i files, figures out the number of CPUs and then invokes run_squash.</p>
<p>More special purpose tools will need to be written. For example roc mentioned that it would be nice to check when generated interface files are being modified and to have the corresponding IDL updated instead while refusing to modify frozen IDL interfaces. Classes with IIDs would need to have them changed too.</p>
<p><em>Tool Rant</em><br />
I am trying to decide how to manage the tool growth such that things evolve sanely. Should squash be a giant Swiss-army-knife binary capable of doing everything but incredibly hard to modify? Or should it be broken up into a dozen separate programs &amp; scripts that work together in an ad-hoc way? The latter would be what some people describe as the UNIX way, whatever that means. In the 90s, the former would&#8217;ve been done by making everything a COM component, and it would still have been just as hard to modify. Alternatively, I could use some strong ROPE and SOAP to tie everything together with SOA. Kidding aside, it would be nice to have a strategy to deal with this so people could run squash on their own machines too without spending a week setting up dependencies.</p>
<p><strong>CPP – To Invert or Not to Invert</strong></p>
<p>A large part of squash is hacks and workarounds for preprocessor-induced pain. Recently, I ran into two interesting cases where blind string substitution fails.</p>
<p><em>Multiline macro parameters</em></p>
<p><code>if (NS_SUCCEEDED(<br />
nsSVGUtils::GetReferencedFrame(&amp;nextPattern, targetURI,<br />
mContent,<br />
GetPresContext()-&gt;PresShell()))) {<br />
</code><br />
Squash works on an AST produced from .i files. When cpp expands this macro, everything ends up on the same line as the if keyword. That&#8217;s a problem because when squash wants to replace GetPresContext() the parser gives it the wrong line. Initially I was tempted to remove the newlines from the original source in a few cases, but then I realized that this is analogous to looking for the closing } in class declarations when end-of-AST-node info isn&#8217;t available. Now if a string substitution fails, squash will go to the next line until a match is found or one of the ;{} characters is encountered. Yes, that&#8217;s also an emoticon of me looking at yet another CPP-induced problem.</p>
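<p>The fallback can be sketched like this. It is a minimal Python illustration, not squash&#8217;s actual C++ code; <code>replace_near</code> and its parameters are hypothetical names, and the match is checked before the terminator so that a hit on a line ending in an opening brace still counts.</p>
<pre>
```python
def replace_near(lines, start, old, new):
    """Replace old with new on lines[start] or on the first later line with it.

    Scanning gives up at a line containing ';', '{' or '}', which is taken
    to end the statement. Returns the index of the edited line, or None.
    """
    for index in range(start, len(lines)):
        line = lines[index]
        if old in line:
            # Edit in place so the caller's copy of the source is updated.
            lines[index] = line.replace(old, new, 1)
            return index
        if any(ch in line for ch in ";{}"):
            # Statement ended without a match: report failure, change nothing.
            return None
    return None
```
</pre>
<p>For the snippet above, starting from the line holding the if keyword, the scan would skip two lines and rewrite the fourth.</p>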
<p><em>Replacing Code Within a Macro Definition.<br />
</em>Mozilla has whole functions defined within macros. Squash cannot deal with that. Is it worth doing a limited preprocessor inversion to fix these, or should squash fail and let a developer do it by hand? For example, one could mark up the .i file with metadata on which sections came from a macro expansion and which sections were passed into macros as parameters. Then an intelligent rewrite decision could be made. This would solve most of the CPP-induced problems I can think of, but would require hacking a good C preprocessor. This is probably not worth the pain, but apparently someone bolted a BSD-licensed preprocessor onto Elsa. If that&#8217;s true, the patch may get me most of the way to a macro-aware squash.</p>
<p><strong>Quick Analyses</strong></p>
<p>There are hundreds of useful analyses that can be done on Elsa&#8217;s AST of Mozilla. However, using C++ is too verbose and error-prone for many of the simpler tasks. It would be nice to make a first-class binding to Oink/Elsa that can easily extract all interesting info from the AST. Olmar is one way of doing that, but converting Elsa-style OO into ML datatypes produces incredibly verbose data structures and requires careful conversion of code such that information isn&#8217;t lost.</p>
<p>A language that can query the C++ AST in the way that its authors intended may be an easier way to go. ES4 JavaScript would be interesting for that because the script could traverse the cast-happy Elsa AST, extract information of interest and present it as a structured type which could then be manipulated using type-safe pattern matching.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/01/24/will-rename-class-members-for-food/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Squash Progress and Plans</title>
		<link>http://blog.mozilla.com/tglek/2007/01/11/squash-progress-and-plans/</link>
		<comments>http://blog.mozilla.com/tglek/2007/01/11/squash-progress-and-plans/#comments</comments>
		<pubDate>Thu, 11 Jan 2007 22:34:59 +0000</pubDate>
		<dc:creator>tglek</dc:creator>
				<category><![CDATA[DeCOMtamination]]></category>
		<category><![CDATA[squash]]></category>

		<guid isPermaLink="false">http://blog.mozilla.com/tglek/2007/01/11/squash-progress-and-plans/</guid>
		<description><![CDATA[Out-param Rewriting Work Since the last post I worked on rewriting functions that use out-parameters to use return values instead. I got as far as rewriting method definitions and simple call sites, but decided to hold off further work until the rest of squash is more complete. Squash Development Roadmap Robert O&#8217;Callahan helped me devise [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Out-param Rewriting Work</strong></p>
<p>Since the last post I worked on rewriting functions that use out-parameters to use return values instead. I got as far as rewriting method definitions and simple call sites, but decided to hold off further work until the rest of squash is more complete.</p>
<p><strong>Squash Development Roadmap</strong><br />
<a href="http://weblogs.mozillazine.org/roc/" title="Well, I'm Back">Robert O&#8217;Callahan</a> helped me devise a near-term roadmap. I am going to focus on getting squash to be production quality for member renames and to produce commit-quality patches. An example query would be to rename nsIFrame::GetPresContext to nsIFrame::PresContext. This involves a couple of big details:</p>
<ul>
<li>Produce aesthetically pleasing code via text substitution instead of oink pretty printing. The advantage of this is that the original coding style, comments and indentation will all be preserved. This involves reparsing the resulting code to verify correctness, which doubles memory usage and processing time.</li>
<li>To produce a complete patch squash needs to process all of the relevant source code. This increases memory usage and processing time linearly. I&#8217;ll use grep to narrow down candidates for processing and in the future will use an AST database of Mozilla to figure out exactly what needs changing.</li>
<li>It is useful to be able to process all interesting source code in one invocation but just processing the layout/generic directory sequentially uses over 2GB of RAM (Elsa&#8217;s AST does not support deallocation) and takes 3 minutes on a quad Opteron. So in order to reduce RAM usage and be a trendy multi-core developer I&#8217;ll fork() a process for every file and use that for both parallelism and memory cleanup purposes.</li>
<li>Develop a web frontend that maintains an up-to-date Mozilla source tree and has squash set up on it, where one would be able to enter a rename operation and have the patch emailed back. Rob even had a cool idea to have the user enter a bugzilla id and have the patch automatically attached to that bug. This will be useful so I don&#8217;t have to work so hard on packaging squash and users will get instant gratification. Plus people without quad Opterons will be able to test squash too <img src='http://blog.mozilla.com/tglek/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </li>
</ul>
<p>All that is Milestone 1. After that I&#8217;ll work on infrastructure like AST-node-location info, cleaning up pretty printing and defining the exact goal for the next milestone.</p>
<p><strong>Current Status</strong></p>
<p>Over the past 3 days I refactored squash to be able to do renames without having to go through class squashing, etc. I added the ability to rename class members and now it can produce ugly patches for that.</p>
<p>The current workflow to rename nsIFrame::GetPresContext to nsIFrame::PresContext is:</p>
<ol>
<li>Identify possible targets<br />
<code>find ~/work/ff-build -name \*.o |xargs grep nsIFrame &gt; /tmp/output.sh</code></li>
<li>My sed is rusty so I used regexps in <a href="http://kate-editor.org/" title="Best code editor ever!">Kate</a> to convert resulting lines into something like<br />
<code>make -C ./layout/generic/ nsSpacerFrame.i<br />
make -C ./layout/generic/ nsFrameSetFrame.i<br />
make -C ./layout/generic/ nsBlockFrame.i</code></li>
<li>Run the script to produce the needed .i files<br />
<code>. /tmp/output.sh</code></li>
<li>Grand-finale:<br />
<code>find ~/work/ff-build/ -name \*.i |time xargs  ./squash -o-lang GNU_Cplusplus  -sq-implementation nsIFrame  -sq-no-squash -sq-rename-member GetPresContext PresContext &gt; <a href="http://glek.net:8080/~taras/nsiframe.diff">nsiframe.diff</a><br />
</code>Note that find outputs absolute filenames, which is essential for squash to resolve relative include files.</li>
</ol>
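<p>Step 2 could also be scripted instead of done by hand in an editor. The sketch below is an assumption-laden illustration: it assumes grep reports each matching object file as a line of the form &#8220;Binary file PATH matches&#8221;, and the name <code>make_command</code> and its <code>build_root</code> parameter are made up for the example.</p>
<pre>
```python
import os
import re

# Assumed shape of grep's report for a matching binary (object) file.
BINARY_MATCH = re.compile(r"^Binary file (.+)\.o matches$")


def make_command(grep_line, build_root):
    """Turn one grep report line into a make invocation for the .i file."""
    match = BINARY_MATCH.match(grep_line.strip())
    if match is None:
        # Not a binary-match report: nothing to build for this line.
        return None
    directory, name = os.path.split(match.group(1))
    # make -C wants the directory relative to the build root.
    relative = os.path.relpath(directory, build_root)
    return "make -C ./{}/ {}.i".format(relative, name)
```
</pre>
<p>Writing the non-empty results to /tmp/output.sh, one per line, would give the same script as in step 2.</p>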
<p>The setup and squashing itself is a bit laborious and RAM/CPU intensive and is the reason for a web frontend. I am going to be ecstatic once this all works.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.mozilla.com/tglek/2007/01/11/squash-progress-and-plans/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

