MUST_FLOW_THROUGH("label")
September 8th, 2008
Some time ago, Igor mentioned that there is code in SpiderMonkey that pleads with the programmer that, from a certain point in a function, code must flow through a label (i.e. a finalizer block). Treehydra made it possible to turn that weak plea into an error message when static checking is enabled. See the bug for more details. My favourite static analyses are all about turning informal “guarantees” into angry compiler complaints.
This is my first static analysis that landed in the mozilla-central tree. It’s also the simplest one and may be a decent starting point for solving similar problems. It’d be cool to see this particular feature utilized outside of SpiderMonkey. Unlike human-powered code inspection, it excels at finding accidental early returns covered up by macros.
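To make the property concrete, here is a minimal sketch of the check in plain JavaScript over a toy control-flow graph. It does not use the Treehydra API or the real SpiderMonkey macro: the graph format, the block names and the function mustFlowThrough are all assumptions made up for illustration. The idea is to search from the starting block while refusing to step through the label; if the exit is still reachable, some path skips the label.

```javascript
// Toy CFG: block name -> list of successor block names.
// Hypothetical format for illustration, not what Treehydra exposes.
function mustFlowThrough(cfg, start, label, exit) {
  // Walk the graph from `start`, never entering `label`.
  // Reaching `exit` this way means a path bypasses the label.
  const seen = new Set([start]);
  const work = [start];
  while (work.length > 0) {
    const block = work.pop();
    if (block === exit) return false; // reached exit without the label
    for (const next of cfg[block] || []) {
      if (next === label || seen.has(next)) continue;
      seen.add(next);
      work.push(next);
    }
  }
  return true;
}

// Every path goes entry -> body -> out -> exit.
const good = { entry: ["body"], body: ["out"], out: ["exit"], exit: [] };

// "early" models an accidental early return hidden inside a macro.
const bad = {
  entry: ["body"],
  body: ["out", "early"],
  early: ["exit"],
  out: ["exit"],
  exit: [],
};

console.log(mustFlowThrough(good, "entry", "out", "exit")); // true
console.log(mustFlowThrough(bad, "entry", "out", "exit"));  // false
```

A real checker would run on the compiler's own CFG, which is exactly where macro-expanded early returns become visible.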
Status Report: Nearterm plans for Pork, Dehydra
June 24th, 2008
I have been planning to release Pork 1.0 for a while now. The tools work great, even if all the love is going to the GCC-based toolchain. However, after hearing grumpy comments from a certain coworker about the ugliness of the oink build system, it dawned on me that it’s rather mean to release such a mess and call it 1.0.
So I think I’ll release Pork 0.9 in the current state, so I can focus on near term GCC toolchain work. Pork in the current form means oink stack + my refactoring tools + changes to elsa and other libs to support C/C++ refactoring needs.
This will be followed up by Pork 1.0. 1.0 will involve changes to the build system to get rid of oink (we only use the oink build system and rarely use the oink API). To put this another way: I don’t expect any functionality changes between 0.9 and 1.0 other than an improved build system to make it easier to get started with writing new tools.
Pork - Future
I am pretty happy with Pork as it is. I think we’ve taken Elsa as far as it’ll let us go. The only realistic improvement on the Pork side may be to have Dehydra generate a JS binding to Elsa’s extensive AST to make rewriting stuff easier. However, I’m not sure that’s worth the effort, nor that a C++ AST will reflect into JavaScript as well as GCC GIMPLE does.
Preprocessing
On the other hand, something needs to be done about the main ingredient that makes Pork tick: MCPP. MCPP does a lovely job of annotating what the C preprocessor is doing, but configuring GCC to use a foreign preprocessor is a giant hassle and making sure it works correctly is troublesome. At the GCC summit, Tom gave me an idea on how similar functionality can be added to GCC directly by extending the include backtrace with macro expansions. Not only would such integration simplify Pork setup and increase Pork’s operating speed, but it is also a clean way to expose preprocessor constructs to the AST presented in De/Treehydra. It should allow for more preprocessor awareness directly in the analysis stage of refactoring instead of only in the final rewriting stage as is currently done. As a side effect, GCC would gain better error messages too.
So while this isn’t going to affect Pork directly, it will simplify the lives of Pork users while opening new analysis frontiers. Even though I hate working on preprocessor stuff, I think this work will need to happen sometime in the near future.
Dehydra 0.9 has been out for a while. I planned to release 1.0 soon after, unless major flaws were discovered in the API. The situation changed at the GCC summit. The fact that the FSF reversed their stance on GCC plugins means that we should be concentrating on getting the plugin stuff reviewed.
So in the near term I’m forward porting the plugin stuff to trunk GCC, then I’ll generalize the plugin API to suit at least one other GCC plugin user that we met with at the summit. The downside is that I don’t want to release Dehydra 1.0 and immediately break the plugin API. The upside is that the new API should be more general and more minimalistic and will likely be close to what will eventually become an official plugin API.
Summary: In my mind Dehydra and Pork are 1.0 quality, but I want to future-proof them a little bit before calling them 1.0.
GCC Summit
June 19th, 2008
Our presentation on Treehydra and Dehydra GCC plugins was received well at the summit.
The big news is that FSF is working on license changes to allow GPL-only GCC plugins. I’m looking forward to having our work be compatible with future GCC without any patching.
In a few minutes we’ll be having a meeting with users of other plugin frameworks to have an initial discussion on a common API. I’m working on forward porting my patches, so they can start getting reviewed ahead of license changes.
Dehydra 0.9: It’s alive!
June 9th, 2008
I am finally happy enough with the Dehydra API and functionality to release 0.9. Dehydra is basically feature complete; the main reason I’m not calling it 1.0 is in case there are outstanding API bugs.
I believe Dehydra is the first useful open source static analysis tool. I hope to see projects outside of Mozilla benefitting from it too.
I would love to see someone package this up for various Linux distributions. You can grab the release here.
Note, this release also serves as a preview release of Treehydra. Most of the development lately has been focused on improving Treehydra and building analyses on top of it.
Treehydra goes Push and Pop
May 27th, 2008
After writing a ton of docs and working through other Dehydra 0.9 blockers, I decided to cool off by doing some actual analyses. Before I get to that, I’d like to say that the last big task is to set up a buildbot for Dehydra on Linux/OSX. Thanks to yet another awesome contribution from Vlad, that’s mostly done.
So I got working on GC-safety static analysis. Originally we tried to define a complete spec before writing a single line of code. That turned out to be a bad idea and resulted in a spec full of bugs. This time we are defining the analysis incrementally and, as a surprise reward, it already caught a bug.
Pushing and Popping Our Way
SpiderMonkey has a lot of complex code applying push/pop-like operations on variables in a function-local manner. Examples of functions that this analysis would look at are: JS_PUSH_TEMP_ROOT/JS_POP_TEMP_ROOT and JS_LOCK/JS_UNLOCK. See bug for more. Essentially, this will help with “code must flow through here” comments on “out:” goto labels that inhabit the SpiderMonkey source.
This is an example of control-flow-sensitive analysis. It is impossible without the compiler-like view of the code that Treehydra provides. It also helps to have a scalable algorithm to iterate the CFG. Luckily, David Mandelin wrote such a beast by implementing ESP for his outparam analysis. David factored out the ESP analysis and made it available for reuse. See esp_lock.js in the test suite for an example of how to write control-flow-sensitive analyses. locks_valid*.cc and locks_bad*.cc illustrate the code patterns that can be scanned for.
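As a rough illustration of what a push/pop check has to track, here is a toy sketch in plain JavaScript. It is not the ESP framework and not the Treehydra API: the CFG format, the block names and the function checkPushPop are hypothetical, and a real analysis is far more precise about which variable each operation touches. The sketch propagates every possible push depth through the graph and complains when a pop has no matching push or when the exit is reached with something still pushed.

```javascript
// Toy model: each block has an op ("push", "pop" or null) and successors.
// Hypothetical format for illustration only.
function checkPushPop(cfg, entry, exit, maxDepth = 8) {
  const errors = [];
  // depthsIn[b] = set of push depths already seen on entry to block b
  const depthsIn = {};
  for (const name of Object.keys(cfg)) depthsIn[name] = new Set();
  depthsIn[entry].add(0);
  const work = [[entry, 0]];
  while (work.length > 0) {
    const [name, depth] = work.pop();
    const block = cfg[name];
    let out = depth;
    if (block.op === "push") out = depth + 1;
    if (block.op === "pop") out = depth - 1;
    if (out < 0) {
      errors.push("pop without push in " + name);
      continue;
    }
    if (out > maxDepth) {
      // Give up on loops that keep pushing, so the search terminates.
      errors.push("too many pushes at " + name);
      continue;
    }
    if (name === exit && out !== 0) {
      errors.push("exit reached with " + out + " unpopped");
    }
    for (const next of block.succs) {
      if (depthsIn[next].has(out)) continue; // state already explored
      depthsIn[next].add(out);
      work.push([next, out]);
    }
  }
  return errors;
}

const balanced = {
  entry: { op: "push", succs: ["work"] },
  work: { op: null, succs: ["out"] },
  out: { op: "pop", succs: ["exit"] },
  exit: { op: null, succs: [] },
};
// Same function, but "work" can also jump straight to the exit,
// skipping the pop at "out".
const leaky = { ...balanced, work: { op: null, succs: ["out", "exit"] } };

console.log(checkPushPop(balanced, "entry", "exit")); // []
console.log(checkPushPop(leaky, "entry", "exit"));    // one "unpopped" error
```

Tracking a set of depths per block instead of a single value is what makes the check sensitive to the path taken, which is the part grep cannot do.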
So if you know of any further push/pop patterns in the rest of Moz that can be checked in this manner, leave a comment.
PS. This is yet another account of Treehydra rocking the static analysis world. Exposing the slightly scary but awesome GCC guts via JavaScript allows one to perform precise static analyses in a civilized manner. What could be more fun?
Counting down to a Dehydra release
April 29th, 2008
I hope to release Dehydra 0.9 within a couple of weeks. There is already a community of users, but there are still too many barriers to entry keeping potential bug hunters away.
In recent weeks there has been a lot of work on polishing rough areas. Now we have better error reporting, improved APIs for using libraries, etc. The remaining tasks are tracked in this bug.
The few big remaining TODOs are low-tech:
- Need a better homepage than the current one.
- Docs, tutorials and more docs. Currently, the plan is to put more documentation on MDC and have it also serve as a webpage. Any dehydra/treehydra guides or API doc contributions are welcome. For now, if you need help, feel free to ask on the mailing list or #mmgc on irc.mozilla.org.
- Verify, document and maintain the OSX port. Vlad Sukhoy did a lot of heavy lifting to make this happen; now we need to cement his achievement by setting up a buildbot.
- Spread the word! I would like to see other large projects such as KDE, OpenOffice, etc. adopt application-specific static analysis in the form of *hydra. I am interested in seeing people use *hydra to scan code for security vulnerabilities. Ok, so this isn’t really needed to release Dehydra 0.9, but I am impatient!
RIP: Oink Dehydra
Between GCC Dehydra and Treehydra, there is nothing that Pork Dehydra could do better, so I finally removed Dehydra from Pork. From now on Pork’s purpose is large-scale C/C++ refactoring. For everything else one should use Dehydra.
Dehydra World Tour
March 17th, 2008
After a few weeks of mind-numbing work on treehydra guts, I finally have something exciting to talk about!
We will be presenting Dehydra at the GCC Developer’s Summit in lovely Ottawa. The GCC version of Dehydra exceeded all of my expectations, so it will be exciting to meet the awesome GCC hackers who laid the groundwork to make this possible. Got suggestions for other venues to present Dehydra?
Packaging Help Needed
I feel that the Dehydra concept is getting mature enough for a 1.0 release. The recently baked GCC 4.3 means I’ll be able to distribute a 4.3-specific plugin patch (currently it’s against trunk, aka 4.4-to-be). Now I need README, LICENSE, configure files, etc.
I will need help with packaging dehydra + patched gcc into .deb and .rpm files. Leave a comment, email me or the static analysis list, or poke me in #mmgc on irc.mozilla.org if you can help with packaging.
Logo/Mascot Wanted
Since every serious project has a cool mascot, it would be cool to get one for Dehydra. I’d be curious to see what people think could symbolize a code scanning monster that makes grep feel inadequate. I have a feeling a cartoon version of a giant Heavy Metal Duck might be it, but I haven’t made up my mind yet.
Treehydra What?
Treehydra is a work-in-progress name for the low-level equivalent of Dehydra. Currently it is built as a separate GCC plugin. I haven’t yet made up my mind on whether Treehydra will end up extending Dehydra or stay a separate tool. Since treehydra needs dehydra for bootstrap, they’ll stay separate for now.
Last week I managed to run treehydra to completion on my mozilla checkout and walk the resulting AST in JS correctly. Now comes the fun part of making it do useful tricks.
Recipe: How many classes are instantiated in Mozilla?
March 12th, 2008
I got this question in the mail today.
Seems like a simple enough question, but grep won’t provide that answer.
It also happens to be an excellent use case for Dehydra.
My script:
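// Names of every class/struct that GCC reports for this file.
var classes = []
// Called for each class declaration or template instantiation.
function process_type (c) {
  // Enums and unions also arrive here; skip them.
  if (!/class|struct/.test(c.kind)) return
  classes.push (c.name)
}
// Called once GCC is done processing the file.
function input_end() {
  var f = this.aux_base_name + ".counter"
  print(f)
  write_file (f, classes.join ("\n"))
}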
process_type is called every time GCC hits a class declaration or a template is instantiated (also for enums and unions, but those get ignored with the .kind check). Then input_end is called when GCC is done processing the file. this.aux_base_name is the input filename.
I hooked up this script to the mozilla build by adding the following to .mozconfig:
export CXX=$HOME/gcc/bin/g++
export CXXFLAGS="-fplugin=$HOME/work/gccplugin/gcc_dehydra.so -fplugin-arg=$HOME/work/gccplugin/test/count_classes.js"
Then I built:
make -f client.mk build WARNINGS_AS_ERRORS=
Count:
find . -name \*.counter | xargs cat | sort | uniq > /tmp/classes.txt
wc -l /tmp/classes.txt
Answer: 15001
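The deduplication step matters because the same class shows up in the .counter file of every source file that includes its header. Here is a small sketch of that step in plain JavaScript. The function countUniqueClasses and the sample class names are made up for illustration, and each string stands in for the contents of one .counter file rather than being read from disk.

```javascript
// Merge the per-file name lists and count each class once.
// Hypothetical helper for illustration, not part of Dehydra.
function countUniqueClasses(counterFiles) {
  const names = new Set();
  for (const text of counterFiles) {
    for (const line of text.split("\n")) {
      if (line !== "") names.add(line); // ignore blank lines
    }
  }
  return names.size;
}

// nsBar is seen from two different source files.
const fileA = "nsFoo\nnsBar";
const fileB = "nsBar\nnsBaz\n";

console.log(countUniqueClasses([fileA, fileB])); // 3
```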
There are a million other trivial queries that could be accomplished in a similar manner that weren’t easy or possible before.
Update: Fixed typo, had an extra zero in the answer
Dehydra progress
January 25th, 2008
GCC Dehydra is evolving much faster than the Elsa version did and it is easier to use. Once I implemented virtual methods correctly, Joshua was able to do his thing in no time at all. All it takes is a custom GCC (I’d love to see it packaged) and specifying plugin parameters in CXXFLAGS.
Dehydra has some new tricks now like a tree representation of types (instead of a string) with full typedef support. Lisp remnants in GCC are getting a new life as JavaScript objects.
I’m currently working on exposing the full GCC tree structure in JavaScript so one could do any analysis they wanted in pure JS. Dynamically typed GCC tree nodes are great for that. I’m starting with the middle-end GIMPLE representation, so in theory one will be able to analyze anything gcc can compile (Java, C++, C, ObjC, ObjC++, FORTRAN?). Eventually this will be expanded to support frontend-specific tree nodes to be able to look at code closer to the way it was written. Oh and I expect people will be able to script large parts of C++ -> JavaScript rewrites with Dehydra.
In theory, one could make tree node conversion two way which would enable writing optimization passes in JS, but that would be silly.
What’s the point?
I want to be able to do exception-safety analysis in pure JS. I want to enable unit checking (through typedefs and inline conversion functions) in pure JS.
Additionally, Dehydra should be awesome for generating bindings. For example, I’ll be able to use Dehydra to import GCC’s autogenerated enums to get string names for nodes.
Also it will become easy to extract callgraphs and various other stats out of the code if they are accessible in JS. Eventually we’ll be switching Dehydra to Tamarin to do all of the above really really fast.
GCC Plugins
While I am messing with the GCC AST, Dave is working on utilizing GCC’s control flow graphs with a separate plugin. Eventually we’ll merge our work, but for now it’s nice to not step on each other’s toes while adding features to the compiler. Given how easy life is with plugins, I am amazed that people chose to go uphill both ways and not collaborate on a plugin interface for their crazy GCC extensions. Yes, I’m looking at you: mygcc and gccxml.
Aren’t there IDEs interested in making use of GCC internals too or is everybody interested in maintaining yet another crappy C parser like Linux’s Sparse tool?
I’m looking forward to exploring the many ways we can reuse what’s in the compiler to empower developers for Mozilla 2.
GCC + SpiderMonkey = GCC Dehydra
January 17th, 2008
Analysis
GCC Dehydra is starting to work. I encourage people to try it out for their code scanning needs. The main missing feature is control-flow-sensitive traversal, which means that currently function bodies are traversed in a sequential fashion. It is the most complicated part of Dehydra, but most of the time this feature is not needed.
So far I got Benjamin’s stack-nsCOMPtr finding script to do stuff, which indicates that most of the features are working.
My vision is to switch to the GCC backend for all of our code analysis needs since it is well tested, fairly feature complete, and works with new versions of GCC (by definition).
Not everything is perfect in GCC land. There are some frustrating typedef issues to solve.
Source Re-factoring
Elsa still holds its own when it comes to refactoring code because it has a much cleaner lexer/parser and rarely opts to “optimize away” original AST structure. We should stick with Elsa’s arcane requirement of having to preprocess files with gcc <= 3.4 until either GCC becomes viable as a platform for refactoring or clang matures.
GCC is not suitable for refactoring work because:
- It starts simplifying the AST too early.
- The parser is handwritten and therefore would be hard to modify to maintain end-of-AST-node location info.
- GCC reuses many AST nodes, which means their locations point at the declaration rather than the usage point.
- The handwritten nature of GCC makes any of the above improvements time-consuming to implement, and the political issues are something I’d rather not deal with.
Most of these wouldn’t have been an issue if GCC was written in ML.
What’s Next?
Time to start using GCC Dehydra to enforce GC-safety and lots of fun exception-rewrite preparation work.
Stay tuned for more exciting developments regarding regaining control over source code here and on Dave Mandelin’s blog.