Rewriting JavaScript
October 24th, 2007
I’m having a very painful time rewriting Mozilla source code to switch it to garbage collection, so I took a little break to think about Myk’s JavaScript rewriting idea. Then I found excellent info on parsing JS with SpiderMonkey. The downside to SpiderMonkey is that it is written in C and thus hard to reuse; the upside is that it will be able to parse all valid JavaScript.
I think I shall find some free time to prototype a little tool to take a .js file and produce a JSON representation of it such that one could write transformation passes in JavaScript. This will be useful as it will complete DeHydra by enabling it to process JS in addition to C++ and it will save a lot of time for various JavaScript refactorings.
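To make the idea concrete, here is a minimal sketch of what a transformation pass over such a JSON representation might look like. The node shape (`type`, `value`, `callee`, `args`) and the helper names `walk` and `renameIdentifier` are assumptions made up for illustration; they are not what SpiderMonkey or any existing tool emits.

```javascript
// Hypothetical JSON tree for the expression `foo(1) + foo(2)`.
// The node layout is an assumption for illustration only.
const ast = {
  type: "binary",
  op: "+",
  left: {
    type: "call",
    callee: { type: "name", value: "foo" },
    args: [{ type: "number", value: 1 }]
  },
  right: {
    type: "call",
    callee: { type: "name", value: "foo" },
    args: [{ type: "number", value: 2 }]
  }
};

// Visit every object node in the tree, depth-first.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach(child => walk(child, visit));
    return;
  }
  if (node === null || typeof node !== "object") {
    return;
  }
  visit(node);
  for (const key of Object.keys(node)) {
    walk(node[key], visit);
  }
}

// A tiny "refactoring pass": rename every identifier `from` to `to`.
// Works on a deep copy so the input tree is left untouched.
function renameIdentifier(tree, from, to) {
  const copy = JSON.parse(JSON.stringify(tree));
  let count = 0;
  walk(copy, node => {
    if (node.type === "name" && node.value === from) {
      node.value = to;
      count++;
    }
  });
  return { tree: copy, count: count };
}

const result = renameIdentifier(ast, "foo", "bar");
```

Because the tree is plain JSON, a pass is just a function from one tree to another, and several passes can be chained before the result is turned back into source text.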
I expect developing a JS refactoring tool to be relatively trivial compared to refactoring C++ as there is no source-code mangling going on due to lack of preprocessing (but JS is dynamic and can be embedded in various document types, so that could complicate things). Perhaps it would even be of use to mozpad people.
Looking forward to playing more with this idea.
October 24th, 2007 at 10:27 pm
No JS preprocessing?
http://mxr.mozilla.org/seamonkey/search?string=ifdef&find=\.js&tree=seamonkey
Not anywhere near as bad as our C++, of course.
October 25th, 2007 at 12:12 am
To learn a little about the Elsa backend stuff, I wrote a small Lua parser using Flex/Elkhound/astgen. (Well, it doesn’t build an AST yet because I fell ill with the cold from hell, but that shouldn’t be too hard to knock out once my head isn’t filled with cotton. It spits out parse trees like there’s no tomorrow, though.)
My goal with this is to ultimately build a JavaScript parser for the very reason you mentioned. To be able to run DeHydra on it seamlessly.
You using SpiderMonkey is one of those “I wish I’d thought of that” moments. You could even bridge the JSON output through astgen so you needn’t do too much to DeHydra in terms of how it interfaces to the representation of the underlying program you’re rewriting (essentially replacing Flex/Elkhound with SpiderMonkey itself, astgen and some glue).
I’m personally excited about the possibilities of extending all this to a toolkit for rewriting any language you can write a “plug-in” for. Possibly, source-to-source transformations between languages (many languages are, on some level, isomorphic). C++ to JS2 would be cool.
October 25th, 2007 at 12:13 am
I think we actually feed most of our js through the xul preprocessor now. I guess most of it just passes through unchanged, but some of it is actually modified. Not as bad as C++ is, of course.
http://mxr.mozilla.org/mozilla/search?string=%23if&find=%5C.js%24&findi=%5C.xul%24&filter=&tree=mozilla has examples.
October 25th, 2007 at 7:55 am
I hope you know that there is a JS parser written in JS (Narcissus) in the tree? And there’s Rhino, of course, and I was told ages ago that the Parrot people were “working on it” as far as JS support was concerned. So maybe those would be easier to use than SpiderMonkey?
October 25th, 2007 at 8:01 am
fredrik,
DeHydra actually doesn’t use the astgen stuff because it’s way too low level. I think writing a new JS parser in C++ doesn’t really make sense for a few reasons:
a) SpiderMonkey is the “standard” parser that you’d have to play catch-up with every time someone adds a feature (or a bug) to the language. I’m in that boat with Elsa and C++ right now, and it sucks.
b) There is Esc, a new ECMAScript compiler written in JS. Using JSON as an intermediate format should make it fairly simple to switch to a pure-JS solution as soon as that becomes available.
We should talk about your work some more, sounds cool.
Justin & Axel, thanks for the heads up. As long as it’s just simple conditionals, we should be able to pretend that there is no preprocessor. Please tell me that there aren’t any function-like macros in this. In the worst case I can reuse my MCPP work on this.
Gijs, as I said to fredrik, it’s nice to use the “standard” even if it is a little painful.
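As a rough sketch of what “pretending there is no preprocessor” could mean for simple conditionals: blank out the directive lines and keep everything else, so line numbers still match the original file. The function name `blankDirectives` and the assumption that every directive sits on its own line starting with `#` are illustrative only. Note that this keeps the code from all branches of a conditional, which may not always parse.

```javascript
// Replace preprocessor directive lines with empty lines.
// Assumption: a directive occupies a whole line and starts with '#'.
// Line count is preserved so positions map back to the original file.
function blankDirectives(source) {
  return source
    .split("\n")
    .map(line => (/^\s*#/.test(line) ? "" : line))
    .join("\n");
}

const input = [
  "var a = 1;",
  "#ifdef XP_WIN",
  "var win = true;",
  "#endif",
  "var b = 2;"
].join("\n");

const output = blankDirectives(input);
```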
October 25th, 2007 at 9:57 am
But DeHydra does consume the AST that Elsa/Oink spits out on some level, no? Through a syntax tree visitor (JSVisitor IIRC). That’s what I gathered from browsing the code at least.
I just figured that if that was the common ground, having DeHydra just needing to understand the AST format produced by astgen, it would make things easier (barring AST node name problems). You could still use SpiderMonkey, have it output the JSON and then build the AST from that (or just build it directly). It’s entirely possible I’m wrong.
You’re correct re: the catch-up problems with SpiderMonkey, and that was one of my major concerns. It’s probably thankless to use Elkhound/astgen for JavaScript, both short- and long-term now that I think about it. Saves me wasting time on it.
October 25th, 2007 at 10:29 am
fredrik,
Yes, DeHydra tries to simplify the AST as much as possible before passing it to JS. It just so happens that it starts from an Elsa AST.
In reality, to be compatible with DeHydra, one can just provide the standard API and call the same callbacks that DeHydra scripts expect, from any other JS program (including a webpage?). Thus JSON is a natural choice, since one can write the simplification step in JS to keep life really simple.
Any chance of you wanting to play with the easier approach?
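The “simplify in JS, then call the script’s callbacks” idea above could be sketched roughly as follows. Everything here is hypothetical: the JSON node shape, the `simplify` and `runScript` helpers, and the `onFunction` callback name are invented for illustration and are not DeHydra’s actual API.

```javascript
// Hypothetical JSON tree for a small program; the layout is an assumption.
const program = {
  type: "program",
  body: [
    { type: "function", name: "init", params: ["a", "b"], body: [], line: 1 },
    { type: "var", name: "x", line: 5 },
    { type: "function", name: "shutdown", params: [], body: [], line: 7 }
  ]
};

// Simplification step: reduce a function node to the few fields
// an analysis script is likely to care about.
function simplify(node) {
  return { name: node.name, arity: node.params.length, line: node.line };
}

// Driver: walk the top-level nodes and invoke the script's callback
// (hypothetical name `onFunction`) once per function definition.
function runScript(tree, script) {
  for (const node of tree.body) {
    if (node.type === "function" && typeof script.onFunction === "function") {
      script.onFunction(simplify(node));
    }
  }
}

// Example analysis script: record each function as "name/arity".
const seen = [];
runScript(program, {
  onFunction(fn) {
    seen.push(fn.name + "/" + fn.arity);
  }
});
```

Since the driver only needs a JSON tree and an object with callbacks, the same script could in principle run against a tree produced by any front end.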
October 26th, 2007 at 12:55 am
Yeah, it sounds like a lot of fun.