OSQ: the next 5 languages for the web

In the evening at the OSQ retreat, we had some informal discussions about new scripting language designs and languages for mobile devices. The starting points for these discussions were (a) that scripting language programmers will soon want to use parallelism, and (b) that mobile devices will have uniprocessor performance at least 12x slower than laptops for the foreseeable future.

New JS VM techniques (like TraceMonkey’s) are getting us a 2-4x speedup compared to what we’ve been used to for a language like JS. Is that good enough for mobile? I don’t know. Bill’s results (in my last post) suggest that one way or another, we can get a further 2-3x, which might bring us to a 4-12x speedup compared to the old JS stuff. That might be good enough, or maybe not.
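The 4-12x figure is just the product of the two ranges, assuming the two gains are independent and stack multiplicatively (an assumption on my part, not a measured result). A quick sketch of the arithmetic, with made-up names:

```typescript
// Illustrative only: compose two speedup ranges by multiplying their bounds.
// SpeedupRange and composeSpeedups are hypothetical names, not from any library.
interface SpeedupRange {
  low: number;
  high: number;
}

function composeSpeedups(a: SpeedupRange, b: SpeedupRange): SpeedupRange {
  // Assumes the two speedups are independent and multiply.
  return { low: a.low * b.low, high: a.high * b.high };
}

const newVmTechniques: SpeedupRange = { low: 2, high: 4 }; // the 2-4x from new JS VM techniques
const furtherGains: SpeedupRange = { low: 2, high: 3 };    // the further 2-3x
const combinedSpeedup = composeSpeedups(newVmTechniques, furtherGains);

console.log(combinedSpeedup.low + "x to " + combinedSpeedup.high + "x"); // 4x to 12x
```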

The next question in my mind is, if we want to someday use some new language to get a bigger speedup, what should it look like? The first thing is that it should have a reasonably good translator to JS, so that it can run on any browser, even if the fancy new language is not supported. After that, I think two interesting starting points are OCaml and Lua.

OCaml is a high-level typed language with lists, garbage collection, and a lot of other features to make life easy for programmers. The biggest problem is the types: many programmers seem to prefer untyped languages like Python and JS, and types make prototyping more difficult. But the types are also a great advantage, because they really help with reliability and performance. In shootout-type benchmarks, OCaml achieves performance very close to C.

Lua is a scripting language in the vein of Python and JS with simple, well-documented semantics. This simplicity makes it possible to deploy a tiny Lua interpreter. It also helps with performance: Lua has a small, general set of hooks for metaprogramming, while Python and SpiderMonkey have several such mechanisms, which has a cost in complexity and performance. Also, Lua doesn’t include complex features like inheritance in the base language, which presumably makes it a lot easier to optimize property accesses.
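To make the property-access point concrete, here is a rough sketch comparing a lookup in a flat table with a lookup that has to walk an inheritance chain. This is a toy model, not how SpiderMonkey or any Lua implementation actually works, and all the names (`PropTable`, `ProtoObject`, `flatGet`, `chainGet`) are hypothetical.

```typescript
// Toy model only. PropTable, ProtoObject, flatGet and chainGet are
// hypothetical names invented for this illustration.
type PropTable = { [key: string]: number };

interface ProtoObject {
  own: PropTable;            // the object's own properties
  proto: ProtoObject | null; // parent to consult on a miss (JS-style inheritance)
}

// No inheritance in the base language: one table probe and we are done.
function flatGet(table: PropTable, key: string): number | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// With inheritance: keep probing up the chain until the key is found.
function chainGet(
  obj: ProtoObject,
  key: string
): { value: number | undefined; probes: number } {
  let probes = 0;
  let current: ProtoObject | null = obj;
  while (current !== null) {
    probes++;
    if (Object.prototype.hasOwnProperty.call(current.own, key)) {
      return { value: current.own[key], probes: probes };
    }
    current = current.proto;
  }
  return { value: undefined, probes: probes };
}

const baseObj: ProtoObject = { own: { area: 10 }, proto: null };
const midObj: ProtoObject = { own: { width: 2 }, proto: baseObj };
const leafObj: ProtoObject = { own: { height: 5 }, proto: midObj };

const foundOnLeaf = chainGet(leafObj, "height"); // 1 probe
const foundOnBase = chainGet(leafObj, "area");   // 3 probes
```

The flat case always costs one probe, which is the kind of regularity an optimizer can exploit. The chained case costs as many probes as the property is deep.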

A related question is how to expose parallelism. I tend to favor mechanisms that don’t share mutable state, such as DOM worker threads or map-reduce. One reason is that in scripting languages, property accesses are not at all atomic (they potentially require multiple hashtable lookups), so every property access has to be locked. Even with a lot of clever implementation, the locks are still pretty costly, even for programs that are not actually concurrent: we think on the order of 10-30% in SpiderMonkey. (And Python, to the best of my knowledge, still has the global interpreter lock, and the official recommendation is to use multiple processes as a better way of getting real concurrency.)
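As a rough sketch of what "no shared mutable state" looks like in the map-reduce style: each task gets its own copy of its input and returns a fresh value, so nothing needs a lock. The code below runs sequentially and only shows the shape. In a browser the map step would presumably be farmed out to worker threads. The helper names (`splitIntoChunks`, `sumOfSquares`, `mapReduceSum`) are made up for this example.

```typescript
// Share-nothing map-reduce, run sequentially here for illustration.
// splitIntoChunks, sumOfSquares and mapReduceSum are hypothetical names.
function splitIntoChunks(items: number[], chunkSize: number): number[][] {
  if (chunkSize < 1) {
    throw new RangeError("chunkSize must be at least 1");
  }
  const chunks: number[][] = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    chunks.push(items.slice(i, i + chunkSize)); // slice copies, so nothing is shared
  }
  return chunks;
}

// The "map" task: pure, and it only touches its own chunk.
function sumOfSquares(chunk: number[]): number {
  return chunk.reduce((acc, value) => acc + value * value, 0);
}

function mapReduceSum(items: number[], chunkSize: number): number {
  // Each call to sumOfSquares is independent, so these could run in parallel.
  const partials = splitIntoChunks(items, chunkSize).map(sumOfSquares);
  // The "reduce" step combines the partial results into one value.
  return partials.reduce((acc, partial) => acc + partial, 0);
}

const totalOfSquares = mapReduceSum([1, 2, 3, 4, 5, 6], 2); // 1 + 4 + 9 + 16 + 25 + 36 = 91
```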

Parallelism is easier in functional languages like OCaml, because they don’t mutate state very much: mostly they just create new values and read them. Also, in a typed language, a property access is much more like a simple memory read, which is easier to make atomic, or atomic enough.
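A small sketch of the "create new values and read them" style, using hypothetical names (`Account`, `deposit`). The update builds a new record and leaves the old one untouched, so a concurrent reader holding the old value never sees a half-finished change.

```typescript
// Functional-style update: return a new value instead of mutating the old one.
// Account and deposit are hypothetical names for this illustration.
interface Account {
  readonly owner: string;
  readonly balance: number;
}

function deposit(account: Account, amount: number): Account {
  // Build a fresh record; the input is never modified.
  return { owner: account.owner, balance: account.balance + amount };
}

const originalAccount: Account = { owner: "ann", balance: 100 };
const updatedAccount: Account = deposit(originalAccount, 50);

// originalAccount.balance is still 100; updatedAccount.balance is 150.
```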

So, I would like to see at least 2 new web languages, which could be JavaScript dialects, evolutions of other existing languages, or something entirely new. The first, which might be called MiniJS, would be an untyped scripting language, which would be used for most applications. MiniJS would look like JS with simplified semantics (and no ‘with’ statement for sure) and support for concurrent programming. The other language, which could be called TypedJS, would be a typed functional language inspired by OCaml, Scala, and perhaps even the dreaded ES4. TypedJS would be used for applications with stringent performance and reliability requirements. The two languages should be able to communicate but I don’t think they need to be mixed freely: in fact, making them part of the same language would probably make it too hard for each language to realize its full potential.

Comments

Comment from Zack
Time: May 15, 2009, 10:43 am

I think it’s really important to have *syntax* similar to Javascript for both these hypothetical languages. That way, programmers familiar with JS are more likely to think “oh, yeah, I can do this” rather than “augh, this is unrecognizable”.

– Personal supporting anecdote: I find every language in the ML family completely incomprehensible, not because of the semantics, but because of the near-total absence of grouping punctuation. You get chains of function applications written as “foo bar baz quux blurf snort glorple” and it takes extra mental effort to remember which things are called on what. It doesn’t help that the conversion of functions of more than one argument to a chain of curried functions is exposed in the syntax – I understand why that’s done but it should be hidden away where humans don’t have to think about it.
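For readers who haven’t seen currying, here is a minimal illustration in JS-like syntax (the function names are made up). The curried form is roughly what an ML-family language does with an application like “foo bar baz”: each argument is applied one at a time.

```typescript
// Illustration of currying; addThree and addThreeCurried are hypothetical names.

// Ordinary multi-argument function, called with explicit grouping punctuation.
function addThree(a: number, b: number, c: number): number {
  return a + b + c;
}

// The same function as a chain of one-argument functions.
const addThreeCurried =
  (a: number) =>
  (b: number) =>
  (c: number): number =>
    a + b + c;

const directSum = addThree(1, 2, 3);           // 6
const curriedSum = addThreeCurried(1)(2)(3);   // 6

// Partial application: supply two arguments now and the last one later.
const addToThree = addThreeCurried(1)(2);
const partialSum = addToThree(10);             // 13
```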

Comment from Zack
Time: May 15, 2009, 10:45 am

Additionally: JS and Lua both badly need richer standard libraries. (Lua is much worse than JS on this score, but JS is still bad, especially if you try to take it out of the browser.) I’d like us to be thinking about these languages as usable for general programming rather than just on-web programming, and that means Python-caliber standard libraries.

Comment from voracity
Time: May 15, 2009, 11:35 pm

Sign me up for Mini-JS :)

Also, ‘with’ seems to be widely acknowledged as bad in some way, but what are all the problems with it? Is it just performance? Or something else?

Comment from Brian P.
Time: May 18, 2009, 5:43 am

“The biggest problem is the types: many programmers seem to prefer untyped languages like Python and JS”

Count me in the other group. I much prefer typed languages. I so wish there was a typed language on the web. While you’re at it, how about real object orientation? I’d like to run Java/C# without an external VM. I guess Google Web Toolkit may be the best thing for now. But I have not used it.

Comment from dmandelin
Time: May 18, 2009, 10:35 am

@Zack: functional languages do have a history of being light on syntax compared to most programmer preferences. And your point on keeping familiar syntax is well-taken, although it does make me sad that we are apparently stuck with C syntax until the end of time. Libraries are very important to language adoption but they seem to be created by communities and they need a good deployment model. It seems hard to deploy anything other than pure JS in the browser.

Comment from Jan de M.
Time: May 18, 2009, 11:44 pm

IMHO something like ActionScript would be useful. It’s much like JS but adds typing, classes, namespaces, etc.

Comment from Leo
Time: May 19, 2009, 5:08 pm

I’ve been in a bind as to what the ‘right’ stack would be. Brad (the Google Native Client guy at the retreat) advocates exposing a sandboxed C, complete with threads, and building anything we need on top. Another option is figuring out a multicore VM instruction set and doing the same.

If you go with JavaScript, the question still remains. Building blocks or end-user abstractions? E.g., I’ve found Cilk-style task parallelism to be the cleanest general model (as opposed to, say, data parallelism, which isn’t general), but you won’t get that on top of a typical implementation of worker threads.

A further question: what about correctness? Workers are appealing because they’re isolated, though I suspect something like Zach’s sharing qualifiers work would be good too. Workers would be independent and need no qualifiers, and fancier code (with finer-grained sharing) would use them. Or.. you can punt on that too :)

Btw, I assume for the mixed language bit you assume calls can go between the two, a la gradual or soft typing.

[for 'with' being bad: it introduces a dynamic scope, making analyses for optimization or security hard, and, arguably, confuses programmers for similar reasons. kind of a vague answer.]
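A small demonstration of the dynamic-scope problem. Strict-mode code and the TypeScript compiler both reject `with`, so this sketch builds a non-strict function from a string purely to show the behavior. The name `readXInsideWith` is made up, and the example assumes an environment that permits `new Function`.

```typescript
// The body runs in non-strict mode because it is created via the Function
// constructor and contains no "use strict" directive.
// readXInsideWith is a hypothetical name for this illustration.
const readXInsideWith = new Function(
  "scopeObj",
  "var x = 'outer'; with (scopeObj) { return x; }"
) as (scopeObj: Record<string, string>) => string;

// Same code, different meaning of `x`, decided only at run time:
const resolvedToLocal = readXInsideWith({});                     // "outer"
const resolvedToObject = readXInsideWith({ x: "from object" });  // "from object"
```

Because which `x` is meant depends on the properties the object happens to have at run time, neither an optimizer nor a security analysis can resolve the name statically.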

Comment from dmandelin
Time: May 26, 2009, 10:33 am

Leo: Google Native Client is a cool project, but to me it can’t be a total solution because IMO C is not a good application development language. It could be useful for embedded video and such, but hopefully things like that will be able to migrate into the browser (like HTML 5 video).

Task parallelism seems necessary for “integer codes” (or whatever you want to call the large class of applications that don’t have much data parallelism). Non-sharing worker threads look like the most likely parallelism construct to win out on the real web. I suspect most non-systems programs can do fine without shared memory or explicit locking, instead relying on separate worker threads and maybe a transactional database for communication and storage.

With the two languages I mentioned, I definitely imagined one should be able to call the other, but not free mixing. I think that can be made safe fairly easily, but getting good error messages requires some of the techniques from gradual typing, especially with function arguments.
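A rough sketch of what a checked boundary might look like, in the spirit of gradual typing. Everything here is hypothetical (`checkedNumberFn`, `checkedCallback`, and the message format), and a real design would need much more. The second helper shows why function arguments are the awkward case: a function can’t be checked when it crosses the boundary, only wrapped so that its results are checked when it is eventually called.

```typescript
// Hypothetical boundary wrappers between untyped and typed code.

// First-order case: check the argument as it crosses into typed code.
function checkedNumberFn(
  fnName: string,
  typedFn: (n: number) => number
): (arg: unknown) => number {
  return (arg: unknown): number => {
    if (typeof arg !== "number") {
      // Name the function and the argument so the untyped caller gets a useful message.
      throw new TypeError(
        fnName + ": expected argument 1 to be a number, got " + typeof arg
      );
    }
    return typedFn(arg);
  };
}

// Higher-order case: an untyped callback cannot be checked up front,
// so wrap it and check each result at call time.
function checkedCallback(
  label: string,
  untypedFn: (n: number) => unknown
): (n: number) => number {
  return (n: number): number => {
    const result = untypedFn(n);
    if (typeof result !== "number") {
      throw new TypeError(
        label + ": callback returned " + typeof result + ", expected number"
      );
    }
    return result;
  };
}

const typedSquare = (n: number): number => n * n;
const squareForUntypedCallers = checkedNumberFn("square", typedSquare);
const squaredFour = squareForUntypedCallers(4); // 16
// squareForUntypedCallers("4") would throw a TypeError naming "square" and argument 1.
```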

Comment from Leo
Time: May 27, 2009, 1:53 am

I think the point of Native Client is to let you build up whatever else you want on top, like other languages. I like that idea — as long as the DOM is still in place.

Integer codes sounds HPC to me (floating point is often too slow :) I think we use ‘symbolic’ or ‘branching’ codes.

Workers make sense for the short term because the www has a weird tendency to standardize early and it’s the path of least resistance, but shared memory is sort of inevitable. The real question is *how*. I’m not a fan of transactions for typical application code or as the de facto performance mechanism (I agree that a persistence layer is important, though it’s still unclear how to really persist apps well at a framework level). Cilk++ is making cool leaps for shared memory abstractions (reducers are awesome!). I’m still not sure what it means to allow GC, HOFs, etc. in such an environment, but suspect that’s the camp to be looking at.

Comment from dmandelin
Time: May 27, 2009, 10:29 am

I like “symbolic”.

Shared memory seems yucky for dynamic languages, for the reason I gave above. I had Erlang in mind when I suggested a DB for communication between concurrent processes. Obviously that is not an ultimate-high-performance communication method but I think it would work well for many web apps and versions of standard desktop UIs and such, and databases are already popular storage for such apps.

But maybe you are talking about typed languages like Cilk. I really haven’t given that issue a lot of thought, and I don’t know enough about parallel algorithms, their implementations, and how they are used in applications to say much of anything about it.
