<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Tamarin Tracing Internals, Part I</title>
	<atom:link href="http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/</link>
	<description>Just another Blog.mozilla.com weblog</description>
	<lastBuildDate>Fri, 20 Nov 2009 22:40:42 -0800</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Zsolt Világos</title>
		<link>http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/comment-page-1/#comment-615</link>
		<dc:creator>Zsolt Világos</dc:creator>
		<pubDate>Tue, 24 Jun 2008 15:06:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/?p=13#comment-615</guid>
		<description>Sorry I mistyped twice: I __couldn&#039;t__ find the mentioned ones.</description>
		<content:encoded><![CDATA[<p>Sorry I mistyped twice: I __couldn&#8217;t__ find the mentioned ones.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zsolt Világos</title>
		<link>http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/comment-page-1/#comment-612</link>
		<dc:creator>Zsolt Világos</dc:creator>
		<pubDate>Mon, 23 Jun 2008 13:09:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/?p=13#comment-612</guid>
		<description>I am curious about the diagrams created by Graydon Hoare but I could find them; can anyone give me a link?
I also could find Chris Double&#039;s posts.

regards,
   Zsolt</description>
		<content:encoded><![CDATA[<p>I am curious about the diagrams created by Graydon Hoare but I could find them; can anyone give me a link?<br />
I also could find Chris Double&#8217;s posts.</p>
<p>regards,<br />
   Zsolt</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Callek</title>
		<link>http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/comment-page-1/#comment-544</link>
		<dc:creator>Callek</dc:creator>
		<pubDate>Mon, 02 Jun 2008 05:16:55 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/?p=13#comment-544</guid>
		<description>I&#039;m already lost and not even through this whole post... one note, seems you forgot something:

&quot;By the way, this picture gives an overview. (Picture does not exist yet.)&quot;

Perhaps it could exist now? ;-)</description>
		<content:encoded><![CDATA[<p>I&#8217;m already lost and not even through this whole post&#8230; one note, seems you forgot something:</p>
<p>&#8220;By the way, this picture gives an overview. (Picture does not exist yet.)&#8221;</p>
<p>Perhaps it could exist now? <img src='http://blog.mozilla.com/dmandelin/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Edwin Smith</title>
		<link>http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/comment-page-1/#comment-498</link>
		<dc:creator>Edwin Smith</dc:creator>
		<pubDate>Mon, 19 May 2008 14:27:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/?p=13#comment-498</guid>
		<description>Reading with bated breath...

* macros are used in superinstruction defs mainly to enable various kinds of code for different compilers, without stuffing all that into the forth compiler (utils/fc.py).

* the motivation for superinstructions is that when interpreting, CISC is better because it reduces dispatch overhead and does more work per instruction, providing more context to the C++ compiler.  All SM&#039;s bytecodes are fat -- what we&#039;d call superinstructions in TT.

* the motivation for writing fat opcodes in Forth is that we can generate IL for them easily.  we need the IL so we can both interpret and trace these fat opcodes.  Tracing C++ means tracing x86 (or whatever) and makes it hard to recover important semantics required for optimization.  (the same risk is present in forth -- if it&#039;s too low level we can boil away important semantics).  it&#039;s a balancing act.

* why forth?  the IL is stack machine bytecode, forth is just a &quot;source&quot; dialect for stack machine code.  an advantage of handwriting forth is easy factorability because functions do all their side effects on the stack.  a disadvantage is obscurity ...

* using stack machine IL is somewhat arbitrary.  there is evidence that a virtual-register based IL (one-addr or two-addr encoding) would be a better fit.  -- we don&#039;t have to give up semantic richness to do this especially if treehydra is feasible.</description>
		<content:encoded><![CDATA[<p>Reading with bated breath&#8230;</p>
<p>* macros are used in superinstruction defs mainly to enable various kinds of code for different compilers, without stuffing all that into the forth compiler (utils/fc.py).  </p>
<p>* the motivation for superinstructions is that when interpreting, CISC is better because it reduces dispatch overhead and does more work per instruction, providing more context to the C++ compiler.  All SM&#8217;s bytecodes are fat &#8212; what we&#8217;d call superinstructions in TT.</p>
<p>* the motivation for writing fat opcodes in Forth is that we can generate IL for them easily.  we need the IL so we can both interpret and trace these fat opcodes.  Tracing C++ means tracing x86 (or whatever) and makes it hard to recover important semantics required for optimization.  (the same risk is present in forth &#8212; if it&#8217;s too low level we can boil away important semantics).  it&#8217;s a balancing act.</p>
<p>* why forth?  the IL is stack machine bytecode, forth is just a &#8220;source&#8221; dialect for stack machine code.  an advantage of handwriting forth is easy factorability because functions do all their side effects on the stack.  a disadvantage is obscurity &#8230;</p>
<p>* using stack machine IL is somewhat arbitrary.  there is evidence that a virtual-register based IL (one-addr or two-addr encoding) would be a better fit.  &#8212; we don&#8217;t have to give up semantic richness to do this especially if treehydra is feasible.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ajaxian &#187; Having a Tamarin trace a Spidermonkey</title>
		<link>http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/comment-page-1/#comment-497</link>
		<dc:creator>Ajaxian &#187; Having a Tamarin trace a Spidermonkey</dc:creator>
		<pubDate>Mon, 19 May 2008 12:15:37 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/?p=13#comment-497</guid>
		<description>[...] Mandelin has posted about Tracehydra, which is the idea that the traced based JIT engine that is being worked on as part of Tamarin [...]</description>
		<content:encoded><![CDATA[<p>[...] Mandelin has posted about Tracehydra, which is the idea that the traced based JIT engine that is being worked on as part of Tamarin [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steven Johnson</title>
		<link>http://blog.mozilla.com/dmandelin/2008/05/16/tamarin-tracing-internals-part-i/comment-page-1/#comment-496</link>
		<dc:creator>Steven Johnson</dc:creator>
		<pubDate>Sun, 18 May 2008 20:00:32 +0000</pubDate>
		<guid isPermaLink="false">http://blog.mozilla.com/dmandelin/?p=13#comment-496</guid>
		<description>Hi David, nice post. A few comments/corrections/clarifications:

&gt; TT people often refer to their IL as “Forth” because they based the design on Forth or something

The IL is essentially a dialect of Forth. Nothing magic about Forth, it&#039;s just a simple, small, portable language that&#039;s easy to model and pretty close to a portable assembly language.

&gt; Digging in to Tamarin. There doesn’t seem to be a lot of documentation on TT

Sadly, yes, this is the case. Efforts like this are highly welcomed. Mason Chang&#039;s blog is also a good resource: http://masonchang.blogspot.com/

&gt; Once the tracer kicks in, TT is running machine code traces (e.g. x86-64 ISA). 

Actually, we don&#039;t yet support x86-64, only x86-32, ARM, and Thumb. (x86-64 work is underway.)

&gt; Tamarin IL is yet another stack-based bytecode

Technically, it&#039;s not a &quot;bytecode&quot; because we&#039;re currently using 16-bit opcodes...

&gt; I think the weird stuff might be “superinstructions”, which are instructions that just implement a short sequence of basic instructions.
&gt; Apparently they help interpreters run faster because by reducing decode overhead, like old CISC processors.

Exactly. Our Forth compiler (fc.py) tries to create superinstructions aggressively as it makes a huge difference in interpreter speed, doubling speed in a lot of cases. The faster the interpreter can run, the less aggressively we have to create traces, which translates into less memory needed for traces. We&#039;re not done yet in improving interpreter speed. There&#039;s more work to be done here, as there&#039;s no point in generating superinstructions for non-time-critical code (e.g. error handling), and conversely you want to be more aggressive in really hot spots... we need to add some profile-driven feedback to the Forth compiler to help it make those decisions; see https://bugzilla.mozilla.org/show_bug.cgi?id=432541 for more info.

&gt; There’s also some junk about invalidating the box type, which seems to be some kind of debugging feature. 

Bingo. Basically every Box is a tagged value, but there are times when we don&#039;t bother updating the tag because we know it won&#039;t be examined before it&#039;s set again. Early in development this was the source of lots of heisenbugs, so in Debug builds we always update the tag, but sometimes to a value which generates assertions if you attempt to examine it.

&gt; The “computed goto” at the end is an indirect jump to the case for the next instruction. This is some kind of optimization but I’ve 
&gt; never gotten a really convincing answer as to why it works, or if in fact it works, so I won’t go into it here. 

It makes a huge difference in speed, something like 20% last I checked (try building both ways and run a performance test and you&#039;ll see). Unfortunately, GCC is just about the only mainstream compiler that supports this feature (AFAIK). Why is it faster? Mainly because so many of our operations are small, so the overhead of dispatching is significant relative to the actual work being done.  The code generated for a computed goto is often simpler/faster than for switch, and also performs better with branch prediction on modern processors. 

&gt; LBRT and LBRF (local branch if true/false?). 

Actually, Long branch (32-bit offset) -- there is also a BRF (16-bit offset).

&gt;  These opcodes don’t seem to be defined in the usual vm_*_interp.h, but somehow they are made to 
&gt; branch to foplabel_TRACE_super_or_extern in VMInterp.ii. 

Actually, only in trace mode. What you have to keep in mind is that the interpreter has two distinct loops: do_interp(), which is used when simply interpreting a sequence of opcodes, and do_trace(), which is used when we decide that we are interpreting a &quot;hot&quot; section of code that might need to be jitted. do_trace() has to do considerable extra work, as it does everything that do_interp() does, PLUS build up a series of instructions to be JIT&#039;ed. The reason for the branch to foplabel_TRACE_... is that the JIT only knows how to process the primitive set of Forth operations, so superinstructions are traced as the series of primitives they are composed of. (This is considerably slower, of course, but in do_trace() the time is completely dominated by work done by the JIT so this is generally lost in the noise.)</description>
		<content:encoded><![CDATA[<p>Hi David, nice post. A few comments/corrections/clarifications:</p>
<p>&gt; TT people often refer to their IL as “Forth” because they based the design on Forth or something</p>
<p>The IL is essentially a dialect of Forth. Nothing magic about Forth, it&#8217;s just a simple, small, portable language that&#8217;s easy to model and pretty close to a portable assembly language.</p>
<p>&gt; Digging in to Tamarin. There doesn’t seem to be a lot of documentation on TT</p>
<p>Sadly, yes, this is the case. Efforts like this are highly welcomed. Mason Chang&#8217;s blog is also a good resource: <a href="http://masonchang.blogspot.com/" rel="nofollow">http://masonchang.blogspot.com/</a></p>
<p>&gt; Once the tracer kicks in, TT is running machine code traces (e.g. x86-64 ISA). </p>
<p>Actually, we don&#8217;t yet support x86-64, only x86-32, ARM, and Thumb. (x86-64 work is underway.)</p>
<p>&gt; Tamarin IL is yet another stack-based bytecode</p>
<p>Technically, it&#8217;s not a &#8220;bytecode&#8221; because we&#8217;re currently using 16-bit opcodes&#8230;</p>
<p>&gt; I think the weird stuff might be “superinstructions”, which are instructions that just implement a short sequence of basic instructions.<br />
&gt; Apparently they help interpreters run faster because by reducing decode overhead, like old CISC processors.</p>
<p>Exactly. Our Forth compiler (fc.py) tries to create superinstructions aggressively as it makes a huge difference in interpreter speed, doubling speed in a lot of cases. The faster the interpreter can run, the less aggressively we have to create traces, which translates into less memory needed for traces. We&#8217;re not done yet in improving interpreter speed. There&#8217;s more work to be done here, as there&#8217;s no point in generating superinstructions for non-time-critical code (e.g. error handling), and conversely you want to be more aggressive in really hot spots&#8230; we need to add some profile-driven feedback to the Forth compiler to help it make those decisions; see <a href="https://bugzilla.mozilla.org/show_bug.cgi?id=432541" rel="nofollow">https://bugzilla.mozilla.org/show_bug.cgi?id=432541</a> for more info.</p>
<p>&gt; There’s also some junk about invalidating the box type, which seems to be some kind of debugging feature. </p>
<p>Bingo. Basically every Box is a tagged value, but there are times when we don&#8217;t bother updating the tag because we know it won&#8217;t be examined before it&#8217;s set again. Early in development this was the source of lots of heisenbugs, so in Debug builds we always update the tag, but sometimes to a value which generates assertions if you attempt to examine it.</p>
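<p>The tag-poisoning idea can be sketched roughly as follows. This is a minimal Python illustration, not the TT code: the <code>Box</code> class here, the <code>DEBUG</code> flag, the <code>TAG_INVALID</code> value, and the helper names are all assumptions made up for the example.</p>

```python
DEBUG = True  # stands in for a Debug build (assumption for this sketch)

TAG_INT = "int"
TAG_INVALID = "invalid"  # hypothetical poison value, not a real TT tag


class Box:
    """A value plus a type tag."""

    def __init__(self, value, tag):
        self.value = value
        self._tag = tag

    @property
    def tag(self):
        # Examining a poisoned tag is a bug, so fail loudly.
        assert self._tag != TAG_INVALID, "examined a stale Box tag"
        return self._tag


def store_untagged(box, value):
    """Overwrite the payload at a point where the tag is known to be dead."""
    box.value = value
    if DEBUG:
        # A non-debug build would skip this write and leave the old tag.
        box._tag = TAG_INVALID


def store_int(box, value):
    """Ordinary store that sets payload and tag together."""
    box.value = value
    box._tag = TAG_INT
```

<p>With the flag on, reading <code>tag</code> after <code>store_untagged</code> trips the assertion immediately instead of silently returning a stale tag, which is the kind of heisenbug the debug behaviour is meant to surface.</p>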
<p>&gt; The “computed goto” at the end is an indirect jump to the case for the next instruction. This is some kind of optimization but I’ve<br />
&gt; never gotten a really convincing answer as to why it works, or if in fact it works, so I won’t go into it here. </p>
<p>It makes a huge difference in speed, something like 20% last I checked (try building both ways and run a performance test and you&#8217;ll see). Unfortunately, GCC is just about the only mainstream compiler that supports this feature (AFAIK). Why is it faster? Mainly because so many of our operations are small, so the overhead of dispatching is significant relative to the actual work being done.  The code generated for a computed goto is often simpler/faster than for switch, and also performs better with branch prediction on modern processors. </p>
<p>&gt; LBRT and LBRF (local branch if true/false?). </p>
<p>Actually, Long branch (32-bit offset) &#8212; there is also a BRF (16-bit offset).</p>
<p>&gt;  These opcodes don’t seem to be defined in the usual vm_*_interp.h, but somehow they are made to<br />
&gt; branch to foplabel_TRACE_super_or_extern in VMInterp.ii. </p>
<p>Actually, only in trace mode. What you have to keep in mind is that the interpreter has two distinct loops: do_interp(), which is used when simply interpreting a sequence of opcodes, and do_trace(), which is used when we decide that we are interpreting a &#8220;hot&#8221; section of code that might need to be jitted. do_trace() has to do considerable extra work, as it does everything that do_interp() does, PLUS build up a series of instructions to be JIT&#8217;ed. The reason for the branch to foplabel_TRACE_&#8230; is that the JIT only knows how to process the primitive set of Forth operations, so superinstructions are traced as the series of primitives they are composed of. (This is considerably slower, of course, but in do_trace() the time is completely dominated by work done by the JIT so this is generally lost in the noise.)</p>
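<p>As a rough illustration of the two loops, here is a minimal Python sketch. It only borrows the names <code>do_interp</code> and <code>do_trace</code>; the opcode names (<code>PUSH1</code>, <code>ADD</code>, <code>DUP</code>, <code>INC</code>) and the table layout are hypothetical, and the real interpreter is of course not structured this simply.</p>

```python
# Hypothetical primitive opcodes; none of these names come from the TT sources.
PRIMITIVES = {
    "PUSH1": lambda stack: stack.append(1),
    "ADD": lambda stack: stack.append(stack.pop() + stack.pop()),
    "DUP": lambda stack: stack.append(stack[-1]),
}

# A superinstruction is defined as a fixed sequence of primitives.
SUPERINSTRUCTIONS = {
    "INC": ["PUSH1", "ADD"],
}


def expand(op):
    """Return the primitive sequence that op stands for."""
    return SUPERINSTRUCTIONS.get(op, [op])


def do_interp(program, stack):
    """Plain interpretation: one dispatch per opcode, fused or not.

    The inner loop stands in for a single fused handler body.
    """
    dispatches = 0
    for op in program:
        dispatches += 1
        for prim in expand(op):
            PRIMITIVES[prim](stack)
    return dispatches


def do_trace(program, stack):
    """Trace mode: execute the same code, but record primitives only."""
    trace = []
    for op in program:
        for prim in expand(op):
            PRIMITIVES[prim](stack)
            trace.append(prim)
    return trace
```

<p>Running the program <code>["DUP", "INC"]</code> through <code>do_interp</code> costs two dispatches, while <code>do_trace</code> records three primitives for the same program, so the recorded trace never contains a superinstruction.</p>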
]]></content:encoded>
	</item>
</channel>
</rss>
