[gclist] tail-calls in C/C++

David Chase chase@centerline.com
Tue, 06 Aug 96 11:10:29 -0400


> From: Greg Morrisett <jgm@CS.Cornell.EDU>
> Subject: [gclist] tail-calls in C/C++

> Why hasn't someone hacked gcc to do full tail-call elimination?
> With the Boehm-Weiser collector, this is really the only thing
> that's keeping me from generating very good C code from ML and
> Scheme.  

> For the Sparc, I resorted to writing a simple assembly routine,
> tail_call(), which takes a function pointer and n arguments, 
> pops the current stack frame by doing a restore on the register
> window, and then jumping to the function, shuffling the arguments
> appropriately, and throwing away the return address.  The register
> window restore takes care of re-establishing the sp, fp, and
> return address, making this quite simple.  Whenever I have a
> tail-call to an unknown function, I simply call this routine.
> 
> The performance of the resulting code is quite good (on par or better 
> than some well-known native-code compilers), and the collector sped up 
> quite a bit.  (Side note -- this is the first time I've really found
> register windows to be useful :-))

> What I'm wonder is (a) is gcc amenable to such a change and 
> (b) if so, has someone already gone to the trouble to hack this in?  
> It seems like this "simple" change could have a fairly significant
> impact on the use of C as a target language, now that conservative
> collection technology is working so well.  

I'm not certain that it is always possible to do the tail-call
transformation.  Someone else provided an example of how caller-pops
can mess things up.  A variant of this occurs on all RISC machines 
that I know of when more parameters are passed (in the tail-call) 
than will fit in registers.  The additional parameters must be stored
in memory, typically in the caller's stack frame.  The only place 
that the tail-caller can use for allocation is its own stack frame, 
and not its callers.  (F calls G tail-calls H, G cannot allocate 
memory in F's frame, so G must not discard G's frame before calling 
H).

Even when it is possible, certain things that you can legally write
in C *should* inhibit tail call elimination.  For instance, the
following tail call cannot be eliminated.

  foo() {
     int x;
     bar(&x);
     baz();
  }

Here's why this cannot be eliminated:

int * gpx;
void bar(int * px) {gpx = px;}
void baz() { printf("*gpx = %d\n", *gpx);}

Another complication arises in C++, if you are using exception handling 
and have any objects in the local frame that need destructors run 
upon frame exit (or throwing of exception).  In such a subroutine, 
there are no tail-calls (without extremely thorough analysis of the 
content of the destructors), even though some calls may syntactically 
appear to be tail-calls.

speaking for myself,

David Chase