[gclist] tail-calls in C/C++

Greg Morrisett jgm@CS.Cornell.EDU
Mon, 05 Aug 1996 17:55:02 -0400


Why hasn't someone hacked gcc to do full tail-call elimination?
With the Boehm-Weiser collector, this is really the only thing
that's keeping me from generating very good C code from ML and
Scheme.  

Some background... 

Caveats:  If you're a C/C++ hacker, this discussion may not matter
to you.  But I think it's actually an important consideration for
these languages as well as for functional languages where tail-calls
happen frequently (see below).  

I've constructed an ML-to-C compiler using TIL/ML as the front-end
and high-level optimizer.  I'm currently using the Boehm-Weiser
collector.  I'd like to stick to standard C and 
standard C calling conventions in the generated code, ruling out 
Baker's Cheney-on-the-MTA, in order to maintain as much interoperability
with legacy code (i.e., C and C++ libraries) as possible.  

I'm already turning self-tail calls into C loops.  I've also looked at 
integrating known, mutually recursive functions into a single large 
function, threaded by gotos.  The only thing left to tackle are 
tail-calls to "unknown" functions -- i.e., code pointers imported from 
another module, or pulled out of a closure.  

For many applications, this happens quite often.  In particular, any 
code written in a continuation passing style has (by definition) lots 
of tail-calls to unknown functions.  Also, for many applications with 
call-backs, event loops, etc., there are many tail-calls to unknown 
functions.  (This is true of many C/C++ applications, too!)

When you use the Boehm-Weiser collector, I've found that the 
collector spends quite a bit of time processing stack frames
for functions that are simply tail-calls.  (It's quite easy
in many Scheme/ML applications to pile up hundreds or even
thousands of stack frames if you don't do proper tail-call
elimination for unknown functions.)  So, in addition to the
client code being slowed down (lots of unecessary function 
returns or worse, for the Sparc, spilling register windows
all over the place), the collector is significantly slowed down 
simply because there is no support in any C compiler for proper 
tail-call elimination.  

I've tried many approaches, including the approach taken by
the SML2C compiler (have one big dispatch function, which
calls a function, the function calls another function by
"returning" to the dispatcher which then calls the desired
function for it, etc.)  The cost in performance for known functions
just isn't worth it -- two function calls and a return, with
all of the arguments being spilled to memory (since C doesn't
support multiple return values.)  Furthermore, this doesn't
integrate well with C/C++ libraries.  

I've also read about other people using the first-class labels
of gcc.  But this does not seem to work on all architectures --
in particular, whenever you generate position independent code,
there's often some cruft in the calling convention that's not
covered by simply goto'ing a label.  (For instance, how do you
load all of the argument registers properly?)

For the Sparc, I resorted to writing a simple assembly routine,
tail_call(), which takes a function pointer and n arguments, 
pops the current stack frame by doing a restore on the register
window, and then jumping to the function, shuffling the arguments
appropriately, and throwing away the return address.  The register
window restore takes care of re-establishing the sp, fp, and
return address, making this quite simple.  Whenever I have a
tail-call to an unknown function, I simply call this routine.

The performance of the resulting code is quite good (on par or better 
than some well-known native-code compilers), and the collector sped up 
quite a bit.  (Side note -- this is the first time I've really found
register windows to be useful :-))

Now, gcc already does self-tail-call elimination, (if you're
careful with the code that you emit) but it doesn't work for any other
kinds of tail-calls, and as I already mentioned, it's fairly trivial
for a compiler to eliminate self-tail-calls when generating the
C code, so this doesn't help much.  It seems to me that it would be 
fairly straightforward for a compiler to deal with tail-calls properly.

What I'm wonder is (a) is gcc amenable to such a change and 
(b) if so, has someone already gone to the trouble to hack this in?  
It seems like this "simple" change could have a fairly significant
impact on the use of C as a target language, now that conservative
collection technology is working so well.  

Hoping someone has some insight...  Thanks.

-Greg Morrisett
 jgm@cs.cornell.edu