From comp.lang.scheme [Re: C to Scheme]

Scott L. Burson gyro@zeta-soft.com
Sat, 31 May 1997 13:51:22 -0700 (PDT)


   From: Dave Mason <dmason@scs.Ryerson.CA>
   Date: Sat, 31 May 1997 02:14:42 -0400

   Getting serious performance, or making it so you could use fread to
   read in arrays of structs and have it do the right thing sounds very
   hard to me.  Especially to get all the things in this paragraph at
   once!  A related problem is that much real-world software doesn't
   treat pointers by the book, so if you want to support it all, you need
   C-like memory layouts.  If this is the level of compatibility you
   want, 2 years sounds a little high, but more realistic.

Yeah, I went through, as I recall, three different memory layout schemes.
That certainly cost me a bunch of time.

The 2 years included writing the manual and other productization.

   From: Jordan Henderson <jordan@Starbase.NeoSoft.COM>
   Date: Sat, 31 May 1997 09:26:16 -0500 (CDT)

   > A related problem is that much real-world software doesn't
   > treat pointers by the book, so if you want to support it all, you need
   > C-like memory layouts.  

   I remember reading in the comp.lang.c FAQ that the C compiler that
   came with some Lisp Machine (Zeta C?)

Yes, ZETA-C was my compiler.

					 was an extreme example of
   a machine that didn't have NULL pointers that were an all-bits-off
   integral type (in fact, the NULL pointer was mapped to some special
   object).  It was an entirely legal (by the book) rendering, but a 
   lot of software would have trouble with it.  

This isn't as bad as it sounds because (except in the presence of casts) C is
strongly typed.  The vast majority of C code is not finicky about the
representation of pointers; only something like a Lisp implementation in C
(e.g. GNU Emacs), which was doing its own pointer tagging, would have trouble.

All pointers were represented as pairs of an array and an index; NULL was
simply a pair of NIL and 0.  If you cast a pointer to an integer, you got a
cons of the array part and the index part.  You could later cast this back to
a pointer without loss of information, but obviously you couldn't do
arithmetic on it while it was in the form of a cons.
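
A minimal sketch of the array-and-index idea, written in C purely for
illustration.  The names (`fat_ptr', `fat_add', and so on) are hypothetical
and are not taken from ZETA-C; a C null array pointer stands in for NIL, and
the bounds check is an assumption about what such a representation makes
possible, not a description of what ZETA-C actually did.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a backing array plus an index into it. */
typedef struct {
    int   *array;   /* backing storage; NULL stands in for Lisp NIL */
    size_t length;  /* number of elements in the backing array */
    long   index;   /* element offset within the array */
} fat_ptr;

/* The null pointer is simply the pair (NIL, 0). */
const fat_ptr FAT_NULL = { NULL, 0, 0 };

/* Point at element 0 of an existing array. */
fat_ptr fat_make(int *array, size_t length)
{
    fat_ptr p = { array, length, 0 };
    return p;
}

int fat_is_null(fat_ptr p)
{
    return p.array == NULL && p.index == 0;
}

/* Pointer arithmetic touches only the index part. */
fat_ptr fat_add(fat_ptr p, long n)
{
    p.index += n;
    return p;
}

/* Two pointers are equal when both the array and the index match. */
int fat_equal(fat_ptr a, fat_ptr b)
{
    return a.array == b.array && a.index == b.index;
}

/* Dereference for reading, with a bounds check against the array. */
int fat_load(fat_ptr p)
{
    assert(p.array != NULL);
    assert(p.index >= 0 && (size_t)p.index < p.length);
    return p.array[p.index];
}

/* Dereference for writing, with the same bounds check. */
void fat_store(fat_ptr p, int value)
{
    assert(p.array != NULL);
    assert(p.index >= 0 && (size_t)p.index < p.length);
    p.array[p.index] = value;
}
```

With an `int data[4]', `fat_add(fat_make(data, 4), 2)' designates `data[2]',
and adding -2 to that result compares equal to the original pointer.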

   Maybe there are two ways to go here.  One would be to not support
   non-conforming programs (not generate the intended result, because
   the intention was backed by illegal practice).  The other would be to
   attempt to support some large subset of illegal practices.  Current C
   compilers often support some subset of illegal practices, and highly
   portable software has to avoid all of these subsets in order to be
   portable, so maybe highly portable software would work.

ZETA-C attempted (fairly successfully, I think) to find the right compromise
between performance and generality.  If you looked real closely, there were
lots of little corners of C semantics where ZETA-C was not correct.  In
practice, however, one very rarely tripped over any of these.

For instance, I used Lisp integers for C `int' and `long'.  This meant bignums
would be created automatically, as usual in Lisp.  Technically this is not a
correct C implementation (even though I don't think the standard specifically
says that the length of `int' and `long' shall be finite, one can take this as
implied) but it very rarely ran into trouble.  The only such case I remember,
which was rather amusing, was a program that did something like

  int i;
  for (i = 1; i; i <<= 1) ...

(shifting a 1 bit left repeatedly, expecting it to fall off the left end of
the word).
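
For comparison, here is a sketch of the same loop on a fixed-width integer,
where it does terminate.  It uses `unsigned' rather than `int' because
overflowing a signed left shift is undefined behaviour in ISO C; the function
names are made up for this example, and the check in `demo_shift' assumes a
platform whose `unsigned' has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Count how many times a single 1 bit can be shifted left before it
   falls off the top of a fixed-width unsigned word. */
unsigned count_shift_steps(void)
{
    unsigned steps = 0;
    unsigned i;

    for (i = 1u; i != 0u; i <<= 1)
        steps++;
    return steps;
}

/* On a platform without padding bits, the loop runs once per bit. */
void demo_shift(void)
{
    assert(count_shift_steps() == sizeof(unsigned) * CHAR_BIT);
}
```

With integers that grow into bignums, `i' never becomes zero, so the same
loop runs until memory is exhausted.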

I should add that ZETA-C made critical use of the Lisp Machine's support for
displaced arrays of different element sizes: one can displace a byte array
onto a halfword or word array.  (Common Lisp doesn't support this, and I don't
think Scheme has displaced arrays at all.)  This permitted storage of
different sizes of things in a single aggregate, and also, e.g., writing a
word through a pointer, casting the pointer to a byte pointer, and reading the
word back as bytes, which is something C programs do occasionally.
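
The idiom in question looks roughly like this in C.  This is only a sketch
of the kind of code a compiler has to cope with, with hypothetical function
names; it says nothing about how ZETA-C implemented it beyond the displaced
arrays described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write a 32-bit word, then read it back one byte at a time through a
   byte pointer.  Accessing any object through unsigned char * is
   permitted by ISO C. */
void word_to_bytes(uint32_t word, unsigned char out[4])
{
    uint32_t storage = word;
    const unsigned char *bytes = (const unsigned char *)&storage;
    int i;

    for (i = 0; i < 4; i++)
        out[i] = bytes[i];
}

/* Reassemble the word from its bytes in host byte order. */
uint32_t bytes_to_word(const unsigned char in[4])
{
    uint32_t word;

    memcpy(&word, in, sizeof word);
    return word;
}

/* The byte order seen depends on the host: the first byte is 0x44 on a
   little-endian machine and 0x11 on a big-endian one. */
void demo_bytes(void)
{
    unsigned char b[4];

    word_to_bytes(0x11223344u, b);
    assert(bytes_to_word(b) == 0x11223344u);
    assert(b[0] == 0x44 || b[0] == 0x11);
}
```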

-- Scott