Mixed stuff

Matthew Tuck matty@box.net.au
Sat, 03 Apr 1999 21:28:24 +0930


Hans-Dieter Dreier wrote:

> Well if we *define* the type signature to only contain information needed for
> type matching, then by definition there wouldn't be any other information in
> it. But IMO then some *additional* information may be wanted by the compiler
> (or later stage) for code generation purposes. It certainly makes a difference
> whether a feature is accessed via a get/set call or direct access.

Ada does something like this, in requiring a package spec (consider it a
type) to specify its private fields.  It is positively awful.

Anyway, I've said a lot about this in another thread.

> You say "you could inherit from J". Who determines that? Either it's the user,
> then he has to tell the compiler, or it's the compiler itself. Anyhow, the
> compiler should know.

Yes, the compiler knows, but it can be inter-module information and
hence will increase fragility if used.
 
> Let's say there are two sorts of information:
> 1. About interface - what you call type signature. It must be the same for all
> possible implementations.
> The compiler checks it to verify that the program conforms to the specs.
> 2. About implementation - if you don't have a better name, we could name that
> impl signature. It may be different for different implementations. It is used
> to control code generation.

Hmm, so the code generation info is in the impls?  What sort of
information would this be?
 
> > > Interesting idea, hmmm... Certainly this involves a lot of fixups
> > > at link time, but it *could* be done if an executable is constructed
> > > the usual way.
> > That depends what you meant by usual way ... the usual way as I know it
> > is machine code generation before linking, hence prohibiting this.
> Not at all. It's just that the linker needs to do lots of fix-ups. Everywhere a
> member offset is needed. Unless you want to exchange variable accesses with
> function calls; then it's a completely different affair.

Not really, it's just an offset into a vtable still, which still needs
to be laid out in the same way.

This sort of fixup would seem to be the only way to achieve run-time
type/impl loading.  A fixed layout could be used to achieve default
offsets.
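
To make the fixup idea concrete, here is a minimal sketch in Python.  It
assumes the compiler leaves symbolic placeholders wherever a member or
vtable offset is needed, and the linker patches them once the layout is
known.  The names layout, link and the ("fixup", name) placeholder are
made up for illustration, not part of any existing design.

```python
# Hypothetical sketch: compiled code carries symbolic slot references
# that the linker patches once the final layout is known.

def layout(members):
    """Assign each member a slot index in declaration order."""
    return {name: slot for slot, name in enumerate(members)}

def link(code, offsets):
    """Replace every ("fixup", name) placeholder with its resolved offset."""
    return [offsets[op[1]] if isinstance(op, tuple) and op[0] == "fixup" else op
            for op in code]

# Code generated before the layout was fixed: offsets are still symbolic.
unlinked = ["load", ("fixup", "y"), "call_virtual", ("fixup", "draw")]

# The same unlinked code can be bound to two different layouts.
v1 = link(unlinked, layout(["x", "y", "draw"]))
v2 = link(unlinked, layout(["draw", "x", "y"]))
```

The same table lookup serves field offsets and vtable slots alike, which
is why swapping variable accesses for calls doesn't change the picture.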

> Oh I see. That's an interesting approach. But how does this mix with
> incremental compilation?

Well it would probably look more like this:

Edit -> Check -> Optimise -> Link -> Optimise -> Code Gen
                    ^---------/

Slowly more and more is linked together, and possibly optimised.  There
can be multiple links.  For example, a link might combine methods into
an impl, several modules together (chosen intelligently to reduce
fragility by taking into account dependences), or combine modules into a
program.

> I'm asking because of the global optimisation. Or is
> it primarily intended for production quality code?

Yes.

> And why is it better than compiling the whole stuff into
> an executable and omitting the link/code gen phase?

How could you omit the link/code gen?  Linking is required to support
multiple modules, and code gen (if not interpreting ASTs) is necessary
to convert to the resultant code.

> For a start I would be quite comfortable with a simple
> Edit -> Check -> Code Gen

Yes, that's how I would start it too.  But modules would be desired
pretty quickly, and optimisation would be useful to stop it from being
as slow as a tortoise!

One issue we need to think about more is how modules would work with the
translational hierarchy.  I.e., at at least some stages in the hierarchy
there would be linking.  That whole compile cycle, or something a little
smaller, would occur at each translation level.

>> Yes, but how do you handle the problem of dynamic dispatch?  GC
>> generally are written for finding out the pointers easily.  We would
>> certainly have to do some extension of parameterisation of code to do
>> this as far as I can see, although it looks like it could be done.  It
>> might be very difficult with incremental collectors though - if you're
>> interested in taking this any further maybe the GC list might be the
>> place to ask.
> I can't see how dynamic dispatch could be a problem for GC. Dynamic dispatch
> uses pointers too, doesn't it? And they have to be reachable, otherwise
> dispatch couldn't use them. Since we construct our memory layout in a way that
> allows GC to identify every reference, they are subject to GC. There is no
> scenario I can imagine right now where an object could be collected and
> shouldn't. If it can't be reached, it is indistinguishable from not being there
> at all. (It's the same as in physics: If you postulate an effect that can't be
> observed, then it's the same as if there was no effect).

Hmm, are you referring to run-time or compile-time here?
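
For reference, the run-time reading of the reachability argument can be
sketched as a plain mark phase.  This is only an illustration in Python:
the heap is modelled as a dict from object name to the names it
references, and all object names are invented.

```python
def reachable(roots, refs):
    """Return every object reachable from roots by following references."""
    seen = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in seen:
            continue
        seen.add(obj)
        stack.extend(refs.get(obj, ()))
    return seen

# Hypothetical heap: dispatch goes object -> vtable -> method, all by reference.
heap = {
    "main": ["shape"],
    "shape": ["shape_vtable"],
    "shape_vtable": ["draw_circle"],
    "draw_circle": [],
    "orphan": ["draw_circle"],
}

live = reachable(["main"], heap)
garbage = set(heap) - live
```

Anything dispatch can reach is found by the same traversal, so only the
object nothing points to ends up in garbage.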

> Well if I write K=5; that's just 4 bytes for the whole phrase. How can you beat
> that? But anyhow, there are more important issues right now.

OK, I wasn't really thinking of obfuscated source including shrunk
identifiers.  In this case yes, but what about the overhead of if
statements?  If ... then ... else ... elseif ... end.  That's a lot of
keywords, and they can't be obfuscated.

Yes, it's not really important, and there are other advantages of AST
storage.
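
As a rough illustration of the keyword overhead, here is a sketch in
Python comparing the text of an if statement with one possible compact
tree encoding.  The one-byte node tags and the one-byte symbol indices
are assumptions made up for this example, and the cost of storing the
symbol table itself is ignored.

```python
source = "if a then b elseif c then d else e end"

# Hypothetical one-byte node tags; identifiers become one-byte symbol indices.
IF, VAR = 0x01, 0x02
symbols = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}

def var(name):
    """Encode a variable reference as a tag byte plus a symbol index."""
    return bytes([VAR, symbols[name]])

def if_node(cond, then, otherwise):
    """Encode an if node; the single tag byte stands in for all the keywords."""
    return bytes([IF]) + cond + then + otherwise

# "elseif" is just a nested if in the else position.
encoded = if_node(var("a"), var("b"), if_node(var("c"), var("d"), var("e")))

print(len(source), len(encoded))
```

Because each node has a fixed number of children, no terminator like
"end" needs to be stored at all.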

> Not necessarily so. Not if they need a lot of fixing up and housekeeping just
> to keep them small. But in general you are right.

That's true, but not necessarily forever.

While I can't guarantee the trend of CPU speed increasing faster than
disk speed will continue over the next 5-20 years (and that's the time
we should be aiming for), if it does then eventually even a large amount
of housekeeping will become more efficient than accessing disk more. 
Nowadays decompressing from disk on the fly can be quicker than
accessing the uncompressed version.  And it will likely only get more
so.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/