Mixed stuff

Hans-Dieter Dreier Ursula.Dreier@ruhr-uni-bochum.de
Wed, 17 Mar 1999 21:49:51 +0100



Matthew Tuck wrote:

> Hans-Dieter.Dreier@materna.de wrote:
>
> >> Well for a start this would increase fragility, since at the language
> >> level it can be treated as either a get/set or a physical field.
> > If by fragility you also mean that clients and derived classes must
> > be recompiled if they use the affected member, you're right - but
> > isn't this inevitable for most changes of the member's signature?
>
> Yes, except an implementation wouldn't be changing its signature.  I
> guess it's true that it could, but why should there be implementation
> information in the type signature?

Well, if we *define* the type signature to contain only the information needed
for type matching, then by definition there wouldn't be any other information in
it. But IMO some *additional* information may then be wanted by the compiler
(or a later stage) for code generation purposes. It certainly makes a difference
whether a feature is accessed via a get/set call or by direct access.
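
To make the difference concrete, here is a minimal Java sketch (the class and
member names are purely illustrative):

class DirectAccess {
    public int z;                    // physical field: clients read the slot directly
}

class Accessors {
    private int z;
    public int getZ() { return z; }  // get/set pair: clients perform a call
    public void setZ(int v) { z = v; }
}

class Client {
    static int sum(DirectAccess d, Accessors a) {
        // Both expressions look alike at the language level, but the generated
        // code differs: a field load (getfield) versus a method invocation
        // (invokevirtual), unless the call gets inlined later.
        return d.z + a.getZ();
    }
}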

> > Why should it be hard to find a class member's signature? Isn't that
> > all that is needed? Maybe I'm missing the point or we are using different
> > assumptions, so please explain.
>
> Probably the latter.  Let me illustrate:
>
> type A
>    field z
>
> impl J of A
>    field z is variable
>
> impl K of A
>    field z get
>       ...
>    field z set
>       ...
>
> anA: A
>
> If you do a "anA.z", you don't know the implementation.  Even in the
> absence of multi-impls, you could inherit from J, again causing
> confusion about what "anA.z" meant.  That's why you need the
> (implementation level) get/sets in both J and K.  Hopefully z is
> statically dispatched as much as possible and therefore inlined to be
> just like a normal variable access.

You say "you could inherit from J". Who determines that? Either it's the user,
in which case he has to tell the compiler, or it's the compiler itself. Either
way, the compiler should know.

Let's say there are two sorts of information:
1. About the interface - what you call the type signature. It must be the same
for all possible implementations. The compiler checks it to verify that the
program conforms to the specs.
2. About the implementation - for want of a better name, we could call that the
impl signature. It may differ between implementations. It is used to control
code generation.
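
In Java-like terms (just a sketch of the distinction, not a proposal for our
syntax; the names mirror your A/J/K example):

interface A {                         // 1. type signature: identical for every impl,
    int  getZ();                      //    checked by the compiler against the specs
    void setZ(int value);
}

class J implements A {                // 2. impl signature of J: z is a plain variable,
    private int z;                    //    so the accessors are trivial
    public int  getZ()          { return z; }
    public void setZ(int value) { z = value; }
}

class K implements A {                // impl signature of K: z is computed,
    private int doubled;              //    so the accessors do real work
    public int  getZ()          { return doubled / 2; }
    public void setZ(int value) { doubled = value * 2; }
}

A client declared against A sees only the type signature; whether anA.getZ()
becomes a direct load or a real call is exactly the impl-signature information
the code generator needs.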

> >> I'm assuming by "no link time" you refer to at run time.  That being
> >> said, everything at run time is dynamic in Java.  I was referring to
> >> compile time linking - sorry if I was vague.  The important thing is you
> >> don't need to recompile the class.  Offset information is determined at
> link time, as can MI instance layout information.
> > Do you mean to say that members' offsets are *set* by the linker?
>
> Yes, that's essentially what Java does.
>
> > Interesting idea, hmmm... Certainly this involves a lot of fixups
> > at link time, but it *could* be done if an executable is constructed
> > the usual way.
>
> That depends what you meant by usual way ... the usual way as I know it
> is machine code generation before linking, hence prohibiting this.

Not at all. It's just that the linker needs to do lots of fix-ups: one everywhere
a member offset is needed. Unless you want to exchange variable accesses for
function calls; then it's a completely different affair.
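
As a toy model of what I mean by fix-ups (purely illustrative Java, not how any
real linker works):

import java.util.HashMap;
import java.util.Map;

class OffsetFixups {
    public static void main(String[] args) {
        // Object layout is decided at link time, e.g. once all impls are known.
        Map<String, Integer> layout = new HashMap<>();
        layout.put("Point.x", 0);
        layout.put("Point.y", 4);

        // Every place in the generated code that needs a member offset is a
        // fix-up site; the linker substitutes the real number there.
        String[] fixupSites = { "Point.y", "Point.x", "Point.y" };
        for (String site : fixupSites) {
            System.out.println(site + " -> offset " + layout.get(site));
        }
    }
}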

> > How would this mix with incremental linking or replacement of modules?
> > At least all the fixup information would have to be retained as long as
> > further incremental linking should be possible.
>
> > And wouldn't we have to write our own linker?
>
> The old system might be something like this:
>
> Edit -> Scan -> Parse -> Check -> Intermed. Code Gen. -> Optimise ->
> Code Gen -> Link
>
> I would like something like this:
>
> Edit -> Check -> Optimise -> Link -> Optimise -> Code Gen
>
> Code gen is after linking, and both it and the second (program wide)
> optimise stage have access to the whole program.  The last two would
> increase fragility, and it goes without saying both optimisations could
> be left out although you probably would not choose the second without
> the first (load too much in memory at once).

Oh, I see. That's an interesting approach. But how does this mix with
incremental compilation? I'm asking because of the global optimisation. Or is
it primarily intended for production-quality code? And why is it better than
compiling everything into an executable and omitting the link/code gen
phase?

> In this process we're pretty much dealing with ASTs from the output of
> Edit to the input of Code Gen.  We'd probably lose useless stuff like
> comments and null statements along the way.  Probably not identifiers
> though since they're reflective information and probably could be better
> removed by the optimiser when not used for runtime reflection or by a
> debugger (hopefully using existing elimination optimisations).

For a start I would be quite comfortable with a simple Edit -> Check -> Code Gen.

> > Not really. But I'd leave elimination optimisations to GC, if whole
> > objects are to be eliminated. At least for a start. GC is already there,
> > and it is performing at runtime, so the compiler need not know about
> > dynamic behaviour of the program. I wouldn't mind if the compiled objects
> > contain *a little* unnecessary stuff as long as it has no real impact
> > on execution speed or memory consumption.
>
> Yes, but how do you handle the problem of dynamic dispatch?  GCs are
> generally written so that the pointers can be found easily.  We would
> certainly have to do some extension or parameterisation of the code to do
> this as far as I can see, although it looks like it could be done.  It
> might be very difficult with incremental collectors though - if you're
> interested in taking this any further maybe the GC list might be the
> place to ask.

I can't see how dynamic dispatch could be a problem for GC. Dynamic dispatch
uses pointers too, doesn't it? And they have to be reachable, otherwise
dispatch couldn't use them. Since we construct our memory layout in a way that
allows GC to identify every reference, they are subject to GC. There is no
scenario I can imagine right now where an object would be collected that
shouldn't be. If it can't be reached, it is indistinguishable from not being
there at all. (It's the same as in physics: if you postulate an effect that
can't be observed, that's the same as if there were no effect.)
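
A small Java sketch of the reachability argument (WeakReference is only there
so we can observe collection; the names are illustrative):

import java.lang.ref.WeakReference;

interface Shape { double area(); }

class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

class ReachabilityDemo {
    public static void main(String[] args) {
        // Reachable only through a supertype reference: dynamic dispatch still
        // works, and the collector keeps the object alive, because dispatch
        // follows an ordinary reference like any other.
        Shape s = new Circle(1.0);
        System.out.println(s.area());

        // Once the last reference is dropped the object is unreachable; whether
        // dispatch would have gone through a vtable or anything else no longer
        // matters - nothing can observe it any more.
        WeakReference<Shape> weak = new WeakReference<>(s);
        s = null;
        System.gc();   // only a hint; collection may or may not happen right away
        System.out.println("collected? " + (weak.get() == null));
    }
}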

> One problem would be ensuring the GC was completed before the code was
> dumped.  Actually just doing a persistence dump might be enough here,
> since if an element is unreachable you won't dump it.

Or it would be collected some time after being re-loaded. As long as the
amount of dead space is insignificant, there's no reason to worry about it.

> > I found that *source code* is the most compact form if unnecessary
> > layout & comments are stripped (automatically). IMO, no AST can be that
> > small. But nowadays memory size isn't that important anymore.
>
> Depends what you call an AST.  In memory with pointers, then maybe, but
> in a file with structure being dictated by order (which I've come to
> realise is probably pretty similar to your stack machine representation)
> it would be smaller.  For a start identifiers can be reduced in size,
> and syntactic structures that use several reserved words can be replaced
> by a one or two byte node.  I think you'll find Juice does something
> like this, except I think they compress the tree somehow too.

Well, if I write K=5; that's just 4 bytes for the whole phrase. How can you beat
that? But anyhow, there are more important issues right now.

> Size will always be important.

... to a degree.

>  We're always thrashing out available
> hardware, and more compact representations will always be faster.

Not necessarily so. Not if they need a lot of fixing up and housekeeping just
to keep them small. But in general you are right.

> > Of course. I really can't understand those (C++ programmers) who
> > insist on doing memory management by hand. Maybe they don't know
> > better. Well, sometimes the environment makes GC hard...
>
> I've read that the creator of C++ has said that he didn't require GC in
> C++ since it wasn't feasible at the time.  Of course now, the
> situation's different.
>
> The assumption is that GC is slow, but the fact is that the cost isn't
> necessarily much, and the time it saves you will usually let you spend
> more time on algorithm changes if necessary, which is where the real
> efficiency gains come from.  Plus the reduction in bugs.

I read in an article about Eiffel that they estimate GC overhead at 10 %. I'd
gladly spend that for a 20 % reduction in development time. No one will notice
the speed difference.

> I can say from my experience that efficiency tuning is the exception
> rather than the norm.  And that's what manual deallocation is.  Life's
> too short.

Yep.


Regards,

Hans-Dieter Dreier