Mixed stuff

Matthew Tuck matty@box.net.au
Tue, 09 Mar 1999 22:48:52 +1030


Hans-Dieter.Dreier@materna.de wrote:

>> Well for a start this would increase fragility, since at the language
>> level you can treat it can be a get/set or physical field.
> If by fragility you also mean that clients and derived classes must
> be recompiled if they use the affected member, you're right - but
> isn't this inevitable for most changes of the member's signature?

Yes, except an implementation wouldn't be changing it's signature.  I
guess it's true that it could, but why should there be implementation
information in the type signature?

>> It's also hard to know just whether it can set it out using just a
>> field.  This is because these get/sets could be the target of a dynamic
>> dispatch, which you can't tell.  This is especially true if you allow
>> multiple implementations.  So if it's part of a dynamic dispatch it must
>> be a get/set.  The compiler can find out this sort of information
>> through optimisation, but it's hard to just know.
> Why should it be hard to find a class member's signature? Isn't that
> all that is needed? Maybe I'm missing the point or we are using different
> assumptions, so please explain.

Probably the latter.  Let me illustrate:

type A
   field z

impl J of A
   field z is variable

impl K of A
   field z get
      ...
   field z set
      ...

anA: A

If you do a "anA.z", you don't know the implementation.  Even in the
absence of multi-impls, you could inherit from J, again causing
confusion about what "anA.z" meant.  That's why you need the
(implementation level) get/sets in both J and K.  Hopefully z is
statically dispatched as much as possible and therefore inlined to be
just like a normal variable access.

>> I'm assuming by "no link time" you refer to at run time.  That being
>> said, everything at run time is dynamic in Java.  I was referring to
>> compile time linking - sorry if I was vague.  The important thing is you
>> don't need to recompile the class.  Offset information is determined at
>> link time, also can MI instance layout information.
> Do you mean to say that members' offsets are *set* by the linker?

Yes, that's essentially what Java does.

> Interesting idea, hmmm... Certainly this involves a lot of fixups
> at link time, but it *could* be done if an executable is constructed
> the usual way.

That depends what you meant by usual way ... the usual way as I know it
is machine code generation before linking, hence prohibiting this.

> How would this mix with incremental linking or replacement of modules?
> At least all the fixup information would have to be retained as long as
> further incremental linking should be possible.

> And wouldn't we have to write our own linker?

The old system might be something like this:

Edit -> Scan -> Parse -> Check -> Intermed. Code Gen. -> Optimise ->
Code Gen -> Link

I would like something like this:

Edit -> Check -> Optimise -> Link -> Optimise -> Code Gen

Code gen is after linking, and both it and the second (program wide)
optimise stage have access to the whole program.  The last two would
increase fragility, and it goes without saying both optimisations could
be left out although you probably would not choose the second without
the first (load too much in memory at once).

In this process we're pretty much dealing with ASTs from the output of
Edit to the input of Code Gen.  We'd probably lose useless stuff like
comments and null statements along the way.  Probably not identifiers
though since they're reflective information and probably could be better
removed by the optimiser when not used for runtime reflection or by a
debugger (hopefully using existing elimination optimisations).
 
> Not really. But I'd leave eliminition optimisations to GC, if whole
> object are to be eliminated. At least for a start. GC is already there,
> and it is performing at runtime, so the compiler need not know about
> dynamic behaviour of the program. I wouldn't mind if the compiled objects
> contain *a little* unnecessary stuff as long as it has no real impact
> on execution speed or memory consumption.

Yes, but how do you handle the problem of dynamic dispatch?  GC
generally are written for finding out the pointers easily.  We would
certainly have to do some extension of parameterisation of code to do
this as far as I can see, although it looks like could be done.  It
might be very difficult with incremental collectors though - if you're
interested in taking this any further maybe the GC list might be the
place to ask.

One problem would be ensuring the GC was completed before the code was
dumped.  Actually just doing a persistence dump might be enough here,
since if an element is unreachable you won't dump it.

> I found that *source code* is the most compact form if unneccessary
> layout & comments are stripped (automatically). IMO, no AST can be that
> small. But nowadays memory size isn't that important anymore.

Depends what you call an AST.  In memory with pointers, then maybe, but
in a file with structure being dictated by order (which I've come to
realise is probably pretty similar to your stack machine representation)
it would be smaller.  For a start identifiers can be reduced in size,
and syntactic structures that use several reserved words can be replaced
by a one or two byte node.  I think you'll find Juice does something
like this, except I think they compress the tree somehow too.

Size will always be important.  We're always thrashing out available
hardware, and more compact representations will always be faster.  What
you need to do is work on what will give you the most savings, and with
the CPU/disk gap getting bigger all the time, reading from disk is
becoming the bottleneck.

ASTs can probably use the fairly pointerless representation in memory to
save space when they don't need to be edited.

> Of course. I really can't understand those (C++ programmers) who
> insist on doing memory management by hand. Maybe they don't know
> better. Well, sometimes the environment makes GC hard...

I've read that the creator of C++ has said that he didn't require GC in
C++ since it wasn't feasible at the time.  Of course now, the
situation's different.

The assumption is that GC is slow, but the fact is that that's not
necessarily much and the time it saves you will usually let you spend
more time on algorithm changes if necessary, which is where the real
efficiency changes come.  Plus the reduction in bugs.

I can say from my experience that efficiency tuning is the exception
rather than the norm.  And that's what manual deallocation is.  Life's
too short.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/