On object layout and inheritance

Matthew Tuck matty@box.net.au
Sat, 05 Dec 1998 00:39:27 +1030


Ursula Dreier wrote:

> It is likely that there will be a huge number of objects in memory at
> any time. Therefore, object layout had to be as simple and the fixed
> object header as small as possible. I admit that I ruled out the
> option of objects physically containing other objects right from the
> start without really considering it seriously. Maybe this was a little
> bit quick, but I still see a lot of difficulties with such a nested
> approach:

Well there are a lot of difficulties with multiple objects, but it is
very important.  Heap allocation is very inefficient compared to other
forms of memory allocation.  Hence optimisations of memory layout are
important.  Things that could be done include:

(a) aggregating several unrelated objects with similar lifetimes into
one object in memory.
(b) inlining one object into another in memory rather than using a
pointer.

There are many problems:

(1) discovering that the optimisations are valid, i.e. discovering
lifetimes for (a) and complex analysis of pointer usage for (b).
(2) generally a garbage collector couldn't handle pointers to part
objects as trying to garbage collect the middle of a memory block could
have unpredictable results.  Some way of distinguishing could help but
it could slow down the GC.
(3) in an object-oriented language if two objects or classes of objects
exist that the program might refer to and one uses an inlined object and
the other doesn't there could be problems

That's just off the top of my head.  There's plenty I don't know about
these sorts of optimisations.  I know there are heuristic algorithms for
doing (b), but as far as I can tell their use in compilers is almost
nothing at this point in time.

However, when everything is an object, and you give up value assignment
semantics for basic types as I have advocated, you've got almost
everything on the heap.  Big time inefficiency, but also an orthogonal
language.

> GC and maybe other aspects of memory management have to access all
> objects regardless whether they are dead or alive, without navigating
> along the references. So objects need to be linked together in some
> form. In flat layout, this can be a linked list, which is easy to
> traverse and maintain and carries little memory overhead. In nested
> layout, it has to be a tree and is comparably complicated to traverse
> and maintain and carries double the overhead of a list.

I'm not entirely sure, but I think you're referring to multiple objects
in one memory object when you talk about nested layout.  If so, then you
would still use a linked list.  Ideally the GC would not know about any
multiple objects or knows as little as possible.

You can see this is possible - just look at the fact we survive ok with
value semantics in other languages.  Hence for many programs it is ok,
and these languages could quite happily have GC.  Hence the task is to
find the use of objects that could be optimised into the value semantics
form.

> GC is more complicated for nested layout than for flat because you
> can't simply free an object without examining its contents first.
> Growing and shrinking of objects is even more complicated. So nested
> layout really would have to have a BIG advantage somewhere else to to
> be competitive, which I simply cannot see right now.

The advantage is efficiency, and from everything I've heard, it's quite
a big difference.  I can't think of a case where you'd want to grow or
shrink the objects though.  That's a pretty rare case as far as I can
tell (dynamic arrays come to mind), and we could just prevent the
optimisation in that case if it posed a real problem.

> So I ended up with the flat layout and its consequence: No embedded
> objects, no references to parts of objects.

Generally you if you had references to parts of objects in nested
layout, they wouldn't be references in the GC sense.

> When I first thought about how to implement a sensible class system, I
> felt a dilemma:
> On one hand, MI seems to be a really nice thing: Implement different
> independent features of our classes each in its own class and combine
> them in an orthogonal way. Whow, what a great opportunity to avoid
> redundant code!
> 
> But there are problems: Each reference to an instance item (a "member
> variable" in C++ speak) would usually be compiled to acceed at some
> fixed offset in the list containing that instance's instance item
> references.
> To derive a subclass would simply mean to have the instance items of
> the subclass appended to the end of the list representing the instance
> of the baseclass. Quite simple.

> But that wouldn't be possible for MI, as you can easily see.

I've seen a couple of proposals that use bidirectional layout for MI. 
Basically they store on both sides of the pointer.  It's supposed to
solve the problem.  I never really went into it enough to understand it,
yet another thing to do ...

> To make the issue even more complicated, there is the "common > anchestor" problem built into the concept of MI:
> 
> class Person {};
> class Employed : public Person {};
> class Customer : public Person {};
> class EmployedCustomer : public Employed, Customer {};
> 
> What about the instance of Person that is contained twice in
> EmployedCustomer ?
> Usually you wouldn't want to have two independent instances of Person
> in EmployedCustomer.

In most cases you want to have only one Person part in repeated
inheritance.  I think that should be the standard semantics.  It's the
main difference between inheritance and object containment implementing
inheritance via method delegation.

Note that if you take the position I advocate, of the separation of
specification inheritance and implementation inheritance, the
specifications are merged, but anything can happen with the
implementations.  You could take one, you could perform some sort of
merging of the two methods (e.g. in CLOS I think it is you can say add
the results of two functions), or you can override and specify the
behaviour directly.

Note this is only a problem with "implicit" inheritance.  With explicit
inheritance you have to actually override all methods, although you can
use shorthands for direct inheritance.  I've only heard of explicit
inheritance being used with object-based inheritance languages rather
than the traditional class-based inheritance langauges though.

> Moreover, if you got a lot of these combinations, writing all of them
> down isn't much fun and quite unflexible to extend.

One idea I've had can at least partially handle this, that is, allowing
type lists, for example, you could declare a variable:

MyList: Collection, Ordered

Note that it has two types.  It can therefore take any type with these
two types as superclass.  Same goes for parameters.  I haven't extended
these to actual class implementations though, so it doesn't get around
the entire problem of exponential mixing as far as I can see.

Note the difference between declaring as follows:

type OrderedCollection is Collection, Ordered
type A is Collection, Ordered
MyList: OrderedCollection

In a name typing system these are different as MyList can't take A,
whereas it could in the first MyList.  In a structural typing system it
would work but I don't like structural typing at all.  Inheritance
without overriding is a way of actually mimicing Ada's derived types for
those familar with them.

> A "rank" regarding subclasses could only be derived from the ordering
> of  the baseclasses in the anchestor list of the subclass, which is
> sort of a "weak" ordering (not very obvious to the user).
> If you treat them as having "equal rights", you got problems with
> identical identifiers occuring in those base classes.

Method shadowing is a weird problem which I have a slant that I've
recently derived an opinion on, after a while of not knowing what to
think.

I dislike coincidental compilation in all forms.  By this, I mean the
chance of passing the type system because the name matches
unintentionally.  Structurally typing and subclassing, where clauses and
C++'s template mechanism of all examples of this poor language feature.

Method shadowing is potentially a slightly different form of
coincidental compilation.  The name could affect whether two methods are
merged.  Hence my every urge is to say that they are different methods
if they weren't declared at the same place (i.e. repeated inheritance).

You can resolve the clash by compulsory renaming.  I think Eiffel might
do it this way.

> I think there are alternatives to multiple inheritance which haven't
> got these problems and are clearer to use and easier to implement as
> well:
> 
> 1.
> class Person {};
> class Employed :{ Person* p};
> class Customer :{ Person* p};
> class EmployedCustomer : public Person {
> Employed* e;
> Customer* c;
> Person () {
>  e->p = (Person*) this;
>  c->p = (Person*) this;
>  }
> };
> 
> This works, but is a little bit complicated if not to say clumsy -
> actually I didn't feel quite at ease with this solution.

This is just really going down into the bowels of how you implement
inheritance.  It doesn't really help for existing systems either.  The
way you combine the two methods can be done with a normal override.  You
could an override if they clash, plus provide ways of combining methods
like I said CLOS does before.

In this solution EmployedCustomer is not a subtype of Customer so you
can't ask for a customer, you must ask for a person.  This is a real
loss to the type system, and is the sort of thing inheritance was made
for.

> A more elegant solution is to have parameterized classes:
> 
> class Person {};
> class Employed (class Person pE) {};
> class Customer (class Person pC) {};

> This is no valid C++ syntax, but maybe you see what I mean: It's very
> similar to templates, but is implemented statically rather than
> dynamically, avoiding C++ problems during template instantiation. It
> also has static type checking (the parameter of Employed and Customer
> must be a Person or derived from that), which C++ lacks (as far as I
> know).

In C++ it will complain at compile-time of the user of the template,
rather than being able to complain when you compile the template.

> Just declare a Customer Employed Person or an Employed Customer
> Person, and even an uncertainty regarding precedence of Employed vs.
> Customer methods cannot occur: Whatever comes first has precedence,
> because it's actually a Customer that you declare, which happend to
> have an Employee as an instance item. You got all the possible
> combinations for free without having to declare a class for each of
> them.

There's no subtyping here either, which is why I don't like this.

> I think this problem (and some others, too) can be overcome if we
> introduce a "modifier" for declarations, which I would like to name
> "transparent". It would be allowed for every declaration of something
> that can be regarded as a name space (especially: classes). The
> semantics would be that if a name cannot be found in the name space
> currently being looked at, the name space referenced by the
> transparent name would be included in the search, before search goes
> on in the parent name space. In effect, that transparent name would be

This seems to me to be getting back to the exact same system of rank
with multiple inheritance, except you don't have subtyping.

I don't think that combining mixins is such a problem.  If you wanted to
declare all the classes which represent the combinations of the mixins
you'd be in trouble, but otherwise you could declare just the classes
which consist of the sets of mixins you need for your program.

I also think solutions to multiple inheritance come a lot clearer when
you separate specification from implementation.  In this case, things
work differently depending on which you're talking about.  Its a very
powerful idea.

> Combine this with the possibility of having type expessions (discussed
> in "On classes in declarations"), and in my opinion you got all you
> could possibly get from MI, but with less pain.

I'm not sure how this relates, but I'll wait till you reply to my reply
on your type expressions.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
                              ***
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/