On object layout and inheritance

Hans-Dieter Dreier Ursula.Dreier@ruhr-uni-bochum.de
Sun, 06 Dec 1998 00:33:07 +0100


Dies ist eine mehrteilige Nachricht im MIME-Format.
--------------26683B8814CE44C2AE7638D6
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit



Matthew Tuck schrieb:

> Well there are a lot of difficulties with multiple objects, but it is
> very important.  Heap allocation is very inefficient compared to other
> forms of memory allocation.

Are you sure? I think that's a case for optimisation. Maybe a stacklike regime is faster, but it is of limited
usefulness, or at least more complicated: You'd have to take special measures to prevent a stack object from being
deleted if there is a reference to it still surviving.

> Hence optimisations of memory layout are
> important.  Things that could be done include:
>
> (a) aggregating several unrelated objects with similar lifetimes into
> one object in memory.

> (b) inlining one object into another in memory rather than using a
> pointer.

You would just save one (1) pointer and have to service two possible implementations everywhere where a reference is
used (which is about every single line of code). Is that really worth the trouble?

> There are many problems:
>
> (1) discovering that the optimisations are valid, i.e. discovering
> lifetimes for (a) and complex analysis of pointer usage for (b).
> (2) generally a garbage collector couldn't handle pointers to part
> objects as trying to garbage collect the middle of a memory block could
> have unpredictable results.  Some way of distinguishing could help but
> it could slow down the GC.

> (3) in an object-oriented language if two objects or classes of objects exist that the program might refer to and
> one uses an inlined object and
> the other doesn't there could be problems

(4) the inlined object would have to have a fixed length, known in advance, if not located at the end of the outer
object.

But the real point is: Which benefits are to be gained by implementing nested objects?
I can't see any. Maybe you can show me some. Otherwise, why mess it up, if it's not necessary?

> That's just off the top of my head.  There's plenty I don't know about
> these sorts of optimisations.

I view it as a means of simplification not optimisation. Generally speaking, memory management and objects (in the OOP
sense) need not be implemented by the same device. I'll admit that. But it's just that IMHO I see so many benefits if
you use the object header for memory management as well, that this seems to me to be the natural approach.

Maybe we should get our terminology clearly explained:
When I mean storage allocation (storage obtained from the OS), I'll name it a memory block, when I mean an object in
the sense of OOP, I'll name it an object.

I think a memory block should host many objects tightly packed with no gaps, the object being the unit of our own
suballocation scheme as well. Pointers only point to the beginning of an object; there are no objects within objects.
Because we handle suballocation ourselves rather than bother the OS every time, it will be pretty fast. GC will be
simple. The price to be paid is memory for headers, or course. But I think most of the info they contain serves a dual
purpose: memory management and use by the class. So some of the header overhead will eventually be implemented anyway.
All we have to do is try to pack as much into a single object as possible to keep the ratio of headers to contents at
bay. For example, a whole stack frame or a whole compiled function body.

> However, when everything is an object, and you give up value assignment
> semantics for basic types as I have advocated, you've got almost
> everything on the heap.

Beg your pardon, but I think you need to explain "assignment semantics".

>  Big time inefficiency, but also an orthogonal
> language.

> > GC and maybe other aspects of memory management have to access all
> > objects regardless whether they are dead or alive, without navigating
> > along the references. So objects need to be linked together in some
> > form. In flat layout, this can be a linked list, which is easy to
> > traverse and maintain and carries little memory overhead. In nested
> > layout, it has to be a tree and is comparably complicated to traverse
> > and maintain and carries double the overhead of a list.
>
> I'm not entirely sure, but I think you're referring to multiple objects
> in one memory object when you talk about nested layout.

I meant multiple objects within an object. I never thought of treating memory blocks (as obtained from the OS) as
objects in their own right.

>  If so, then you would still use a linked list.  Ideally the GC would not know about any
> multiple objects or knows as little as possible.

We could in fact use a linked list. But GC certainly would have to know about nested objects. Assume the outermost
object would be GCed while some contained object still is alive. Without checking for nested objects, the whole memory
block would turn to garbage. Figuring out the actual memory layout would be easier for GC if it were a tree: Tree
layout could reflect nesting. But I really don't think nested objects are an idea worthwile to consider.

> > GC is more complicated for nested layout than for flat because you
> > can't simply free an object without examining its contents first.
> > Growing and shrinking of objects is even more complicated. So nested
> > layout really would have to have a BIG advantage somewhere else to to
> > be competitive, which I simply cannot see right now.
>
> The advantage is efficiency, and from everything I've heard, it's quite
> a big difference.

> I can't think of a case where you'd want to grow or
> shrink the objects though.  That's a pretty rare case as far as I can
> tell (dynamic arrays come to mind), and we could just prevent the
> optimisation in that case if it posed a real problem.

How about strings? String manipulations can be more efficient if handled in-place.Maybe stacks as well (though I'll
admit that their maximum length is often known beforehand).

> Generally you if you had references to parts of objects in nested
> layout, they wouldn't be references in the GC sense.

Again, you would have to make sure such objects couldn't be GCed accidentally.If GC were able to move objects (say, to
compactify before saving a whole image on disk for a fast reload), those references would have to be fixed up anyway.

> > When I first thought about how to implement a sensible class system, I
> > felt a dilemma:
> > On one hand, MI seems to be a really nice thing: Implement different
> > independent features of our classes each in its own class and combine
> > them in an orthogonal way. Whow, what a great opportunity to avoid
> > redundant code!
> >
> > But there are problems: Each reference to an instance item (a "member
> > variable" in C++ speak) would usually be compiled to acceed at some
> > fixed offset in the list containing that instance's instance item
> > references.
> > To derive a subclass would simply mean to have the instance items of
> > the subclass appended to the end of the list representing the instance
> > of the baseclass. Quite simple.
>
> > But that wouldn't be possible for MI, as you can easily see.
>
> I've seen a couple of proposals that use bidirectional layout for MI.
> Basically they store on both sides of the pointer.  It's supposed to
> solve the problem.  I never really went into it enough to understand it,
> yet another thing to do ...

Do you have a refence where I can read something about that? At first sight I can't see how that could solve the
problem, given that you just have *two* ends but *multiple* inheritance.

> > To make the issue even more complicated, there is the "common > anchestor" problem built into the concept of MI:
> >
> > class Person {};
> > class Employed : public Person {};
> > class Customer : public Person {};
> > class EmployedCustomer : public Employed, Customer {};
> >
> > What about the instance of Person that is contained twice in
> > EmployedCustomer ?
> > Usually you wouldn't want to have two independent instances of Person
> > in EmployedCustomer.
>
> In most cases you want to have only one Person part in repeated
> inheritance.  I think that should be the standard semantics.

Sure. Otherwise there wouldn't be a problem.

> Note that if you take the position I advocate, of the separation of
> specification inheritance and implementation inheritance, the
> specifications are merged, but anything can happen with the
> implementations.  You could take one, you could perform some sort of
> merging of the two methods (e.g. in CLOS I think it is you can say add
> the results of two functions), or you can override and specify the
> behaviour directly.
>

You mean you implement parts of a merged specification in one and other parts in another class?How is that merging of
specifications supposed to work?
(I can't help it, but "merging" something always makes me nervous - there are so many examples where merging things
adds complexity - just look at constructors & destructors in C++).

> Note this is only a problem with "implicit" inheritance.  With explicit
> inheritance you have to actually override all methods, although you can
> use shorthands for direct inheritance.  I've only heard of explicit
> inheritance being used with object-based inheritance languages rather
> than the traditional class-based inheritance langauges though.

Sounds rather uninteresting to me.

> Note the difference between declaring as follows:
>
> type OrderedCollection is Collection, Ordered
> type A is Collection, Ordered
> MyList: OrderedCollection
>
> In a name typing system these are different as MyList can't take A,
> whereas it could in the first MyList.  In a structural typing system it
> would work but I don't like structural typing at all.

Nor do I. If I'd want such a behaviour, I wouldn't create two classes like this in the first place.

>  Inheritance without overriding is a way of actually mimicing Ada's derived types for
> those familar with them.

Resembles typedef - that works similar.

> I dislike coincidental compilation in all forms.  By this, I mean the
> chance of passing the type system because the name matches
> unintentionally.

Yeah. Errors resulting from this may be hard to catch, because the "reasons" are scattered among more than one class.

>  Structurally typing and subclassing, where clauses and
> C++'s template mechanism of all examples of this poor language feature.
>
> Method shadowing is potentially a slightly different form of
> coincidental compilation.  The name could affect whether two methods are
> merged.

See my comment on merging, above.

> > class Person {};
> > class Employed :{ Person* p};
> > class Customer :{ Person* p};
> > class EmployedCustomer : public Person {
> > Employed* e;
> > Customer* c;
> > Person () {
> >  e->p = (Person*) this;
> >  c->p = (Person*) this;
> >  }
> > };
> >
> > This works, but is a little bit complicated if not to say clumsy -
> > actually I didn't feel quite at ease with this solution.
>
> This is just really going down into the bowels of how you implement
> inheritance.  It doesn't really help for existing systems either.  The
> way you combine the two methods can be done with a normal override.  You
> could an override if they clash, plus provide ways of combining methods
> like I said CLOS does before.
>
> In this solution EmployedCustomer is not a subtype of Customer so you
> can't ask for a customer, you must ask for a person.  This is a real
> loss to the type system, and is the sort of thing inheritance was made
> for.

You would have to ask for Person.c. Then you'd get a customer. But, as I said, I wouldn't like this solution anyway.

> > A more elegant solution is to have parameterized classes:
> >
> > class Person {};
> > class Employed (class Person pE) {};
> > class Customer (class Person pC) {};
>
> > This is no valid C++ syntax, but maybe you see what I mean: It's very
> > similar to templates, but is implemented statically rather than
> > dynamically, avoiding C++ problems during template instantiation. It
> > also has static type checking (the parameter of Employed and Customer
> > must be a Person or derived from that), which C++ lacks (as far as I
> > know).
>
> In C++ it will complain at compile-time of the user of the template,
> rather than being able to complain when you compile the template.

You mean if I tried to instantiate the template with a parameter where some required member of Person were not
present, it would complain of the use of that (now unknown) member?The error message would not be very enlightening,
and I must consider this kind of "type checking" to be rather unsatisfactory.

> > Just declare a Customer Employed Person or an Employed Customer
> > Person, and even an uncertainty regarding precedence of Employed vs.
> > Customer methods cannot occur: Whatever comes first has precedence,
> > because it's actually a Customer that you declare, which happend to
> > have an Employee as an instance item. You got all the possible
> > combinations for free without having to declare a class for each of
> > them.
>
> There's no subtyping here either, which is why I don't like this.

Please explain. I don't understand what you mean.class Employed (class Person pE) {};
should mean: This class should be considered as to be derived from a class called pE when used as a type inside this
class, where pE must be a Person or derived from Person. So you can access all of Person in Employed.
Because some Employed must be derived from Person, it can used in Customer. Of course, methods from Customer cannot
directly reference items which only Employed contains (otherwise you would have to derive Customer from Employed).

If you mean subtyping Customer: Why shouldn't that be possible?
You could either declare a subtype like this:
class SpecialFixedCustomer : Customer (SomeSpecialPerson) {};
which would make it an ordinary (non-parameterized) class, or
class SpecialVariableCustomer (class Person pE1) : Customer (pE1) {};
which would make it a parameterized class, passing that parameter on to Customer.

> > I think this problem (and some others, too) can be overcome if we
> > introduce a "modifier" for declarations, which I would like to name
> > "transparent". It would be allowed for every declaration of something
> > that can be regarded as a name space (especially: classes). The
> > semantics would be that if a name cannot be found in the name space
> > currently being looked at, the name space referenced by the
> > transparent name would be included in the search, before search goes
> > on in the parent name space. In effect, that transparent name would be
>
> This seems to me to be getting back to the exact same system of rank
> with multiple inheritance, except you don't have subtyping.

But it's more obvious to the user, and because of normal inheritance rules you got an "override" relationship instead
of name clashes.

> I also think solutions to multiple inheritance come a lot clearer when
> you separate specification from implementation.  In this case, things
> work differently depending on which you're talking about.  Its a very
> powerful idea.

I feel we might have to discuss this "separate specification from implementation" issue in more detail. Up to now I
viewed a specification as something roughly equivalent to a class consisting entirely of (usually pure) virtual
functions in C++, maybe enriched by assertion and the like (if and when we choose to implement such a thing, which I
strongly advocate). If there is more to it, I'd be really like to know about it.

> > Combine this with the possibility of having type expessions (discussed
> > in "On classes in declarations"), and in my opinion you got all you
> > could possibly get from MI, but with less pain.
>
> I'm not sure how this relates, but I'll wait till you reply to my reply
> on your type expressions.

All I want to say is that in this case I stumbled upon MI as a concept found in common programming languages which I
didn't feel comfortable with, for the reasons I already gave. But I recognize its general desirability. So I looked
for other concepts which would make MI's functionality available, but without the hassle.


--------------26683B8814CE44C2AE7638D6
Content-Type: text/x-vcard; charset=us-ascii; name="vcard.vcf"
Content-Transfer-Encoding: 7bit
Content-Description: Visitenkarte für Hans-Dieter Dreier
Content-Disposition: attachment; filename="vcard.vcf"

begin:          vcard
fn:             Hans-Dieter Dreier
n:              Dreier;Hans-Dieter
email;internet: Ursula.Dreier@ruhr-uni-bochum.de
x-mozilla-cpt:  ;0
x-mozilla-html: FALSE
version:        2.1
end:            vcard


--------------26683B8814CE44C2AE7638D6--