Memory Layout Proposal

Hans-Dieter.Dreier@materna.de Hans-Dieter.Dreier@materna.de
Thu, 4 Mar 1999 13:20:22 +0100


--ue2DGq7mmm8jdWuKNru3olrBonUyz13i
Content-type: text/plain; charset="ISO-8859-1"
Content-transfer-encoding: quoted-printable

Memory Layout Proposal
=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D

Here's another proposal for memory layout to support GC & objects.

It is designed for space economy and fast access,
but there are some (hopefully not too many) restrictions on
object layout. Weak references are not supported.

This layout is considerably more complicated than my initial one, though.

1. Memory is allocated (using malloc or the OS) in large *chunks*
that are subdivided into memory allocation blocks that look like this:


struct MemoryAllocationBlock {
    unsigned short usFlags;         // flags for use by GC, see below
    unsigned short usUnusedTail;    // amount of unused space at end of blo=
ck
    MemoryAllocationBlock* pPrev;   // link to previous MemoryAllocationBlo=
ck
    MemoryAllocationBlock* pNext;   // link to next MemoryAllocationBlock
    void* payload [0];              // objects containing references are th=
e norm
};


12 bytes + payload each.

.usFlags contains flags for use by memory management and other built-in ser=
vices.
These are not specified here in detail.

MemoryAllocationBlocks (including payload) must be allocated inside a chunk
contiguously with no gaps.

The first and last MemoryAllocationBlock in each chunk have special usFlags=
 flags;
in this case .next and .prev point to the first MemoryAllocationBlock of th=
e next
chunk or the last MemoryAllocationBlock of the previous chunk, respectively=
.

The last MemoryAllocationBlock in each chunk may not contain
payload because its length cannot be calculated from the .next pointer.

.usUnusedTail may be used to reserve space at the end of the object.

All the objects in a MemoryAllocationBlock live and die together, and all u=
sually
have fixed size (but it is possible to achieve variable length for the last=
 one).
Therefore only objects that are somewhat tightly coupled will usually be pu=
t inside
the same MemoryAllocationBlock.

Maybe someone may want to dream up an appropriate class description for the=
 .payload;
I will describe it here in plain english since I don't think it can be easi=
ly done
using class layouts alone.

Payload consists of:
- Pointers that point to the start of the actual MemoryAllocationBlock
  (we will call them back pointers, short: BP)
- Intra-MemoryAllocationBlock pointers (i.e. they point to some address
  inside the actual MemoryAllocationBlock, short: LP)
- Inter-MemoryAllocationBlock pointers (they point to some address
  outside the actual MemoryAllocationBlock, short: GP)
- NULL pointers
- Unspecified binary stuff

Rules are as follows:

- Each location that may be referenced by a GP must point to the class obje=
ct
  of the referenced object and be preceded by a BP unless it is the first o=
bject
  in a MemoryAllocationBlock. Whether the preceding pointer is a BP can be
  determined by comparing the BP candidate's value against its address:
    If the value is less than the address (i.e. it points to lower memory),
    it is a BP.
    Otherwise it is the .next pointer of the MemoryAllocationBlock which al=
ways
    points behind the current MemoryAllocationBlock, hence to higher memory=
.
    This need not be true for the last MemoryAllocationBlock of a chunk, bu=
t that
    may not contain any objects (as stated above), so this case cannot happ=
en.  =


- There must be a at least one BP immediately behind each (logical) object,
  which may also serve as the object prefix of the next object.
  If there is no space left between end of object and end of MemoryAllocati=
onBlock
  minus unused space, this delimiting BP may be omitted.
- There must not be a BP inside an object.
- If there is a NULL pointer preceded by a BP,
  the remainder of the MemoryAllocationBlock is considered to contain
  unspecified binary stuff. LPs may point to that stuff, GPs must not.
  Since NULL is not a legal class object and BPs are neither LPs nor GPs,
  this combination cannot occur in objects.
- No GP or LP may point to a BP. MemoryAllocationBlocks are no legal object=
s
  and hence must not be pointed to by GPs. I feel it is better to disallow
  LPs to point to them too, though strictly speaking there is no need for t=
his.

Since an object can only be referenced from the outside world using a GP
or traversing the MemoryAllocationBlocks (which is restricted to GC and may=
be
the debugger), the BP's value can always be determined
when dealing with an object for the first time.

The following properties result:

- GC can easily find out the start of the MemoryAllocationBlock, given some=
 GP.
- GC can easily distinguish GPs, LPs, BPs and references
  by simple arithmetic comparisons.
- GC can also easily determine the end of the area it needs to scan, even i=
n
  the presence of binary stuff (such as numbers, characters or arrays there=
of)
- References inside objects (LPs) are possible, allowing efficiently folded=
 tree
  structures. Of course, methods may not misinterpret such LPs as reference=
s to
  full-blown objects; LPs cannot not be publicly accessible and cannot
  be used as operands unless they are also GPs.
- NULL pointers can be freely used, except as class object references (aka =
types).
- Binary stuff is supported and may be referenced for use by methods using =
LPs.
- Introspection can determine object boundaries and types.
- If the current object is known, the next object / the start of binary stu=
ff
  (if present) can always be found by searching for a BP. If this feature i=
s
  available at the language level, single objects cannot be easily copied a=
ny longer;
  whole MemoryAllocationBlocks (excluding unused space) must be copied inst=
ead
  if no knowledge of the internal working of the object is available to the=
 copy
  function.

Using C++ class descriptions for objects is feasible only when there *never=
* or
*always* are virtual methods around since having C++ *sometimes* insert a v=
-table
somewhere into this arrangement would certainly pose problems.

I'd prefer *never*; that should be safer, unless there is a standard for
C++ object layout (including v-tables) which can be relied on.

Does anybody know ?

I suggest to use #defined macros to access .payload and
cast pointers to actual C++ classes or structures for the objects.


Examples (payload only):

1. Integer data type, standalone version

- GP: class object of type integer
- BP
- NULL
- The integer.

2. Integer data type, intended for structured use

- GP: class object of type integer
- LP: pointer to integer
- BP
- NULL
- The integer, pointed to

Those two versions could even be distinguished at runtime, since any LP !=
=3D BP.
See the note on copying single objects, above.


3. Reference type

- GP: class object of type Reference
- GP: the reference

4. Reference to LPs contained in other objects

- GP: class object of type FeatureReference
- GP: the object being referenced
- BP
- NULL
- integer: Offset of feature inside object referenced by GP

Note that no LP can be used since LPs must not point outside
the current MemoryAllocationBlock (or they would be considered GPs).
Due to securiy reasons, I do don't think we should include such a class,
however. The example should just show that it can be done.

5. Simple Stack type containing GPs (version 1)

- GP: class object of type stack
- LP: pointer to current stack top (to avoid sequential search for first BP=
)
- GP: first entry
...
- GP: current top entry
- BP
- BP
...

Remarks: Available stack space is initialised to BPs.
If the place of a new entry does not contain a BP, there is an overflow.
When an entry is popped, it must be overwritten with a BP.
When attempting to pop and there is *LP =3D=3D LP, it is an underflow.
It is not possible to derive from this class, i.e. append instance items.
(But of course, you can use it by reference).

6. Rectangle type

- GP: class object of type rectangle
- BP
- NULL
- Integer: left
- Integer: width
- Integer: top
- Integer: height

Same remark as for integer.

7. Complex number, composed from two numbers:

- GP: class object of type complex
- GP: number (re)
- GP: number (im)
- BP
- GP, re: class object of type number
- LP: number attached to re
- BP
- GP, im: class object of type number
- LP: number attached to im
- BP
- NULL
- number attached to re: binary stuff
- number attached to im: binary stuff

This carries some overhead (32 bytes),
but references to re and im may be used independently from complex,
rather than having to access by method call.
(But see the note on copying single objects, above).


--

Regards,

Hans-Dieter Dreier
(Hans-Dieter.Dreier@materna.de)=

--ue2DGq7mmm8jdWuKNru3olrBonUyz13i
Content-type: text/plain; charset="ISO-8859-1"
Content-transfer-encoding: quoted-printable

IDENTIFIKATIONSANGABEN:
a17224a.txt IA5 DX-MAIL X.400 User Agent=

--ue2DGq7mmm8jdWuKNru3olrBonUyz13i--