PS: Memory Layout Proposal

Hans-Dieter Dreier Ursula.Dreier@ruhr-uni-bochum.de
Mon, 08 Mar 1999 23:46:10 +0100


Postscriptum to Memory Layout Proposal
======================================

Thinking about an object loader syntax and how object loader would work,
I noticed that my proposal of 4 Mar '98 has a major drawback: It allows
only one section of binary stuff per MemoryAllocationBlock. There are
cases (e.g. hash tables) where multiple sections of binary stuff for
each MemoryAllocationBlock would be desirable.

Before I sketch how it works, please note that IMO "C pointer tricks"
are OK in this case since they are only privately used by MM and GC (and
debugger, I think) and are not needed by "the language".

struct MemoryAllocationBlock and the rules concerning its relationship
to memory chunks stays the same, but the definition of .payload changes:

Payload consists of:
- Pointers that point to the start of the actual MemoryAllocationBlock
  (we will call them back pointers, short: BP)
- Pointers that point to the start of the next MemoryAllocationBlock
  (we will call them forward pointers, short: FP)
- Intra-MemoryAllocationBlock pointers (i.e. they may point to any
address
  inside the actual MemoryAllocationBlock, short: LP)
  Since LPs must be distinct from GPs and FPs, they must not point to
the start
  or immediately past the end of the MemoryAllocationBlock.
- Inter-MemoryAllocationBlock pointers (they point to some address
  outside the actual MemoryAllocationBlock, short: GP)
- reserved values which cannot occur as the address of a real object
  (e.g. NULL pointers, but we may specify a small range of other values,

  say, 0...255, to accommodate a richer "instruction set" for VM).
- Unspecified binary stuff

Rules are as follows (The role of BP in my first proposal is now being
taken by FP):

- Each location that may be referenced by a GP must be _preceded_ by a
  "stuff specifier" (short: SS) that may be either a FP or a BP.
  It is a FP if its contents is greater than its address; otherwise it
is a BP.
    If it is a FP, then the location being referenced must point to the
class object
    of the referenced object or contain a reserved value
    (for special cases, see stack example).
    This type of object may contain only reserved values, LPs or GPs.
    (In Lisp it would be called a node).
    Since a MemoryAllocationBlock header ends with a FP,
    that can also double as a SS.
    That is convenient because most MemoryAllocationBlocks will contain
at least one
    node object.

    If it is a BP, then the location being referenced must contain the
length
    (in bytes) of the binary stuff which immediately follows.
    (In Lisp such an object would be called an atom).
    The length need not be a multiple of sizeof (void*),
    but the following SS may (and should) be aligned.

- There must be at least one BP or FP immediately behind each (logical)
object.
  If there is no space left between end of object and end of
MemoryAllocationBlock
  minus unused space, this delimiting BP may be omitted.

- MemoryAllocationBlocks are no legal objects and hence must not be
pointed
  to by GPs (or LPs). This also follows from the first rule.

The following properties result:

- GC can easily find out the start of the MemoryAllocationBlock, given
some GP.
- GC can easily distinguish GPs, LPs, BPs, FPs by simple arithmetic
comparisons.
- GC can also easily determine the areas it needs to scan.
  Binary stuff is neither scanned nor fixed up, should the object move.
- Pointers can be aligned. In this case binary stuff must be padded to
the next
  aligned address. When the object is traversed, the length of binary
stuff to be
  skipped must be adjusted accordingly.
- References inside objects (LPs) are possible, allowing efficiently
folded tree
  structures. Of course, methods may not misinterpret such LPs as
references to
  full-blown objects; LPs cannot not be publicly accessible and cannot
  be used as operands unless they are also GPs.
- If it is known that a LP cannot point inside binary stuff,
  the nature of the location being pointed to (and whether is qualifies
as a GP)
  can be made quite simple.
  Note that a LP may point to the _start_ of binary stuff
  See the stack example below details.
- Whether a LP also qualifies as a GP can be always determined (though
somewhat slow)
  by (forward) traversing the object up to the address being pointed to
by the LP.
- Binary stuff is supported and may be referenced for use by methods
using LPs or GPs.
- Introspection can determine object boundaries and types.
- If the current object is known, the next object can always be found by
searching
  for a FP (for node objects) or by address calculation (for atoms).
  If this feature is available at the language level,
  single objects cannot easily be copied any longer;
  whole MemoryAllocationBlocks (excluding unused space) must be copied
instead
  if no knowledge of the internal working of the object is available to
the copy
  function.
- MemoryAllocationBlocks are not accessible from the language level.
  (Pointer calculations are restricted to the legal range of LPs, if
allowed at all).

Examples (payload only):

1. Integer data type

- GP: class object of type integer
- BP
- 4
- 4 bytes containing the integer.


2. Reference type

- GP: class object of type Reference
- GP: the reference

3. Reference to arbitrary locations contained in other objects

- GP: class object of type FeatureReference
- GP: the object being referenced
- BP
- 4
- integer: Offset of feature inside object referenced by GP

Note that no LP can be used since LPs must not point outside
the current MemoryAllocationBlock (or they would be considered GPs).
Due to security reasons, I don't think we should include such a class,
however. The example should just shows that it can be done.

4. Simple Stack type containing GPs, LPs or binary stuff
- GP: class object of type stack
- LP: stack pointer (to current top entry)
- GP: first entry
...
- GP: current top entry
- NULL
...

Remarks:
If the stack pointer points to a location that is preceded by a BP,
then the top entry is binary stuff. The location pointed to is the
length of the binary stuff. A binary stuff object is considered to be a
single entry.

Otherwise the top element is a reference.

Available stack space is initialised to NULLs. If an entry is popped, it
must be overwritten with NULLs (since GC may inspect the object at any
time).
It is not possible to derive from this class, i.e. append instance
items.
(But of course, you can use it by reference).
No GPs into such a stack object can be permitted because the contents is
subject to being overwritten when a pop occurs. If references to stack
items (still on the stack) are needed, it must be implemented as a
(single) linked list, each contained in its own MemoryAllocationBlock.
This approach produces lots of garbage, however.

5. Rectangle type

- GP: class object of type rectangle
- BP
- 32
- Integer: left
- Integer: width
- Integer: top
- Integer: height

Same remark as for integer.

6. Complex number, composed from two numbers:

- GP: class object of type complex
- LP: number (re)
- LP: number (im)
- FP
- GP, re: class object of type number
- LP: number attached to re
- FP
- GP, im: class object of type number
- LP: number attached to im
- BP
- 2 * sizeof (number)
- number attached to re: binary stuff
- number attached to im: binary stuff

This carries some overhead (32 bytes),
but references to re and im may be used independently from complex,
rather than having to access by method call.


--

Regards,

Hans-Dieter Dreier
(Hans-Dieter.Dreier@materna.de)