GC alternatives? Re: pointer/integer type distinction

Eric W. Biederman ebiederm@cse.unl.edu
Sat, 13 Apr 1996 13:42:09 -0500


   I was thinking about ways to do GC, when it occurred to me:
   why not do something that explicitly declares object dependacies, instead 
   of polling for them. Something that works analogously (something like 
   that) to the "make" utility, but with memory blocks instead of files.

This is what you get when you looking at the memory blocks (when you are
doing garbage collection).  As the dependencies are subject to change
(unless you state that all writes must go to new objects(not too
expensive really)) you would pay a high time & space overhead for a
minor optimization.

   Also, what about using a split heap, with one being contained within the 
   r-stack (something similar to alloca), and the other being a more 
   conventional heap. This would allow objects to be automatically 
   deallocated when the context they were allocated within exits, while 
   allowing long term allocations to be handled explicitly.

This sounds good for the general case of contexts entering and exiting
that you have in standard procedureal languages.  The thing to watch
out for here is that lambda functions (from lisp/scheme) like the
context they are in to be part of the function, and as these functions
have an indefinite lifetime they quickly defeat that optimization, in
specific instances.

   This would doubtless be more efficient than GC, but would it work for the 
   HLL?  According to a paper on real-time GC (in an archive of papers I 
   found on the web, pointed to by the Tunes page, but I forget where), I 
   read that stack-allocation is actually better than GC for short-term 
   objects, and that these make up the majority of allocations in a HLL.

As to the efficiency of the GC it is reportedly comparable with or
better then regular memory allocation.  This is offset by 2 things.
First a garbage collected heap is used more heavily, and second and
most noticeable is that when the garbage collector runs a noticeable
paus happens in a standard collector.  So even if a standard heap
allocation method takes 5 minutes total time in a long running editor
and the GC method takes 2 minutes for the same amount of stuff,
because the editor stops for that whole two minutes you get really
torqued with the more efficient GC algorithm.

   This is a relatively simple and fast way to do things, but could the HLL 
   cope with it? If not, is there some other alternative that would work 
   better, or is GC really the only way to go?

GC is not bad, just misunderstood.

   I'm having problems with how to separate pointers and integers. Some 
   questions:

   1. Should there be completely separate operators for integers and pointers?
   2. Should it be possible to add integers to pointers, etc.?
   3. What should happen if someone uses an integer-only operator on a pointer?
   4. What is the best way to implement this so that a minimum number of 
      instructions/cycles are needed to do all this?

   Questions 1 and 4 are closely related, since "smart" operators would be 
   expensive.

   One possible answer to question 3 would be to implement exception handling
   in the LLL. Something like the ANS Forth "THROW" might be appropriate.

   For GC to work, the answer to question 2 would seem to be "very limited". 
   But there would need to be a fairly complete set of operators to access 
   data through a pointer. Limitations here should be done very carefully.

   Opinions, anyone?

The critical part is not the operations from the perspective of the
garbage collector.  The garbage collector never sees nor cares what
instructions you use on a piece of memory (except for allocation of
course) instead what the garbage collector wants to know is:

Is _this_ piece of memory a pointer?

Coming from a forth perspective I can see how it can be difficult to
seperate out the variables on the stack by kind.  As the forth
abstraction is the integer stack.  And those integers may be pointers.

What RPL (from the HP calcs) is to only have a stack of pointers.  But
I don't think that would be what we want to do here.

My preference goes for having strongly typed functions and something
like the C++ stack for procedure call and scratch variables, that all
you have to do is look aside at some table in memory somewhere for the
function to see exactly what types of arguments it takes, and what it
uses for scratch.

But that's a little more work then is needed almost in this case if
you are implementing this with threaded functions, and a small core
only of functions in assembly.  In that case what we could do for now
would be to leave have strong information about what types of
arguments the functions accept, and simply use the heap for scratch,
or don't the the garbage collector see you using temporaries on the
call/return stack.