[gclist] Sorting a linked collection.

Charles Fiterman cef@geode.geodesic.com
Tue, 03 Dec 1996 13:49:54 +0000


>Does the list have to remain as a single doubly-linked list during the
>sort?  (I.e., what's the nature of this "little increments" notion.)

The nature of the little increments notion is that the user calls a
function called gcMinWork() which gives an atom of time to the garbage
collector. The collector does something and returns. If the user is 
idle he can call gcMinWork() again. An atom of collector time can be 
used to run a finalizer, free an item, do an increment of sorting, 
scan a single object for pointers and move it from one collection to 
another etc. I was wrong to say doubly linked list. That is what I've 
been using because I can add and delete from it randomly very fast. 
But any collection with that property will do.

Between atoms of time the garbage collector invariants must be
protected. One of these invariants is that the collection being sorted
can have elements removed from it. I can break the invariants during the 
atom but must restore them when I'm done.

I have the constraint that I can't take unbounded time on an atom of
collection. That is I can't insertion sort because if I get a 1,000,000
element list the world might crash while I'm searching for a spot to
insert at. And if I stop the scan and keep a bookmark I hit another 
constraint that the item I'm bookmarked on may get put in another collection.

Because of the atom of time problem I think I'm stuck with heapsort.
An atom of time can insert an element into the heap which must take
less than log2N operations. The element removals maintain heap invariants
and also take less than log2N operations. This requires three pointers
per element rather than two.

>What are the constraints on the list on its elements? You seem to be
>saying that it's mainly sorted to begin with, and has many repeated
>elements.

These constraints are only probable. I can imagine situations where
they are very wrong. The objects being sorted are objects with
finalizers and most of them have no needed order so will be easy to
sort. There probably wont be many of them but I could be very wrong.
 
>If these assumptions are certain to be correct, then bubble
>sort might not be bad, though adding some extra code to move whole
>blocks of equal-keyed elements together might make sense - after all,
>you can unlink and relink a subchain as fast as you can unlink an relink
>a single element!

But finding the limits of the subchain looks like a problem given the
incremental nature of the beast and the fact that elements may vanish
between increments.


Charles Fiterman	Geodesic Systems
Phone 312-728-7196	4745 N. Ravenswood Suite 111
Fax   312-728-6096	Chicago Il 60015
cef@geodesic.com

A computer language without garbage collection
  is like a city without garbage collection.