[gclist] A problem with the Baker treadmill.

Paul R. Wilson wilson@cs.utexas.edu
Mon, 13 Jan 1997 13:49:49 -0600


>From majordom@iecc.com Mon Jan 13 12:29:54 1997
>Date: Mon, 13 Jan 1997 11:20:15 -0700
>To: gclist@iecc.com
>From: James McCartney <james@clyde.as.utexas.edu>
>Subject: Re: [gclist] A problem with the Baker treadmill.
>
>At 11:59 AM -0600 1/13/97, Paul R. Wilson wrote:
>
>>For a system without safe points, or when supporting
>>true multiprocessing, you might want to use a snapshot
>>write barrier rather than a Dijkstra- or Steele-style
>>write barrier.
>>
>>A snapshot write barrier only has to append the overwritten
>>value to a log (list or whatever), so you can have a log
>>per thread or a log per processor.
>
>Which makes me curious because in your survey paper you mention
>the incremental update write barriers "have generally been cast
>in terms of parallel systems", and the first one you mention in
>the Dijkstra algorithm is. Yet you say above it seems that they are less
>suited to multiprocessing. I am curious in what way they are better
>for parallel systems.

Actually, I'm kind of curious myself. :-)  I didn't mean to say anything
definitive, because I haven't thought hard about this in quite a while.
(That's why I said "might want to" above, and I probably should have
emphasized that more.)  I was just suggesting that if you have trouble
with one kind of write barrier, consider another one---the choice of
write barriers is pretty much independent of the overall treadmillish
fake-copying scheme.

One thing I'm not clear on is the costs of synchronization operations
on today's chips.  Last time I thought about this, I was worried
that on some chips, synch operations were expensive (e.g., no atomic
test-and-set).  (And I was under the impression that on some crummy
OS's, the only portable synch operations were system calls.  This
is less of an issue for you.)

Presumably on any reasonable modern chip, this is
less of an issue, but I'm not entirely sure.  (I know some chips
have had trouble with their cache coherency stuff, and don't know
whether there are problems with basic synching.  I also don't know
the speed issues in the atomic operations that are available.)

I think this is one of those non-obvious devil-in-the-details things.
It's also the kind of thing you won't get a lot of guidance on from
the early papers, which were more oriented toward proving that things
were possible and maybe theoretically efficient than toward showing
the constant-factor, hardware-dependent costs.

BTW, some info on your system might be helpful.  Are you using Be
threads, and are they switched unpredictably?  How fast are the
synch instructions?  Do you have a separate GC thread, or do you
fake it by doing some GC work in an out-of-line call every n bytes
of allocation?

(I assume that you want to do real multiprocessing because the Be Box
comes with two processors standard.)