threads (was Re: [gclist] Re: gclist-digest V3 #84)

David Chase chase@naturalbridge.com
Mon, 26 Jun 2000 15:32:50 -0400


At 10:42 PM 6/25/00 -0700, Scott Meyer wrote:
>The point to which I was perhaps too obliquely alluding is that
>programmers have (repeatedly) "invented" process-like things and,
>in point of fact, use them to accomplish a very large proportion
>of the "real" (i.e., non-programmers actually pay money to have it
>occur) computation that takes place in the world.  This pattern
>suggests that there is some value to this notion of process that
>makes the "extra weight" worth carrying.

>For example, if you have to handle input on 100 sockets, is it
>more heavyweight to fork 100 threads and let them block in read,
>or to have one process blocked in poll?  Like most things IRL,
>it's not quite so clear-cut.

Why not have one thread blocked in poll?
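
One way to write exactly that in Java is with java.nio's Selector
(an API that postdates this message); a minimal sketch, with
error handling omitted:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;

    // One thread multiplexing many sockets: block in select() (the
    // analogue of poll) and service whichever channels are ready.
    public class OneThreadPoll {
        public static void main(String[] args) throws IOException {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();             // the single blocking point
                for (SelectionKey key : selector.selectedKeys()) {
                    if (key.isAcceptable()) {  // new connection: register it
                        SocketChannel c = server.accept();
                        c.configureBlocking(false);
                        c.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {
                        buf.clear();
                        ((SocketChannel) key.channel()).read(buf);
                        // ... handle the bytes read into buf ...
                    }
                }
                selector.selectedKeys().clear();
            }
        }
    }

One thread pays one blocking call per batch of ready sockets,
instead of 100 blocked threads each holding a stack.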

I think that the difference between threads and processes is
less clear-cut in languages that are "safe".  One use of a
process is to contain both pointer smashes and memory leaks:
a process cannot smash pointers outside its own address space,
and when it exits, its resources are (usually, on competent
operating systems) reclaimed in total.  Language-enforced
safety -- access checking, protection from pointer smashes,
and resource reclamation -- generally appears to be cheaper
than using a process to get the same guarantees.  Looking at
the "whole application" (and not just garbage collection),
consider the experience of the guys at U Washington (presumably
the SPIN operating system), who apparently win enough from
low-cost safe access to OS data structures to pay for the
costs of garbage collection.

We don't have that safety extended at all to cover issues of
synchronization, unfortunately, and anything that could be
checked and enforced mechanically would necessarily be
less expressive than the ways people sometimes use synchronization.

For example, an "object" might be declared to be "synchronized
by" some other object in some simple, fixed relationship
(as in "synchronized by this"), or it might be declared as
"reached from synchronized" (to capture the notion of a tree
or other linked data structure that is locked at its root),
with the assumption that these things are checked and enforced
by a compiler (or "verifier").
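
To make that concrete, here is a hypothetical sketch; the
annotations are invented for illustration (no existing compiler
or verifier enforces them as written):

    // Invented annotations standing in for the declarations above.
    @interface SynchronizedBy { String value(); }
    @interface ReachedFromSynchronized { }

    class Node { Node left, right; long value; }

    class Account {
        // every access to 'balance' must hold the lock on 'this'
        @SynchronizedBy("this")
        private long balance;

        // a linked structure locked at its root: holding the lock
        // on 'this' covers every node reached from 'root'
        @ReachedFromSynchronized
        private Node root;

        synchronized void deposit(long amount) {
            balance += amount;   // ok: the caller holds 'this'
        }
    }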

I think a corollary of this is that *only* "synchronized"
data gets written to global variables.  That seems pretty
onerous, but also necessary in any really large multithreaded
program (as in, what the heck were you planning to do with
that global data if it wasn't synchronized by something?).
I've been playing with the idea for some time, but it's not
like I have the time or funding to do anything with it, so
it hasn't really gone anywhere, nor is it clear that anyone
would want to use the result if I did.
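
In the same hypothetical scheme (reusing the invented
@SynchronizedBy annotation from the sketch above), a global --
a static field -- would have to name its guard, and a bare
write would be rejected:

    class Globals {
        static final Object LOG_LOCK = new Object();

        @SynchronizedBy("Globals.LOG_LOCK")
        static StringBuilder log = new StringBuilder();

        static void append(String s) {
            synchronized (LOG_LOCK) {
                log.append(s);    // ok: the declared guard is held
            }
            // log.append(s);     // checker would reject: no guard held
        }
    }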

Another place where language-level safety is currently
inadequate is resource consumption.  If a process gets
"too large", that is easily detected and the process can
be killed (perhaps leaving file data in an inconsistent
state, too bad about that) -- but there's no easy way
to detect which thread is hogging all the memory, nor is
it necessarily easy to kill the thread (uncooperative
Java threads are effectively unkillable).  Again, I
think this would all be interesting stuff to look at in
language research.
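
Threads are unkillable because cancellation in Java is
cooperative: Thread.interrupt() merely sets a flag (Thread.stop()
was deprecated as unsafe), so a thread that never checks the
flag just keeps running.  A sketch, in today's Java:

    public class Cancellation {
        public static void main(String[] args) throws InterruptedException {
            Thread worker = new Thread(() -> {
                while (!Thread.currentThread().isInterrupted()) {
                    // one bounded unit of work, then re-check the flag
                }
                System.out.println("worker saw the interrupt, exiting");
            });
            worker.start();
            Thread.sleep(100);
            worker.interrupt();  // a request, not a kill: a body that
                                 // never checks the flag runs forever
            worker.join();
        }
    }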

>Unfortunately, (for thread-partisans) the nominally
>cheaper threaded environment comes at the cost of a heavy
>burden of heretofore "system-level" mutual-exclusion issues.
>The naive approach to resolving these issues, making all libraries
>thread-safe, has a disastrous synchronization overhead and
>is very difficult to debug.

This is very much a matter of "it depends", and of the
underlying hardware implementation.  Given compare-and-swap
(CAS) or load-linked/store-conditional (LL/SC), it is
relatively easy to implement "locks" that (in the uncontended,
common case) cost one CAS going in and one CAS going out, and
not too hard (as in, we did it) even to support Java's multiple-
acquire (recursive) locks at low cost: one CAS in, one CAS out,
zero additional CASes and not much logic for recursive
acquire/release.
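
A minimal sketch of a lock with that cost profile, written with
today's java.util.concurrent.atomic (which postdates this
message) and spinning rather than parking on contention -- one
CAS in, one CAS out, and no CAS at all on a recursive acquire,
since only the owning thread ever touches the count:

    import java.util.concurrent.atomic.AtomicReference;

    final class ThinLock {
        private final AtomicReference<Thread> owner = new AtomicReference<>();
        private int count;   // only ever touched by the owning thread

        void lock() {
            Thread me = Thread.currentThread();
            if (owner.get() == me) {     // recursive acquire: no CAS
                count++;
                return;
            }
            while (!owner.compareAndSet(null, me)) {
                Thread.onSpinWait();     // contended: a real lock would park
            }
            count = 1;                   // one CAS on the uncontended path
        }

        void unlock() {
            Thread me = Thread.currentThread();
            if (owner.get() != me)
                throw new IllegalMonitorStateException();
            if (--count == 0)
                owner.compareAndSet(me, null);   // one CAS going out
        }
    }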

The cost of the operation then depends upon how well the
compiler and hardware work together to hide it.  On Pentium,
at least, it's not that cheap, because the CAS has to be
bus-locked on an actual multiprocessor, which means something
like 12 (?) memory bus cycles.  But it could be cheaper.
LL/SC, as I understand it, is pretty cheap on small-to-medium
scale (snooping-cache) multiprocessors, and CAS *could* be
implemented "in the network" of combining memory networks
(I don't know if any machines were built that used these,
but I read about them somewhere); hiding the latency would
require that the compiler schedule it well, but compilers
can often do that.



David Chase                    --  chase@naturalbridge.com
NaturalBridge Inc              --  http://www.naturalbridge.com
BulletTrain bytecode compiler  --  when you can't wait for performance