threads (was Re: [gclist] Re: gclist-digest V3 #84)

Richard A. O'Keefe ok@atlas.otago.ac.nz
Wed, 28 Jun 2000 12:02:12 +1200 (NZST)


Scott Meyer <smeyer@us.oracle.com> wrote:
	Comparing a thread
	to a process and claiming the former to be "lightweight" is
	akin to comparing a chassis to a complete car.  Of course the
	chassis is lighter...  You need to compare complete applications.
	For example, if you have to handle input on 100 sockets is it
	more heavyweight to fork 100 threads and let them block in read,
	or to have one process blocked in poll?

For the life of me, I cannot make sense of this question.
In some Erlang implementations and some Ada implementations,
if you have 100 threads handling input from 100 sockets,
they would *BE* one (UNIX) process blocked in poll().
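
To make the "one process blocked in poll()" side of that concrete, here
is a minimal sketch in Java, not taken from any Erlang or Ada runtime.
It assumes that java.nio's Selector is a fair stand-in for poll() (on
UNIX it is normally backed by a poll-like system call, but that is an
implementation detail), and it uses in-process pipes in place of sockets
so that it runs without a network.  The names OneThreadManyInputs and
serve are made up for the illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class OneThreadManyInputs {
    // Opens `inputs` pipes, writes one byte into each, and then reads all
    // of them back from a single thread that waits in select().
    // Returns the total number of bytes received.
    static int serve(int inputs) throws IOException {
        try (Selector selector = Selector.open()) {
            List<Pipe> pipes = new ArrayList<>();
            for (int i = 0; i < inputs; i++) {
                Pipe p = Pipe.open();
                // Only non-blocking channels may be registered with a selector.
                p.source().configureBlocking(false);
                p.source().register(selector, SelectionKey.OP_READ);
                pipes.add(p);
            }
            // Pipes stand in for sockets: make every input readable.
            for (Pipe p : pipes) {
                p.sink().write(ByteBuffer.wrap(new byte[] {42}));
            }
            int received = 0;
            ByteBuffer buf = ByteBuffer.allocate(16);
            while (received < inputs) {
                selector.select();  // the one place this thread blocks
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    buf.clear();
                    int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                    if (n > 0) {
                        received += n;
                    }
                }
            }
            for (Pipe p : pipes) {
                p.sink().close();
                p.source().close();
            }
            return received;
        }
    }
}
```

A run-time system with its own threads can put a loop like this in its
scheduler and wake whichever language-level thread was waiting on the
ready input, so the program is written as 100 blocking threads while
the operating system sees one process.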

Meyer is conflating two different things:
 - threads as an implementation technique for languages with concurrent
   semantics, INDEPENDENT of any operating system
 - POSIX threads (or Windows threads or MVS tasks or MCP tasks or whatever)
   used in a particular way.
They may be the same (sometimes Java threads map to operating system
threads) or they may not (sometimes Java threads DON'T map to operating
system threads).  If it comes to that, operating system threads are not
a single thing:  Solaris has two layers of threads, one "mediumweight"
and the other "lightweight" multiplexed on the lower layer.  (Actually,
I don't understand Solaris threads as well as I would like.  In particular,
I don't know whether POSIX threads are a third kind of thread, or a
rebranding of one of the other kinds.)

	Speaking of actual implementations of threads and processes,
	there is some difference (page tables, acls, etc.) between the
	two, such that threads appear to be somewhat cheaper than
	processes.

Here Meyer is talking about runtime costs.

	Unfortunately, (for thread-partisans) the nominally
	cheaper threaded environment comes at the cost of a heavy
	burden of heretofore "system-level" mutual-exclusion issues.

Here Meyer is talking in part about design time costs, but *those*
costs don't exist if you are programming in Erlang.  The run-time
library takes care of them for you.

	The naive approach to resolving these issues, making all libraries
	thread-safe, has a disastrous synchronization overhead and
	is very difficult to debug.

The naive approach to just about anything is disastrous.
This is one of the key arguments for well-designed libraries
and language run-time systems.  Using heavyweight processes is no
guarantee of simple or low-cost synchronisation.  System V semaphores
are not the easiest IPC mechanism to use correctly, for example.

	Most recently, this issue seems to have
	motivated the Java architects to rediscover the idea (previously
	rediscovered by the Smalltalk crowd) of leaving most libraries
	unsynchronized and providing a small set of synchronized data
	structures.  Or at least trying to.
	
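Java certainly used to take a naive approach.  I have to point out to
students that just because x is an instance of a synchronised class,
that doesn't mean that x.set(x.get() + 1) can be relied on to increment x:
each call is atomic by itself, but another thread can update x between
the get() and the set(), and that update is then lost.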
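
A minimal sketch of the x.set(x.get() + 1) problem, assuming a
hypothetical class SyncCell whose methods are all synchronized (it is
not a class from the Java library; IncrementDemo and run are likewise
invented for the example).  Without an outer lock, increments can be
lost; holding x's monitor across the whole compound operation makes the
count exact.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "synchronised class": each method is atomic on its own.
class SyncCell {
    private int value;
    synchronized int get() { return value; }
    synchronized void set(int v) { value = v; }
}

class IncrementDemo {
    // Starts `threads` threads, each incrementing one shared cell
    // `perThread` times, and returns the final value of the cell.
    static int run(boolean holdLock, int threads, int perThread)
            throws InterruptedException {
        SyncCell x = new SyncCell();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread w = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    if (holdLock) {
                        // Hold x's monitor across get AND set, so no other
                        // thread can slip in between the two calls.
                        synchronized (x) {
                            x.set(x.get() + 1);
                        }
                    } else {
                        // Two separately atomic calls: another thread may
                        // run between them and its update is overwritten.
                        x.set(x.get() + 1);
                    }
                }
            });
            workers.add(w);
            w.start();
        }
        for (Thread w : workers) {
            w.join();
        }
        return x.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("without outer lock: " + run(false, 8, 100000));
        System.out.println("with outer lock:    " + run(true, 8, 100000));
    }
}
```

The locked run always ends at threads * perThread.  The unlocked run
can end anywhere up to that figure, and how far short it falls varies
from run to run and from machine to machine.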

Synchronisation remains an issue whether the process mechanism is as
light as an electron or as heavy as a star.

	In my experience, most successful uses of threads come very
	close to that model or to the purely functional variant
	suggested earlier in this discussion.   In practice it is
	pretty hard to measure much difference between this stylized
	use of threads and processes with shared memory.  Pragmatically,
	it is a hell of a lot easier to implement the process version
	of things.
	
Speaking solely about Erlang here, pragmatically, that is not so.
And I strongly suspect that the alleged difficulty of measurement has
to do with what kinds of systems were measured:  programs designed for
a language (Erlang) which implements threads cheaply and regards the
existence of hundreds or thousands of threads within a single program
as normal will have different behaviour from programs designed for a
language (C, say) where thread management is difficult and thread
creation rare.  Note the POSIX.1c limit
	#define _POSIX_THREAD_THREADS_MAX 64
Nobody designing with that limit in mind is ever going to think of a
program architecture that involves a thousand threads.

The separate-UNIX-process approach has one great merit:  different
processes can run with different privileges, so that a high privilege
process can fork off a low privilege process to deal with a task that
doesn't need high privilege.

I note that one consequence of Squeak Smalltalk implementing threads
internally is that people have ported Squeak to "bare metal", with little
or no operating system underneath.  It has certainly made it easier to
port Squeak to a wide range of operating systems with different thread
and process models.  In the same way, Erlang runs just fine on systems
which don't _have_ shared memory.

I think there's a large area of agreement:

 - it's not a simple heavyweight process/lightweight thread dichotomy;
   there is a range of process implementations differing along several
   dimensions including memory management, synchronisation methods
   available, and privilege control

 - designing your own synchronisation code is tricky and it is easy to
   build something expensive

 - languages which can _enforce_ separation between threads make it
   easier to use threads

 - the original Java approach of plastering 'synchronized' around
   liberally puts a lot of overhead on the usual cases that don't need it

 - synchronisation overheads can mask any process/thread gains.