A thought on concurrency
Tony Garnock-Jones
tonyg at lshift.net
Mon Dec 20 00:16:10 PST 2004
The current interpreter uses some global variables. Also, the way
dispatching is done mutates maps and methods in place while performing a
search. Neither the globals nor the map/method dispatch-helper tables
are thread-safe.
The map/method tables can be made thread-safe by replicating them a
fixed number of times - where today we see
struct Map
{
    ...;
    unsigned long int visitedPositions;
    unsigned long long int dispatchID;
};
and
struct MethodDefinition
{
    ...;
    unsigned long long int dispatchID;
    unsigned long int foundPositions;
    unsigned long int dispatchRank;
};
we might have, in future
struct Map
{
    ...;
    struct {
        unsigned long int visitedPositions;
        unsigned long long int dispatchID;
    } dispatchHelpers[NUM_CPUS];
};
and
struct MethodDefinition
{
    ...;
    struct {
        unsigned long long int dispatchID;
        unsigned long int foundPositions;
        unsigned long int dispatchRank;
    } dispatchHelpers[NUM_CPUS];
};
and then allocate a pseudo-CPU number to each thread at thread startup.
A fixed number of threads would be started at VM startup, one per
pseudo-CPU, and no more would ever be created. The number of pseudo-CPUs
would be fixed at VM compile time. Every time a thread's
garbage collector runs, it resets the contents of the dispatchHelpers
structures for the objects it moves, and, just before returning to the
mutator, resets currentDispatchID.
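Concretely, the thread-startup side might look something like the
sketch below, using POSIX threads. None of these names exist in the VM
today - pseudoCPUKey, currentPseudoCPU, interpreterLoop and
startVMThreads are all invented for illustration:

#include <pthread.h>

#define NUM_CPUS 4                  /* fixed at VM compile time */

static pthread_key_t pseudoCPUKey;  /* maps each thread to its pseudo-CPU */

/* Invented helper: the pseudo-CPU number of the calling thread. */
static int currentPseudoCPU(void)
{
    return (int)(long) pthread_getspecific(pseudoCPUKey);
}

static void *interpreterLoop(void *arg)
{
    pthread_setspecific(pseudoCPUKey, arg);  /* remember our fixed index */
    /* ... run the interpreter ... */
    return NULL;
}

/* Start exactly NUM_CPUS threads at VM startup; no more are ever made. */
static void startVMThreads(void)
{
    pthread_key_create(&pseudoCPUKey, NULL);
    for (long cpu = 0; cpu < NUM_CPUS; cpu++) {
        pthread_t thread;
        pthread_create(&thread, NULL, interpreterLoop, (void *) cpu);
    }
}

During dispatch, each thread then touches only its own slot in the
replicated helpers, so those fields need no locking at all. Against the
struct Map above, the bookkeeping might go roughly like this
(noteVisited is another invented name):

/* Record that a dispatch position was visited; stale state left over
   from an earlier dispatch is reset lazily, via the dispatchID. */
static void noteVisited(struct Map *map,
                        unsigned long long int currentDispatchID,
                        unsigned long int position)
{
    int cpu = currentPseudoCPU();
    if (map->dispatchHelpers[cpu].dispatchID != currentDispatchID) {
        map->dispatchHelpers[cpu].dispatchID = currentDispatchID;
        map->dispatchHelpers[cpu].visitedPositions = 0;
    }
    map->dispatchHelpers[cpu].visitedPositions |= 1UL << position;
}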
Some of the global variables are a little more challenging.
delegationStack is currently declared as "ObjectPointer
delegationStack[256]". This is easy - simply use the NUM_CPUS trick, and
change it to "ObjectPointer delegationStack[NUM_CPUS][256]".
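Each thread then works against its own slice of the array, using the
currentPseudoCPU() helper sketched above (myDelegationStack is, again,
an invented name):

ObjectPointer delegationStack[NUM_CPUS][256];

/* Invented helper: the calling thread's private 256-entry stack. */
static ObjectPointer *myDelegationStack(void)
{
    return delegationStack[currentPseudoCPU()];
}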
CurrentMemory is used in some interesting ways. It might be worth
thinking about making it thread-local, but I haven't explored the idea
too much.
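If it did turn out to be workable, thread-local storage would let us
keep every use site unchanged. A minimal sketch, assuming GCC's
__thread extension (pthread_getspecific would do the same job more
portably), with struct Memory standing in for whatever CurrentMemory's
real type is:

/* Sketch only: each thread gets its own CurrentMemory binding. */
static __thread struct Memory *CurrentMemory;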
methodCache - this is the interesting one. Or so it seemed this
morning, anyway; on reflection, having a separate methodCache for each
thread isn't such a bad idea. Certainly once we get PICs the issue
vanishes completely.
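A per-thread cache is just the NUM_CPUS trick yet again. A sketch, with
an invented entry layout and size:

#define METHOD_CACHE_SIZE 1024          /* invented size */

struct MethodCacheEntry {               /* invented layout */
    ObjectPointer selector, map, method;
};

static struct MethodCacheEntry methodCache[NUM_CPUS][METHOD_CACHE_SIZE];

/* Each thread probes only its own cache, so no locking is needed. */
static struct MethodCacheEntry *methodCacheLine(unsigned long hash)
{
    return &methodCache[currentPseudoCPU()][hash % METHOD_CACHE_SIZE];
}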
Making a change to introduce NUM_CPUS concurrent versions of these
metalevel temporary structures affects the garbage collector and the
image save/load mechanism. If the image writer and reader stayed the
same after introducing the NUM_CPUS idea, we'd end up with images that
couldn't be read by a VM compiled with a different NUM_CPUS than the
VM that produced them.
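One way out would be to leave the scratch fields out of the image
entirely, so the on-disk format never mentions NUM_CPUS; the reader
just re-zeroes them at its own size. A sketch, with a cut-down struct
Map whose flags field stands in for the elided fields, and with
writeMap/readMap as invented names:

#include <stdio.h>
#include <string.h>

struct Map {
    unsigned long int flags;            /* stands in for the "...;" */
    struct {
        unsigned long int visitedPositions;
        unsigned long long int dispatchID;
    } dispatchHelpers[NUM_CPUS];
};

static void writeMap(FILE *image, const struct Map *map)
{
    fwrite(&map->flags, sizeof map->flags, 1, image);
    /* dispatchHelpers deliberately not written: it is scratch state,
       reset by the collector anyway, and its size depends on NUM_CPUS. */
}

static void readMap(FILE *image, struct Map *map)
{
    fread(&map->flags, sizeof map->flags, 1, image);
    memset(map->dispatchHelpers, 0, sizeof map->dispatchHelpers);
}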
The current image save/load routine is simple and elegant, in a
bit-shuffling kind of way, but since there's such a large conceptual
overlap between the collector and the object-extraction routines,
perhaps the time is coming to start thinking about using some kind of
garbage-collector-copier-based SmartRefStream analogue instead of a raw
heap dump? (That would fix endianness issues, too... although there's
no reason that couldn't be done in adjustAllOopsBy right now.)
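For what it's worth, the endianness fix-up really could ride along
with the pointer adjustment. A sketch, assuming a 32-bit ObjectPointer,
with adjustOop as an invented stand-in for the inner loop of
adjustAllOopsBy:

#include <stdint.h>

/* Plain 32-bit byte swap. */
static uint32_t bswap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Swap (when loading a foreign-endian image) and relocate, one pass. */
static void adjustOop(uint32_t *slot, int32_t delta, int swapNeeded)
{
    uint32_t w = swapNeeded ? bswap32(*slot) : *slot;
    *slot = w + (uint32_t) delta;
}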
So, just a few thoughts I had this morning, anyway.
Tony
--
[][][] Tony Garnock-Jones | Mob: +44 (0)7905 974 211
[][] LShift Ltd | Tel: +44 (0)20 7729 7060
[] [] www.lshift.net | Email: tonyg at lshift.net