A thought on concurrency
Tony Garnock-Jones
tonyg at lshift.net
Mon Dec 20 00:16:10 PST 2004
The current interpreter uses some global variables. Also, the way
dispatching is done mutates maps and methods in place while performing a
search. Neither the globals nor the map/method dispatch-helper tables
are thread-safe.
The map/method tables can be made thread-safe by replicating them a
fixed number of times - where today we see
struct Map
{
    ...;
    unsigned long int visitedPositions;
    unsigned long long int dispatchID;
};
and
struct MethodDefinition
{
    ...;
    unsigned long long int dispatchID;
    unsigned long int foundPositions;
    unsigned long int dispatchRank;
};
we might have, in future
struct Map
{
    ...;
    struct {
        unsigned long int visitedPositions;
        unsigned long long int dispatchID;
    } dispatchHelpers[NUM_CPUS];
};
and
struct MethodDefinition
{
    ...;
    struct {
        unsigned long long int dispatchID;
        unsigned long int foundPositions;
        unsigned long int dispatchRank;
    } dispatchHelpers[NUM_CPUS];
};
and then allocate a pseudo-CPU number to each thread at thread startup.
A fixed number of threads would be started at VM startup, one per
pseudo-CPU, and no more would ever be created. The number of pseudo-CPUs
would be fixed at VM compile time. Every time a thread's
garbage collector runs, it resets the contents of the dispatchHelpers
structures for the objects it moves, and, just before returning to the
mutator, resets currentDispatchID.
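Concretely, the thread-startup side might look something like the
sketch below, using POSIX threads. None of these names exist in the VM
today - pseudoCPUKey, currentPseudoCPU, interpreterLoop and
startVMThreads are all invented for illustration:

#include <pthread.h>

#define NUM_CPUS 4                  /* fixed at VM compile time */

static pthread_key_t pseudoCPUKey;  /* maps each thread to its pseudo-CPU */

/* Invented helper: the pseudo-CPU number of the calling thread. */
static int currentPseudoCPU(void)
{
    return (int)(long) pthread_getspecific(pseudoCPUKey);
}

static void *interpreterLoop(void *arg)
{
    pthread_setspecific(pseudoCPUKey, arg);  /* remember our fixed index */
    /* ... run the interpreter ... */
    return NULL;
}

/* Start exactly NUM_CPUS threads at VM startup; no more are ever made. */
static void startVMThreads(void)
{
    pthread_key_create(&pseudoCPUKey, NULL);
    for (long cpu = 0; cpu < NUM_CPUS; cpu++) {
        pthread_t thread;
        pthread_create(&thread, NULL, interpreterLoop, (void *) cpu);
    }
}

During dispatch, each thread then touches only its own slot in the
replicated helpers, so those fields need no locking at all. Against the
struct Map above, the bookkeeping might go roughly like this
(noteVisited is another invented name):

/* Record that a dispatch position was visited; stale state left over
   from an earlier dispatch is reset lazily, via the dispatchID. */
static void noteVisited(struct Map *map,
                        unsigned long long int currentDispatchID,
                        unsigned long int position)
{
    int cpu = currentPseudoCPU();
    if (map->dispatchHelpers[cpu].dispatchID != currentDispatchID) {
        map->dispatchHelpers[cpu].dispatchID = currentDispatchID;
        map->dispatchHelpers[cpu].visitedPositions = 0;
    }
    map->dispatchHelpers[cpu].visitedPositions |= 1UL << position;
}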
Some of the global variables are a little more challenging.
delegationStack is currently declared as "ObjectPointer
delegationStack[256]". This is easy - simply use the NUM_CPUS trick, and
change it to "ObjectPointer delegationStack[NUM_CPUS][256]".
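Each thread then works against its own slice of the array, using the
currentPseudoCPU() helper sketched above (myDelegationStack is, again,
an invented name):

ObjectPointer delegationStack[NUM_CPUS][256];

/* Invented helper: the calling thread's private 256-entry stack. */
static ObjectPointer *myDelegationStack(void)
{
    return delegationStack[currentPseudoCPU()];
}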
CurrentMemory is used in some interesting ways. It might be worth
thinking about making it thread-local, but I haven't explored the idea
too much.
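If it did turn out to be workable, thread-local storage would let us
keep every use site unchanged. A minimal sketch, assuming GCC's
__thread extension (pthread_getspecific would do the same job more
portably), with struct Memory standing in for whatever CurrentMemory's
real type is:

/* Sketch only: each thread gets its own CurrentMemory binding. */
static __thread struct Memory *CurrentMemory;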
methodCache - this is the interesting one. Or so it seemed this
morning, anyway; on reflection, having a separate methodCache for each
thread isn't such a bad idea. Certainly once we get PICs the issue
vanishes completely.
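A per-thread cache is just the NUM_CPUS trick yet again. A sketch, with
an invented entry layout and size:

#define METHOD_CACHE_SIZE 1024          /* invented size */

struct MethodCacheEntry {               /* invented layout */
    ObjectPointer selector, map, method;
};

static struct MethodCacheEntry methodCache[NUM_CPUS][METHOD_CACHE_SIZE];

/* Each thread probes only its own cache, so no locking is needed. */
static struct MethodCacheEntry *methodCacheLine(unsigned long hash)
{
    return &methodCache[currentPseudoCPU()][hash % METHOD_CACHE_SIZE];
}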
Making a change to introduce NUM_CPUS concurrent versions of these
metalevel temporary structures affects the garbage collector and the
image save/load mechanism. If the image writer and reader stayed the
same after introducing the NUM_CPUS idea, we'd end up with images that
couldn't be read by a VM compiled with a different NUM_CPUS than the
VM that produced them.
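One way out would be to leave the scratch fields out of the image
entirely, so the on-disk format never mentions NUM_CPUS; the reader
just re-zeroes them at its own size. A sketch, with a cut-down struct
Map whose flags field stands in for the elided fields, and with
writeMap/readMap as invented names:

#include <stdio.h>
#include <string.h>

struct Map {
    unsigned long int flags;            /* stands in for the "...;" */
    struct {
        unsigned long int visitedPositions;
        unsigned long long int dispatchID;
    } dispatchHelpers[NUM_CPUS];
};

static void writeMap(FILE *image, const struct Map *map)
{
    fwrite(&map->flags, sizeof map->flags, 1, image);
    /* dispatchHelpers deliberately not written: it is scratch state,
       reset by the collector anyway, and its size depends on NUM_CPUS. */
}

static void readMap(FILE *image, struct Map *map)
{
    fread(&map->flags, sizeof map->flags, 1, image);
    memset(map->dispatchHelpers, 0, sizeof map->dispatchHelpers);
}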
The current image save/load routine is simple and elegant, in a
bit-shuffling kind of way, but since there's such a large conceptual
overlap between the collector and the object-extraction routines,
perhaps the time is coming to start thinking about using some kind of
garbage-collector-copier-based SmartRefStream analogue instead of a raw
heap dump? (That would fix endianness issues, too... although there's
no reason that couldn't be done in adjustAllOopsBy right now.)
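For what it's worth, the endianness fix-up really could ride along
with the pointer adjustment. A sketch, assuming a 32-bit ObjectPointer,
with adjustOop as an invented stand-in for the inner loop of
adjustAllOopsBy:

#include <stdint.h>

/* Plain 32-bit byte swap. */
static uint32_t bswap32(uint32_t w)
{
    return (w >> 24) | ((w >> 8) & 0x0000FF00u)
         | ((w << 8) & 0x00FF0000u) | (w << 24);
}

/* Swap (when loading a foreign-endian image) and relocate, one pass. */
static void adjustOop(uint32_t *slot, int32_t delta, int swapNeeded)
{
    uint32_t w = swapNeeded ? bswap32(*slot) : *slot;
    *slot = w + (uint32_t) delta;
}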
So, just a few thoughts I had this morning, anyway.
Tony
--
[][][] Tony Garnock-Jones | Mob: +44 (0)7905 974 211
[][] LShift Ltd | Tel: +44 (0)20 7729 7060
[] [] www.lshift.net | Email: tonyg at lshift.net