General OS design

Raul Deluth Miller rockwell@nova.umd.edu
Fri, 16 Dec 1994 13:50:08 -0500


Francois-Rene Rideau:
   The basic heuristics are simple:
   * when the machine is loaded, it looks for some machine that is less loaded

Hopefully, it doesn't look very hard.  A better allocation of
resources is that a lightly loaded machine should look for things to
do, starting with nearby neighbors.  If everybody is busy, this kind
of overhead goes away.  There will be some slowdown and overhead in
getting the work off the system, so let's only do it where it's
meaningful.

   * the farther and slower the link, the higher the link cost.

Heuristically, we look "nearby" first.
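
To make the "idle machine goes looking, nearest first" idea concrete,
here is a minimal sketch.  Everything in it is an assumption for
illustration: the Node record, the two load thresholds, and the
numeric link cost are hypothetical and come from neither post.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    load: float                                  # fraction of capacity in use, 0.0 to 1.0
    queue: list = field(default_factory=list)    # names of tasks that could be moved

IDLE_BELOW = 0.3   # hypothetical: below this, a node goes looking for work
BUSY_ABOVE = 0.8   # hypothetical: above this, a node is willing to shed work

def seek_work(me, neighbors):
    """neighbors is a list of (link_cost, Node) pairs.

    Returns (donor_name, task) for the cheapest-to-reach busy neighbor
    that has something to hand over, or None.
    """
    if me.load >= IDLE_BELOW:
        return None   # busy nodes never probe, so the probing overhead goes away
    # Look "nearby" first: lowest link cost is tried before anything farther off.
    for _cost, other in sorted(neighbors, key=lambda pair: pair[0]):
        if other.load > BUSY_ABOVE and other.queue:
            return other.name, other.queue.pop(0)
    return None
```

If every machine is above the idle threshold, nobody calls out at all,
which is the property argued for above.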

   * before you actually migrate, you allocate the resources, so that no
    uncontrolled massive migration waves are done on systems.

I think that being capability driven will do away with the need for
this kind of hack.

   * systems are (dynamically) organized in a hierarchy, with
    increasingly global resource servers at each level.

I'm not sure of the significance of this one.

   * the higher the cost, the surer you must be before you
    actually move.

This has something to do with the criteria the loaded machine uses
when selecting work to offload to an eager helper.
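
One possible reading of "higher cost means you must be surer" is a
plain expected-value test.  This is only a sketch: the function name
and the idea of a single numeric confidence are assumptions, not
something either of us has specified.

```python
def should_migrate(benefit, cost, confidence):
    """Hypothetical rule: move only if the expected gain beats the cost.

    benefit and cost are in the same (arbitrary) units; confidence is the
    estimated probability, 0.0 to 1.0, that the benefit actually shows up.
    For a fixed benefit, a larger cost demands a larger confidence.
    """
    return confidence * benefit > cost
```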

   * only profiled objects (which can be done dynamically and/or
    statically) may be migrated, so that we may compute the effects of
    migration on load (because migration involves both pointwise and
    continuous communication bandwidth).

I think this one needs more analysis before we can talk about specific
policies.  Basically, we're going to be involved in protocol design,
which is rather different from the kinds of work I've seen discussed
on this list.

   * you try to migrate objects that communicate heavily to machines
    that would reduce communication overhead.

This looks like more protocol design.
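
As a rough illustration of what such a protocol would have to compute,
here is a sketch that picks the host minimizing an object's
communication cost.  The traffic table, the location table, and the
per-byte link costs are all hypothetical inputs; how they would be
measured is exactly the open protocol question.

```python
def best_host(traffic, location, link_cost, candidates):
    """Pick the candidate host with the lowest total communication cost.

    traffic:    {peer_object: bytes per second exchanged with our object}
    location:   {peer_object: host where that peer currently lives}
    link_cost:  {(host_a, host_b): cost per byte}; same-host traffic is free
    candidates: hosts our object could be moved to
    """
    def comm_cost(host):
        return sum(
            rate * (0 if location[peer] == host
                    else link_cost[(host, location[peer])])
            for peer, rate in traffic.items()
        )
    return min(candidates, key=comm_cost)
```

An object that talks mostly to one peer ends up next to that peer,
which is the effect the heuristic asks for.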

For a large scale system, there need to be some objects which tackle
issues like getting things going, spotting/clearing problems,
optimizing communications, configuring new machines, etc. etc. etc.
Ideally, at least a third of our work is going to non-management work
(which is a lot better than you can say of lots of single-user
systems).

   * you try to migrate objects that communicate lightly to less
    loaded machines.

selection criteria on loaded machine -- in general the lightly loaded
machine needs to give a list of capabilities, and the heavily loaded
machine hands it something if there's some kind of fit.
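
A minimal sketch of that handshake, assuming capabilities can be
modeled as a set of names (a simplification, and the names used here
are made up):

```python
def pick_task_for(offer, tasks):
    """Choose something to hand to an idle machine.

    offer: set of capability names the lightly loaded machine advertises.
    tasks: list of (task_name, required_capabilities) held by the loaded machine.
    Returns the first task whose requirements are all covered by the offer,
    or None if nothing fits.
    """
    offered = set(offer)
    for name, needs in tasks:
        if set(needs) <= offered:   # every requirement appears in the offer
            return name
    return None
```

For example, with tasks [("render", {"fpu", "big_mem"}), ("index",
{"disk"})], an offer of {"disk", "net"} gets "index", and an offer of
{"net"} gets nothing.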

   * you try to migrate objects to places that best fit their needed
    resources (e.g. inactive objects to disk, very active objects to
    fast hosts).

protocol design, management
    
   * never migrate an object to somewhere it wouldn't.

selection criteria

   * you don't migrate an object to an insecure place.

selection criteria

   * you don't accept insecure migrating foreign objects.

lightly loaded machine can reject work.  [Raise a red flag when this
happens -- it indicates something's gone wrong.]
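
A sketch of reject-and-flag on the receiving side.  The trusted-sender
set stands in for whatever the real security check turns out to be,
and a log warning stands in for the "red flag"; both are assumptions.

```python
import logging

log = logging.getLogger("migration")

def accept_migration(task, sender, trusted_senders):
    """Return True if the incoming task is accepted, False if rejected.

    A rejection is logged as a warning, since under normal operation the
    loaded machine should not have offered work that gets turned down.
    """
    if sender not in trusted_senders:
        log.warning("rejected %s from %s: untrusted sender", task, sender)
        return False
    return True
```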

-- 
Raul D. Miller          N=:((*/pq)&|)@                 NB. public e, y, n=:*/pq
<rockwell@nova.umd.edu> P=:*N/@:#               NB. */-.,e e.&factors t=:*/<:pq
                        1=t|e*d    NB. (,-:<:)pq is four large primes, e medium
x-:d P,:y=:e P,:x                  NB. (d P,:y)-:D P*:N^:(i.#D)y [. D=:|.@#.d