HLL Process Model

Jecel Mattos de Assumpcao Jr. jecel@lsi.usp.br
Tue, 7 Mar 1995 18:32:20 -0300


On Mon, 6 Mar 1995 22:06:09 -0800 (PST) Chris Harris wrote:
> As development seems to be progressing on the HLL front, I thought I'd
> again reiterate some suggestions on the process model, and throw out
> a few new things too.

I think the two things are linked, but most people would disagree.

> They say most CPUs are idle most of the time.  So perhaps on a good
> day, a network with 500 CPUs may have 300 in use--just 60%
> utilization.  As CPUs cost a good deal of money, TUNES should avoid
> this situation and provide the highest CPU utilization possible.
> (90% seems like a good amount.)

This is the major research topic at the Laboratorio de Sistemas
Integraveis, University of Sao Paulo ( and at many other places ):
software for workstation "clusters".

> How, with so few processes and so many processors, can TUNES possibly
> near the 90% utilization?  My suggestion is to dump the big, ugly,
> linear process model in favor of something more flexible and scalable.

Exactly.

> In real life, fine-grained objects don't require big, fat processes
> to do their operations, so why should they in a computing environment?

The problem is that most people want to run "dusty decks" in parallel,
not write new fine-grained programs. I can't imagine why, seeing how
bad these old programs are. You'll have to rewrite them anyway to
make the GUI generation even look at them ( like colorizing old
movies ;-).

> In my view of things, code should have some way of being represented, so
> that it is easy to distinguish code that must be executed linearly from
> that which can be executed in parallel.  (No judgements at this level
> would be made about what SHOULD be done in parallel.)

You might do it this way. An alternative is to have the compiler look
at the data flow and figure this out for you. That is much harder to
implement, but easier on the programmer.
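
To make the data-flow idea concrete, here is a toy sketch in Python
( the statement representation is something I made up for illustration,
not real compiler technology ). Two statements may run in parallel when
neither one reads or writes a variable the other writes:

# Toy data-flow check: a statement is a pair of the variable
# it writes and the set of variables it reads.

def independent(stmt_a, stmt_b):
    # Parallel is safe when neither statement reads or writes
    # a variable that the other one writes.
    write_a, reads_a = stmt_a
    write_b, reads_b = stmt_b
    return (write_a != write_b
            and write_a not in reads_b
            and write_b not in reads_a)

# "i = i + 1" writes i and reads i; "j = random()" writes j.
print(independent(("i", {"i"}), ("j", set())))       # True: parallel OK
# "k = i - j" reads i, so it must wait for "i = i + 1".
print(independent(("i", {"i"}), ("k", {"i", "j"})))  # False: keep in order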

> TUNES, then, would constantly keep an eye out for how much of the CPU
> is currently being used.  This could be done by AI techniques, genetic
> algorithms, whatever.

Another major topic at LSI-USP: they call it "self-scheduling". An
interesting alternative is called "continuation stealing": you act
as if you were doing things sequentially and just mark potential
"forks", but if a processor becomes idle it will "steal" the return
address and change the potential fork into a real one.
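
Here is a toy sketch of the stealing idea in Python ( real continuation
stealing grabs the return address; this version steals marked forks from
a work queue, which is the easier variant, and all the names are mine ):

import threading
from collections import deque

tasks = deque()          # potential forks, marked but not yet real
lock = threading.Lock()

def mark_fork(fn, *args):
    # Mark a potential fork; it may run locally or be stolen.
    with lock:
        tasks.append((fn, args))

def run_local():
    # The owner works through its own end of the deque,
    # newest first, as if it were an ordinary call stack.
    while True:
        with lock:
            if not tasks:
                return
            fn, args = tasks.pop()
        fn(*args)

def idle_thief():
    # An idle processor steals from the opposite, oldest end,
    # turning a potential fork into a real one.
    while True:
        with lock:
            if not tasks:
                return
            fn, args = tasks.popleft()
        fn(*args)

for n in range(4):
    mark_fork(print, "work item", n)

thief = threading.Thread(target=idle_thief)
thief.start()
run_local()
thief.join()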

> One possibility would be to embed linear and non-linear code within
> each other.  For example:
> 
> VARIABLES
>   i, j
> 
> BEGIN PARALLEL CODE
>   beep
>   BEGIN LINEAR CODE
>     i = 0
>     UNTIL quitting
>       BEGIN PARALLEL CODE
>         i++
>         j = random number
>       END PARALLEL CODE
>       print i-j
>     END UNTIL
>   END LINEAR CODE
> END PARALLEL CODE

Here is how it would look in Occam:

INT i, j :
PAR
  beep()
  SEQ
    i := 0
    WHILE not.quitting
      SEQ
        PAR
          i := i + 1
          j := random()
        print(i - j)

Maybe Occam should be added to the list of languages to study. It has
been some years since I've programmed in Occam, so the syntax above may
still be off. In any case, this wouldn't work because Occam doesn't
allow shared variables, and not.quitting is one ( i and j are not! ).

> This is a rather useless program.  Basically, its function is to beep
> once, and then display the result of the subtraction again and again.
> The important part, though, is not what it does, but how the PARALLEL
> and LINEAR specifiers work.  All code between begin and end PARALLELs
> will function correctly even if its members are not executed sequentially.  All code
> between LINEARs will be executed "normally".  This does NOT mean that
> all code between PARALLELs WILL be executed non-linearly; it simply
> means that such a possibility exists, should the OS find it
> worthwhile.

You specify the maximum parallelism that can occur at runtime. One
big problem with this example is that it doesn't use any communication
( other than the implicit shared variable quitting ) which is the
real complication of parallel systems.
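
Just as a sketch of what explicit communication looks like, here is the
same loop in Python with queues standing in for Occam channels ( all the
names are mine ); the shared variable quitting becomes a message:

import threading, queue, random

quit_chan = queue.Queue()     # carries a single "quit" message
result_chan = queue.Queue()   # carries i - j values to a printer

def worker():
    i = 0
    while quit_chan.empty():  # test the channel instead of a flag
        i = i + 1
        j = random.randrange(100)
        result_chan.put(i - j)
    result_chan.put(None)     # tell the printer to stop

def printer():
    while True:
        value = result_chan.get()
        if value is None:
            return
        print(value)

threads = [threading.Thread(target=worker),
           threading.Thread(target=printer)]
for t in threads:
    t.start()
quit_chan.put("quit")         # the only write to the quit channel
for t in threads:
    t.join()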

> How does the OS tinker with a process?  Well, if utilization is too
> low, it will break off one (or a few) of the sub-blocks in a process,
> and spawn them as separate processes.  If the machine is being
> over-utilized, it will combine existing processes together.  (You
> could just leave them separate, but maybe that'd have more context
> switch overhead.  I dunno....)  Certain processes should also be able
> to specify attributes that they have, so that they might take
> advantage of any special hardware available.  For example, if there is
> hardware available for the speedy execution of cellular automata
> programs, there is no reason why the process, without much knowledge
> of its own, shouldn't be able to take advantage of it.

None of this is very easy to do. But these are very interesting
research problems that I am looking into. I plan to use the
adaptive compilation technology in Self to help here.

> If we support changing of a process' code on the fly, we can then view
> the OS as one huge process, composed of many smaller things....

That is one way of looking at it. You will have to decide if your
"prcesses" can share memory or not. That is a major question.

> The main conceptual problem I have with this model is how to apply
> quotas and such, when processes are constantly growing, shrinking and
> migrating.  I guess how many processes a user is running now could be
> a factor in determining which sub-process to spawn as a separate process.

My current idea is to give users "accounts" with a certain credit flow.
You can spawn more processes, but they will share the same account and
won't be able to hog all of the CPUs in the system.
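
A toy sketch of the idea in Python ( the names and numbers are made
up ): every process a user spawns draws on the same credit flow, so
spawning more processes divides the flow instead of multiplying it.

class Account:
    def __init__(self, credit_per_tick):
        self.credit_per_tick = credit_per_tick
        self.processes = []

    def spawn(self, name):
        # New processes share the account; no new credit appears.
        self.processes.append(name)

    def share_per_process(self):
        # CPU credit each process may spend this tick.
        return self.credit_per_tick / max(1, len(self.processes))

user = Account(credit_per_tick=100)
user.spawn("p0")
print(user.share_per_process())   # 100.0: one process, whole flow
for n in range(1, 10):
    user.spawn("p%d" % n)
print(user.share_per_process())   # 10.0: ten processes, same flow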

> The other alternative I have to this is the cell model I proposed.
> (Remember that these are NOT the same cells that Mike proposed.) The
> basics of what I said:
>   -Basic unit of processing is the cell.
>   -Cells are composed of an internal group of sub-cells.  
>   -Cells are connected via one-directional channels.
>   -There are a few primitive cells, that allow for the execution of
>    low-level code, such as math.

Good. Have to stop infinite recursion before it bites you :-)

>   -Cells have no internal storage; they get all their inputs from
>    other cells.  (So we maybe need some variable primitives.)

Oh yes they do! Their memory is the channels connecting their
sub-cells...
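
To show what I mean, here is a toy counter cell in Python ( a queue
plays the role of the channel ): it has no variables of its own, and
its state is a token sitting in a feedback channel between its
sub-cells.

import queue

feedback = queue.Queue()   # internal channel of the counter cell
feedback.put(0)            # the initial state lives in the channel

def increment_cell(inp, out):
    # Sub-cell: consume the old count, emit the new one.
    out.put(inp.get() + 1)

for _ in range(3):
    increment_cell(feedback, feedback)
print(feedback.get())      # 3 -- the state was always in the channel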

>   -The OS can roll sub-cells into their own, first order cells, to
>    achieve higher CPU utilization.  It can also roll a number of
>    interdependent first-order cells to be sub-cells of another
>    first-order cell, if resource utilization is too high.  (This
>    happens at all levels, too.  2nd-order cells, for example, can be
>    merged to form 3rd-order cells.)

There is a missing element here - cell sharing. Can you use a cell
as a sub-cell of two different higher level cells? If you can, you
will have to find a way to deal with multiple "instances" of a cell.

> I'm not sure if the overhead involved in this would be really gross,
> or if it would be bearable.  Actually, that might also be a problem
> with my first model too, depending on how it was implemented.  Any
> ideas?

Try writing the above example with cells. If you can't do it, you'll
know you are in trouble ;-) If you can do it, try doing this with
cells:
           fact(n) = n < 2 ? 1 : n * fact(n - 1)

> So now that we've taken that conceptual leap, why not take it a step
> further, and have resizable, self-enclosing objects, and view the OS
> as one huge object, composed of many sub-objects?  Sure, there are
> logical 1st order objects, but the actual physical storage should change
> on the fly, so that every object, be it 40 gigs or 20 bytes, is able
> to best utilize mass-storage resources.

If you do object composition by "enclosing", how do you share things?
Or am I missing something here?

> Maybe there would be some way to combine processes and objects into
> one concept, as they have sort of done in BETA.  That'd certainly be
> neat, but the two are probably separate enough that it might not be
> worthwhile.

I have done so in Merlin: one object = one process ( except when
they are not :-). This model is close to the real world where each
component works in parallel with all of the rest. I have to allow
some exceptions for performance ( immutable objects can have many
processes ) and for compatibility with sequential code ( recursion
would always deadlock otherwise ).
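
Here is a toy sketch in Python of why a naive recursive call deadlocks
under one object = one process ( a non-reentrant lock stands in for the
object's single process; all names are mine ):

import threading

class OneProcessObject:
    def __init__(self):
        self.busy = threading.Lock()   # the object's one process

    def fact(self, n):
        if not self.busy.acquire(blocking=False):
            # A real blocking send would wait here forever: the
            # object is busy serving the very caller that sent it
            # this message.
            raise RuntimeError("deadlock: object is already busy")
        try:
            return 1 if n < 2 else n * self.fact(n - 1)
        finally:
            self.busy.release()

try:
    print(OneProcessObject().fact(5))
except RuntimeError as err:
    print(err)   # hit on the very first recursive self-send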

> Wow, there's a mouthful!  I think I'll throw this out now, and bar any
> further delay....

I'll follow this advice.
-- jecel