Let's squeeze it!

Alan Grimes alangrimes@starpower.net
Sat, 18 Nov 2000 10:55:38 -0500


Okay, inside the modern RISC CPU there is a stage in the pipeline where
the raw *instruction stream* is examined and a *computation* is
constructed from it. This computation is like a 3D rendition of the
linear instruction stream: it takes data hazards and other dependencies
into account, and results in a set of commands that then execute in the
functional units with optimal parallelization.

A compiler for such a CPU will look at a piece of code that can be
described as follows: 

a REQUIRES b, c, d. 
DO a.

The compiler would see this and then compile the code in such a way as
to give hints to the processor about how things can be done in parallel:

b > instruc.
c > instruc. 
d > instruc.
b > instruc.
c > instruc.
d > instruc. 
b > instruc......

Okay. By giving these clues the compiler can give the CPU a better
opportunity to make things go faster, as there are far fewer data
hazards within the instruction stream. This is better than generating
code like: 

DO b.
DO c. 
DO d. 
DO a.

Which limits the CPU's ability to parallelize, because the instructions
within b, c, and d are necessarily linear... Today's CPUs get around
this to some extent by building a graph of the computation, which lets
them execute some things out of order.
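The graph idea above can be sketched in a few lines: given which instructions each instruction waits on, group everything whose dependencies are satisfied into one "cycle" that can issue in parallel. The instruction names and dependencies here are illustrative, not from any real ISA.

```python
# Toy list scheduler: group instructions into "cycles" where every
# instruction in a group can issue in parallel (no unmet dependencies).

def schedule(deps):
    """deps maps each instruction to the instructions it waits on.
    Returns a list of sets; each set can execute in parallel."""
    remaining = {i: set(d) for i, d in deps.items()}
    done, cycles = set(), []
    while remaining:
        # ready = everything whose dependencies have all completed
        ready = {i for i, d in remaining.items() if d <= done}
        if not ready:
            raise ValueError("cyclic dependency")
        cycles.append(ready)
        done |= ready
        for i in ready:
            del remaining[i]
    return cycles

# 'a' REQUIRES 'b', 'c', 'd' -- and b, c, d are independent of each other,
# so they issue together and a issues in the following cycle.
deps = {"b": [], "c": [], "d": [], "a": ["b", "c", "d"]}
print(schedule(deps))
```

With a REQUIRES b, c, d this produces two cycles: {b, c, d} first, then {a} — exactly the interleaving the compiler hints are trying to expose.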

Naturally this solution is limited, as it does not allow flexibility in
execution. If a function e were written that requires b, c, and f, then
the compiler would have to generate a second optimized block for that
function.

The operating system for this machine, running a dozen or so programs,
will make decisions as to how those programs share the available CPU
time. Devices will necessarily run exclusively. Realtime tasks, very
much like devices, will be given time slices that allow them to do what
needs to be done. And finally, the remaining programs will run whenever
they are "available" to run, meaning that they will run when they are
not blocked. Hopefully deadlock, starvation, and race conditions can be
avoided successfully.
	At all times the OS will attempt to ensure that no resource sits idle.
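The scheduling policy described above can be sketched as a single pick-next-task function: realtime tasks get first claim on the CPU, and ordinary tasks run whenever they are unblocked. The task names and fields are hypothetical, purely for illustration.

```python
# Toy scheduler mirroring the policy above: realtime tasks preempt,
# ordinary tasks run when not blocked, and the CPU idles if all block.

def pick_next(tasks):
    """tasks: list of dicts with 'name', 'realtime', 'blocked' keys.
    Returns the name of the task to run next, or None to idle."""
    # Realtime tasks (very much like devices) get priority.
    for t in tasks:
        if t["realtime"] and not t["blocked"]:
            return t["name"]
    # Otherwise run the first "available" (unblocked) ordinary task.
    for t in tasks:
        if not t["blocked"]:
            return t["name"]
    return None  # everything is blocked -- nothing to do

tasks = [
    {"name": "editor", "realtime": False, "blocked": False},
    {"name": "audio",  "realtime": True,  "blocked": False},
    {"name": "daemon", "realtime": False, "blocked": True},
]
print(pick_next(tasks))  # prints "audio" -- the realtime task wins
```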

Just the other day I learned of a system called PBS or "Portable Batch
System" that queues jobs for large-scale parallel multicomputers, or
even single-host machines that have a lot of work to do. This system
works by observing resource utilization throughout the system and then
launching tasks as the resources to execute them become available.
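That PBS-style policy amounts to a simple gate: launch a queued job only when everything it asks for is currently free. A minimal sketch, with made-up resource names and job requests (this is not the actual PBS algorithm):

```python
# Resource-aware launcher in the spirit described above: walk the queue
# and start each job whose resource requests fit in what is still free.

def launch_ready(queue, free):
    """queue: list of (job, needs) where needs maps resource -> amount.
    free: dict of available resources (mutated as jobs are launched).
    Returns the list of jobs launched this pass."""
    launched = []
    for job, needs in queue:
        if all(free.get(r, 0) >= n for r, n in needs.items()):
            for r, n in needs.items():
                free[r] -= n  # claim the resources
            launched.append(job)
    return launched

free = {"cpus": 4, "mem_gb": 8}
queue = [("render", {"cpus": 2, "mem_gb": 4}),
         ("sim",    {"cpus": 4, "mem_gb": 2}),   # won't fit after render
         ("index",  {"cpus": 1, "mem_gb": 1})]
print(launch_ready(queue, free))  # prints ['render', 'index']
```

The "sim" job stays queued until a later pass, when enough CPUs have been released — which is the whole point: launch as resources become available.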


Now the question here is whether it is possible to squeeze all these
very similar systems together into one recursive function, accelerated
by the processor, that solves all of these very similar optimization
problems.

Of course, I am saying that you would start the function at level 4,
which would then call itself as needed at level 3, and so forth...
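One way to read the "same function at every level" idea is a scheduler that, when a task is itself a graph of subtasks, simply calls itself on that subgraph. This is a speculative sketch of the post's proposal, not an established design; the graph shape and names are invented.

```python
# Recursive scheduler: a task graph maps name -> (deps, body), where
# body is None for a leaf task or another graph scheduled by the very
# same function. The same optimization logic applies at every level.

def run(graph):
    """Returns the order in which leaf tasks execute."""
    order, done, pending = [], set(), dict(graph)
    while pending:
        ready = [n for n, (deps, _) in pending.items()
                 if all(d in done for d in deps)]
        if not ready:
            raise ValueError("cyclic dependency")
        for n in ready:
            deps, body = pending.pop(n)
            # A composite task recurses; a leaf task just runs.
            order += run(body) if body else [n]
            done.add(n)
    return order

# Level N: 'a' requires the group 'bcd'; level N-1: that group is
# itself a graph of three independent tasks, handled by the same code.
inner = {"b": ([], None), "c": ([], None), "d": ([], None)}
outer = {"bcd": ([], inner), "a": (["bcd"], None)}
print(run(outer))  # prints ['b', 'c', 'd', 'a']
```

The same dependency-resolution step runs at the instruction level, the process level, and the batch-job level — which is the squeeze being proposed.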

One of the key steps to making this happen is to remove these
unnecessary high-level abstractions, making the system flatter and
more...

I suspect that it is significantly easier to become a proficient brain
surgeon than a linux user.
http://users.erols.com/alangrimes/  <my website.
