A more generic portable binary encoding format

Mike Prince mprince@crl.com
Mon, 12 Dec 1994 13:38:57 -0800 (PST)


BTW, one of our first applications should be an e-mail program which 
allows you to "fork" a message into several threads!  (I'm just kind of 
kidding :)  It's becoming difficult to respond to these long threads.  
Johan makes some really good points to which Fare has responded.  I'd 
love to just respond to Fare's, but did he include all of Johan's message?

just my .02


Now on to battle...

On Mon, 12 Dec 1994, Francois-Rene Rideau wrote:
> >>    That's why the LLL may be necessary, but only as a part of a more generic
> >> effort to allow inter-computer communication of objects.
> > 
> > Absolutely. So let's define our primitives NOW, and stick to them. When I 
> > say primitives, I really mean that! Nothing that we don't absolutely NEED 
> > on the low level should be present there. Throw out all unnecessary rubbish.
> 
>    Beware: what may be unnecessary to most may be indispensable to some; and
> most may darn well need things that most others deem unnecessary.
>    So, we must separate what's useful, from what's just expedient; we must
> keep in the "kernel", the "grammar", generic constructs, and put in the
> "library", the "vocabulary", specific primitives.

Johan and Raul both point in the direction of defining our primitives, 
along with some good strategies for doing that (KISS, "throw out what you 
don't need", map it to the hardware, etc).  The primitives are 
implemented by the kernel, and should be with us forever?(!).  Everything 
else builds on these.  I get the feeling the grammar and generic 
constructs Fare is talking about are more OO and aimed at bridging the HLL and
the kernel.  That may not be possible and may actually cloud our judgement if
we mandate it.  As was previously mentioned (Johan?), heterogeneity has 
its place.

> > So, what primitives do we need? I would suggest:
> 
> > A. "Passive items" -
> > 	1. constants
> > 	2. variables
>   These are some very basic generic constructs to me...
> Perhaps we may also add monotonic variables; but I think it may and should be
> put elsewhere...
>   And why do you call it "passive" ? Functions can be constants or
> variables !!! Being constant or variable has *nothing to do* with being
> active or passive...

I don't want to put words in Johan's mouth, but passive items do not act 
on others.  A variable (passive item) does not modify others; code does...

I believe infusing functions and constants into the argument just 
confuses the issue unnecessarily.  I think we all understand Johan's point.

> > 	3. blocks of memory

>    All computers do not have a linear memory model (see hardware LISP machines,
> or software virtual machines of any kind that may be the underlying system).
>    That's why I think that memory blocks are some very low-level thing whose
> use should be decided by the optimizer (whether human or computer), not the
> high-level programmer (sometimes the same human/computer, though); but in
> concertation with him (high-level annotations may help the optimizer decide).
>    I'm not saying that our LLL won't provide these, but that it shan't be
> a part of the grammatical kernel of our system, just one of the lowest-level
> (and very important) of the standard vocabulary (library).

Again, you're mixing up the library part with the kernel part.  One 
part of the kernel says to another, I need some mem, give me a block.  
That's pretty fundamental.

> > 	5. 32 bit integers
> > 	6. reals
>    These are fine; but should really be included in the standard library, not
> the kernel: people may wanna use any kind of integers or reals, with any size,
> fix or floating point, etc. On the other side, the computer has got a little
> number of different builtin number sizes. We should allow transition from one
> to the other, thus support both approaches. There should be generic *library*
> routines to handle these...

Again, what it boils down to is you need something at the very bottom 
(read kernel).  And we need to agree on these.  You can't create a 
bottomless argument (we'll leave the word sizes up to the compiler) in 
order to not trap yourself into a compromise situation.  Same again for 
library vs. kernel.  These are kernel issues and cannot be floated on top 
of the compiler. I completely agree with Johan about establishing 
primitives (although I have my own slightly different version =).

> > 	4. pointers
>    Here should be the basic abstraction of the low-level side of our system.
> The low-level begins with pointers. Local pointer size is host-defined;
> pointer size inside a file may be file-defined (or sub-file defined).

Nope. Remember we're talking LLL.  Your HLL may create as many 
abstractions as it wants until it runs out of memory or is blue in the 
face.  But we're talking fundamental/primitive instructions here!

I'm all for nice little pointers!

> > B. "Active items"
> > 	1. verbs
> > 	2. "user defined functions" (I will call these "friends")
> > 	3. "objects"

> I see no clear distinction between these...

Again (again again again), I believe you're mixing HLL with LLL.  The HLL 
may be able to view all of these together, but to get something off the 
ground, we need to establish our basic command set (verbs).  A friend (if 
I'm reading this correctly) is a chunk of code/function/method/whatever, 
and an object is a convenient way of grouping these together.  Am I on
target, Johan?
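
To make that reading concrete, here is a rough sketch in Python (purely for illustration, not a proposal for the LLL itself).  It assumes a stack-based machine, which nobody has agreed on; the verb names ADD and SUB come from Johan's list, while `VERBS`, `run_friend` and the `calculator` object are made-up names.

```python
from typing import Callable, Dict, List

Stack = List[int]


def verb_add(stack: Stack) -> None:
    """ADD: pop two values, push their sum."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def verb_sub(stack: Stack) -> None:
    """SUB: pop two values, push (second-from-top minus top)."""
    b = stack.pop()
    a = stack.pop()
    stack.append(a - b)


# Verbs: the fixed, primitive command set (hypothetical table).
VERBS: Dict[str, Callable[[Stack], None]] = {"ADD": verb_add, "SUB": verb_sub}


def run_friend(body: List[str], stack: Stack) -> None:
    """A friend is just a user-defined sequence of verbs, run in order."""
    for verb_name in body:
        VERBS[verb_name](stack)


# An object: a convenient grouping of named friends (hypothetical example).
calculator: Dict[str, List[str]] = {
    "add_then_sub": ["ADD", "SUB"],
}

stack: Stack = [10, 3, 4]
run_friend(calculator["add_then_sub"], stack)  # 3 + 4 = 7, then 10 - 7 = 3
```

Only the verb table would have to be standardized; friends and objects are built entirely out of it.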

> > 		Specialisation and functionality on
> > 		different levels are very dissimilar. Why? Because this
> > 		heterogeneity WORKS! I think that if we religiously stick
> > 		to a monomorphic design philosophy (eg "objects at almost
> > 		all levels" we are making a terminal mistake.
> I completely agree there is heterogeneity; what I want is smooth interaction
> and communication between different objects, and eventuality of a direct
> access between any indirectly linked universes.

That may not be possible.  Remember everything goes to machine code and 
runs with the kernel.  All that matters is that the kernel provides 
services for different processes to communicate.  It doesn't make sense 
to stipulate a development environment (say MOOSE) to HAVE TO be able to 
talk natively with a VBX application.  The LLL/kernel provides the 
functionality, use it who may like it!

> > Now, back to my types..
> > 
> > Integers and reals are self-explanatory.
> > Constants would come in two flavours:
> > 	a. Universal to the whole system, and accepted by all:
> > 		- ASCII CHARACTERS
> > 		- a wide variety of symbols eg the Greek alphabet,
> > 			characters with accents and special characters
> > 			eg. French, Polish, Serbo-Croat..
> > 			even Kanji..
> > 		- a dictionary of the commonest 64k English words
> 
>    All these kinds of things should be library objects, not unremovable
> elements of the system !!!
>    Various users or implementations may require or provide different languages,
> with their own encodings (uncompressed, compressed, etc) and dictionaries.

I've played with the same idea Johan has, but in this case constants 
should rest just a wee bit above the kernel.  It could be part of a 
standard library/macro/extension package.  I totally believe these things 
should be addressed early on in the development, but after the kernel is 
running.  BTW, this is a COOL idea and should be done :)  

How do we get around ASCII in the kernel for naming things?  I'm just 
saying a name is a length delimited string of bytes.  Put in ASCII if you 
please.
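
A minimal sketch of what "a length delimited string of bytes" could look like, in Python purely for illustration.  The one-byte length prefix (so names of at most 255 bytes) and the function names `encode_name` and `decode_name` are my assumptions, not anything we have settled on.

```python
from typing import Tuple


def encode_name(name: bytes) -> bytes:
    """Prefix the raw name bytes with one length byte (assumed limit: 255)."""
    if len(name) > 255:
        raise ValueError("name too long for a one-byte length prefix")
    return bytes([len(name)]) + name


def decode_name(buf: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return (name, next_offset).  The bytes themselves are never interpreted."""
    if offset >= len(buf):
        raise ValueError("no length byte at this offset")
    length = buf[offset]
    start = offset + 1
    end = start + length
    if end > len(buf):
        raise ValueError("truncated name")
    return buf[start:end], end
```

The kernel only ever compares and copies the bytes.  Whether they hold ASCII, Kanji or anything else is up to whoever chose the name.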

>    In the library vision, each object (even if multiple copies are around the
> world) are uniquely tagged and PGP-signed (err, PGP is recommended but not
> necessary; other protocols may be supported). Thus, you just require some
> globally identified (more or less distant) object and there it is ! For distant
> objects a copy will be made (which is why constant objects are preferred to
> variable ones: because no automatic update process is needed, not to talk
> about synchronized access or modification...).

> > Variables would be locally defined and could be allocated ANY value.
>   Let's have variable scoping like in *any* good language (i.e. not C), with
> more or less local things; but what imports is: if you "see" an object, you
> access it, whether it is implemented as local, remote, distributed, or
> whatever...

Here's my little speech about scoping:  I propose we have the union of a 
method and data.  If you want the data, the ONLY way to get it is to ask 
the method.  Methods are named and are listed in the namespace.  This way 
we don't need complex scoping rules, permissions, etc. on the LLL/kernel 
layer.  (A method may be huge and have hundreds of variables, arrays, etc. 
in its scope, or it may be 20 bytes of code and an integer.)  There are 
no pointers to distant objects.  If you want something from a far away 
object you send a message to its method, it pulls the data, and you get a 
message back (all using agents of course =).  Now all we need is for the 
kernel to support message passing/agents.  KISS!
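
A rough sketch of the method-plus-data idea, in Python purely for illustration.  It is synchronous, lives in one process, and leaves agents out entirely; the `Kernel` class, `make_counter`, and the "increment"/"read" messages are all made-up names, not a description of my kernel.

```python
from typing import Any, Callable, Dict

Method = Callable[[Any], Any]


class Kernel:
    """Holds one flat namespace of named methods and routes messages to them."""

    def __init__(self) -> None:
        self._namespace: Dict[bytes, Method] = {}

    def register(self, name: bytes, method: Method) -> None:
        if name in self._namespace:
            raise KeyError(f"name already taken: {name!r}")
        self._namespace[name] = method

    def send(self, name: bytes, message: Any) -> Any:
        # The only way to reach a method's data is to send it a message.
        return self._namespace[name](message)


def make_counter() -> Method:
    """Build a method whose data (count) is visible only inside its scope."""
    count = 0

    def method(message: Any) -> Any:
        nonlocal count
        if message == "increment":
            count += 1
            return count
        if message == "read":
            return count
        raise ValueError(f"unknown message: {message!r}")

    return method


kernel = Kernel()
kernel.register(b"counter", make_counter())
kernel.send(b"counter", "increment")
reply = kernel.send(b"counter", "read")
```

Nothing outside the method holds a pointer to `count`, so the kernel needs no scoping or permission rules beyond the namespace lookup.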

> > Blocks would be collections of words of a requested size. They could 
> > occupy RAM or disc space or whatever .. this would be transparent to the 
> > user,  and management would be on a very low level, BUT you would be able 
> > to examine / acquire performance attributes (How long does it take to 
> > move data from block X to block Z). Notice how I said collections of _words_.
> > This adheres strictly to my concept of having everything as 32 bit words, 
> > and is vital to ensure the integrity and simplicity of the model.
> 
>    Firstly, the user shouldn't even see blocks -- this should be transparent !
> Do word-processing secretaries manipulate blocks ? No, they manipulate
> formatted text documents. Do mathematicians manipulate blocks ? No, they
> manipulate symbolic expressions. Do number-crunchers manipulate blocks ?
> No, they manipulate reals, matrices, functions. Nobody wants to use blocks
> any more than they want to explicitly save documents (imagine a
> "Save the documents !" or "Save the matrices !" association...).
>    Blocks are fine for implementors, and that's all. When I use the system,
> I never ever wanna see blocks or pages. Let the system hackers have all the
> fun, and not disturb *unwilling* users with implementation dependencies that
> only annoy them (if they're willing, that's another problem: we *must* provide
> those ones access to it, as long as it is secure).

Separate the users from the programmers (see previous thread).  A block 
will be a system primitive, users won't see it!  Maybe even HLL 
programmers won't see it.  You can't use those arguments against a 
low-level primitive.

I'm for blocks, but a slightly different version which we'll discuss later.

> > Pointers would be just that - pointers to offsets within a particular 
> > block, and again, entirely generic (not giving a damn whether the "block 
> > is composed of RAM, disc memory, or whatever) but susceptible to detailed 
> > timing analysis.
> > NOTE THAT POINTERS ARE IN EFFECT NUMBERS, AND CAN THUS BE MANIPULATED AS 
> > NUMBERS - THIS SIMPLIFIES OUR TASK IMMENSELY.
>    No, No, No, and yes.
>    I mean, surely the internal representation of a pointer in a given host
> will be an integer; but let's just not specify that; it will come immediately
> on machines with flat memory, but other solutions will fit much better for
> other architectures (lisp machines; big multiprocessors with tiny processors,
> etc).
>    You gain *nothing* at over-specifying things about pointers for portable
> code; nothing. Let the human or computer optimizer do this for the particular
> architecture the code is executed on.
>    In the meantime, let's have abstract operators in the only case where
> pointers and integers actually interact: arrays. Inlining them is quite easy.
> Not mixing integers and pointers is very simple to explain: that's the only
> way to avoid crashes that come from synthetizing pointers with integers...
> Surely, you gain nothing at doing the latter, but unportable hacks (which
> the human/computer optimizer can do itself, with or without your help through
> annotations).

The kernel does not understand abstract operators.  Maybe it would 
simplify things if we replaced every occurrence of kernel/LLL with "Our 
new microprocessor, the 80686".  We need to settle on compromises; we 
cannot skirt the issue by saying keep it abstract.  A pointer is an 
integer that points into a block.  Fine with me!
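
A minimal sketch of "a pointer is an integer that points into a block", in Python purely for illustration.  It assumes Johan's 32-bit words and word (not byte) offsets; the `Block` class and its `get`/`put` methods are hypothetical names.

```python
from typing import List

WORD_MASK = 0xFFFFFFFF  # assumption: every word is 32 bits wide


class Block:
    """A fixed-size collection of 32-bit words, addressed by integer offset."""

    def __init__(self, n_words: int) -> None:
        self.words: List[int] = [0] * n_words

    def _check(self, pointer: int) -> None:
        if not 0 <= pointer < len(self.words):
            raise IndexError(f"pointer {pointer} is outside the block")

    def get(self, pointer: int) -> int:
        """Read the word the pointer refers to."""
        self._check(pointer)
        return self.words[pointer]

    def put(self, pointer: int, value: int) -> None:
        """Store a value at the pointer, truncated to 32 bits."""
        self._check(pointer)
        self.words[pointer] = value & WORD_MASK


block = Block(4)
p = 1                           # a pointer is just a number...
block.put(p, 7)
block.put(p + 1, 0x100000001)   # ...so pointer arithmetic is plain addition
```

The bounds check in `_check` is where a kernel could catch the synthesized-pointer crashes Fare worries about, without making pointers abstract.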

> > Now we move onto active items..
> > 
> > There are three that I consider necessary:
> > 
> > 1. Verbs. These are _simple_ low-level operators eg ADD, SUB, GET (from a 
> > pointer into a variable), PUT (into a pointer location) etc.
> > 2. "Friends". These are user defined amalgamations of verbs, constants, 
> > etc. They can be used locally, but if you want to transport them to other 
> > users or systems, you must package them up into:-
> > 3. Objects (modules or whatever). These should preferably be compiled.
> 
>    Let's have some LLL-dependant instruction set, undefined at high-level.
> This way, we allow further change of LLL, or using any kind of language
> (particularly assembly, or Scheme, or the SELF virtual machine, etc) for a
> LLL (if you're masochistic enough, you could use plain C as a LLL).

Nah.  The LLL is what glues everything together.  I'm in Bangladesh (Year 
2005) using some leftover Pentium system running tunes v1.0.  I've got 
all my little objects humming along together (remember everything boils 
down to LLL) when I get an internet++ hookup.  All my little objects jump 
for joy as they realize their ability to migrate onto a nearby SparcStation
101010.  Alas, because we did not standardize on one LLL instruction set, 
they are incompatible.

Don't let this happen to you!

>    And why have a hierarchical system at all ?
>    Let heterogeneity appear *naturally*, from the fact that we have atoms and
> constructors. In our standard file format, there should be a constructor for
> modules; but any language expressed therein can provide its own constructors
> for its own kind of modules, etc...
>    As for packaging objects to move them, that's another (quite important)
> issue, linked to that of having global (world-wide) identification mechanisms,
> and good algorithms/heuristics to determine the limits of an object.

It depends.  We're mixing high and low again.  All the kernel needs is a 
way of receiving a chunk of bytes and passing them off to a decomposer 
to make sense of them.  Probably the decomposer should be part of the 
kernel, breaking out our methods and data and giving the agents in the 
packets some execution time.

Want another layer on top of that?  Add your own HLL flavor.

The inter-kernel packet format is very simple.  You can build anything 
on top of that you like, but don't impose HLL constructs/desires on what 
is a simple protocol for transferring info between kernels.
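
To show how little such a format needs, here is a sketch of a decomposer in Python (purely for illustration).  The record layout is invented for this example and is NOT my actual packet format: one kind byte, a one-byte name length, the name, a 4-byte big-endian payload length, then the payload.  `Item`, `compose`, `decompose` and the two `KIND_*` values are hypothetical, and scheduling the agents is left out.

```python
import struct
from typing import List, NamedTuple

KIND_METHOD = 0x01  # hypothetical tag values
KIND_DATA = 0x02


class Item(NamedTuple):
    kind: int
    name: bytes
    payload: bytes


def compose(items: List[Item]) -> bytes:
    """Pack items into one chunk of bytes using the assumed record layout."""
    out = bytearray()
    for item in items:
        if len(item.name) > 255:
            raise ValueError("name too long for a one-byte length")
        out += struct.pack(">BB", item.kind, len(item.name))
        out += item.name
        out += struct.pack(">I", len(item.payload))
        out += item.payload
    return bytes(out)


def decompose(packet: bytes) -> List[Item]:
    """Break a received chunk of bytes back into its methods and data."""
    items: List[Item] = []
    offset = 0
    while offset < len(packet):
        try:
            kind, name_len = struct.unpack_from(">BB", packet, offset)
            offset += 2
            name = packet[offset:offset + name_len]
            offset += name_len
            (payload_len,) = struct.unpack_from(">I", packet, offset)
            offset += 4
        except struct.error as exc:
            raise ValueError("truncated packet") from exc
        payload = packet[offset:offset + payload_len]
        if len(payload) != payload_len:
            raise ValueError("truncated payload")
        offset += payload_len
        if kind not in (KIND_METHOD, KIND_DATA):
            raise ValueError(f"unknown item kind: {kind}")
        items.append(Item(kind, name, payload))
    return items
```

The decomposer knows nothing about what a payload means; anything richer is a layer on top.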

> >> Also note that we may also change the format, if one day we find
> >> ways to considerably enhance it by using uncompatible techniques. 

> > Aaaaaaaaaaaaaaaaaagh. Why not just get it right first time, and leave lots
> > of room for extensions!!

>    Of course, that's what we're doing. But imagine that we further see that
> with another format, we can skip 5% space overhead, and 20% time overhead;
> shouldn't we move from one to the other ? I mean, I hope this won't be the
> case; but we should stay open-minded enough for any eventuality.

Nope, I'm with Johan.  Get it right the first time.

> >>    Another basic constructor may be the (implicit or explicit) choice:
> >> "choose whichever of these you please". So for example, some executable
> >> game code may be given in multiple format (i386, M68K, PPC, sun4, 6502
> >> or any assembly; or our LLL, or another LLL, or the new version of the
> >> LLL, or TAOS' LLL, or ANDF, or even high-level language code !), and
> >> the system will choose whichever fits best for speed and/or security.
> >> Explicit choice is just the standard tuple/record construct.
> >>
> > Clumsy, potentiality for _big_ overheads, and generally to be avoided.
>    I don't agree. That's just having a generic support for choice in the
> system; if humans choose, that's a browser or interaction system; if the
> system chooses that's a heuristic or any program.

Fat binaries encourage obesity.  I'm for a pure LLL distribution.

> >>    Here we see that the semantics of the Low-Level Object Encoding Format
> >> (LLOEF -- please try find a funnier acronym) are deeply related to those of
> >> a high-level language for the system. Actually, the semantics should be
> >> *the same*, and the LLOEF *is* the standard implementation of the HLL.
> > 
> > The low level & hi level are intimately concerned with one another. Screw 
> > the LL and you can forget about the HL ever working properly..
>    You may screw a language by over-defining things as well as by
> under-defining them...

Then let's choose wisely Grasshopper...

> >>    Now, what about the LLL ?
> >> Well, we saw that a LLL is only *one specific* way of encoding
> >> low-level code in a portable manner; people may choose whichever
> >> available LLL they please;
> >> but of course, we'll provide the best one ever to be (-8, won't we ?
> >> A "same" LLL may come in multiple kind of flavors (e.g. Mike's LLL with
> >> 16, 32, or 64 bit stacks).
> 
> > Yeeargh. KISS. See my previous comm.
>    Simple is to let undefined, not to overdefine. (what was the
> other "S" of KISS already ?)

Actually I like the other version of KISS:
	Keep it simple, stupid.
It forces us to recognize we are not the GODs of PROGRAMMING but are mere 
mortals.  Do it simply and do it right.

Let's not overextend ourselves and fail to accomplish anything.

Mike

P.S. I'm still working on my kernel.  I fear releasing details would 
create a wrath of "that part sucks because it does" messages.  Our group 
has discussed a number of good ideas, but ZERO code.  The proof is in the 
puddin'.  I'm making a batch, and if you don't like it, I encourage you to 
use my recipe and improve it.  Then we WILL be making progress!

Too bad "all talk, no code" doesn't rhyme!