A more generic portable binary encoding format

Francois-Rene Rideau rideau@clipper
Sun, 11 Dec 94 19:46:36 MET


   There's been a lot of discussion about the low-level language for our
system.
   Of course we need such a language, and it's a big part of our project.
Now, I think that it's only a part, and not the major one: code is not
everything, especially low-level code. What the user sees, what we all
want to use, is high-level objects. A low-level encoding is meant to
be transparent; that's why it's left to system programmers, who should set
it up once and for all (well, until they redo it much more efficiently),
so that users won't have to bother.

   Now, as I said, low-level running code is not the only thing the
system needs: we also need generic high-level code (that sometimes *cannot*
be compiled efficiently to LLL, and may require later compilation and/or
interpretation). And we need data: numbers, text, graphics, sound. But
also data constructors and abstractors: arrays, lists, associations, sets,
relations, distributions, trees, syntactic trees, graphs, lambda-expressions,
etc, etc. Actually, we need any kind of high-level object.
   That's why the LLL may be necessary, but only as a part of a more generic
effort to allow inter-computer communication of objects.

   A given host may have its own optimized format for in-memory or even
disk objects; a given parallel multi-CPU system may have its own optimized
communication protocol. A local net may have its own optimized (or particular)
object format for specific applications. But we *must* provide some generic
communication format, so that machines all over the world may understand each
other.
   This format must be able to express *everything* that is possible now,
or that may become possible one day.
* It must be extensible: new encodings may be dynamically added to the format,
by just lazily calling an external module (that will have to be present for
the object to be decoded); note that extensibility *is* power: just *anything*
should/may be directly expressed using this (extensible) format.
Also note that we may change the format itself, if one day we find
ways to considerably enhance it by using incompatible techniques.
* It must be easy to deal with: being used heavily, encoding/decoding
speed is important.
* It must be compact: for the same reason, size is important; so it must be
a binary format, with recommendations for optimized text encoding; and it
should support any kind of possible object compression,
through the natural use of the (recursive) module system.
* It must be portable: recommendations exist so that the encoding is
independent of the size or format of words in the architecture.
* It must be secure: a signature system may identify the authors,
trustees, and trusters of a module, so that only trusted modules are actually
evaluated.
* It must be type-safe: any kind of typing can be supported by the format,
from the simplest one (no check) to the most complicated one (check of
program proof). Type modules are thus available; use only objects you
trust.
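
   To make the extensibility and portability requirements above a bit more
concrete, here is a minimal sketch in Python. Everything in it is an
assumption made for illustration, not a proposed format: the base-128
variable-length integers (so that nothing depends on the host word size),
the tag/length/payload layout, the tag numbers, and the names LOADERS,
register, encode_object and decode_object are all hypothetical.

```python
# Hypothetical sketch: none of these names or layouts come from the post.

def encode_uint(n):
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)

def decode_uint(data, pos=0):
    """Decode a varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

LOADERS = {}    # tag -> zero-argument loader that returns a decoder function
_DECODERS = {}  # tag -> decoder, filled in lazily on first use

def register(tag, loader):
    """Make an external decoding module known without loading it yet."""
    LOADERS[tag] = loader

def encode_object(tag, payload):
    """Lay out an object as: varint tag, varint payload length, payload."""
    return encode_uint(tag) + encode_uint(len(payload)) + payload

def decode_object(data, pos=0):
    """Decode one object; return (decoded value, next position)."""
    tag, pos = decode_uint(data, pos)
    length, pos = decode_uint(data, pos)
    payload = data[pos:pos + length]
    if tag not in _DECODERS:
        if tag not in LOADERS:
            # The module must be present for the object to be decoded.
            raise LookupError(f"no module available for tag {tag}")
        _DECODERS[tag] = LOADERS[tag]()  # lazy call to the external module
    return _DECODERS[tag](payload), pos + length

# Two illustrative encodings; the tag numbers are arbitrary.
TAG_UINT = 1
TAG_TEXT = 2
register(TAG_UINT, lambda: lambda payload: decode_uint(payload)[0])
register(TAG_TEXT, lambda: lambda payload: payload.decode("utf-8"))
```

   A new encoding is added by calling register with a fresh tag; a host that
lacks the corresponding module fails with LookupError instead of guessing.
Signatures and type checks would be further layers on top and are not shown.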


   As I see things, objects are communicated in groups, let's say *modules*.
Actually, it's even simpler to say that objects are communicated
one by one, *but* that there's a generic way to encapsulate multiple objects
into only one, which is known as a module (or generic object multiplexer).
   A module is thus a (uniquely identified) object, that interacts with
other modules by requiring or providing sub-objects. The basic constructor
of a module may be the lazy evaluator, with explicit or implicit arguments:
you ask for arguments with such properties. Implicit ones are given by the
system (e.g. module asks: "gimme a .gz decompactor"; system replies: 
"ok, here's gzip 1.2.4"). Explicit ones are explicitly given by the user.
Of course, the system may also ask for user configuration/interaction when
giving implicit parameters...
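
   The lazy evaluator with implicit and explicit arguments can be sketched
roughly as follows, again in Python. This is only one possible reading of
the paragraph above: the Module and System classes, their method names, and
the use of a plain string such as ".gz decompactor" to describe a required
property are all assumptions for illustration.

```python
import gzip

class System:
    """Hypothetical host system that supplies implicit arguments."""
    def __init__(self):
        self.providers = {}  # required property -> object that satisfies it

    def offer(self, prop, obj):
        self.providers[prop] = obj

    def provide(self, prop):
        try:
            return self.providers[prop]
        except KeyError:
            raise LookupError(f"nothing provides {prop!r}") from None

class Module:
    """A named object that requires sub-objects and is built on demand."""
    def __init__(self, name, requires, build):
        self.name = name
        self.requires = requires  # parameter name -> required property
        self.build = build        # called with the resolved arguments
        self._forced = False
        self._value = None

    def force(self, system, **explicit):
        """Build the module on first use; later calls reuse the result."""
        if not self._forced:
            args = {}
            for param, prop in self.requires.items():
                if param in explicit:
                    args[param] = explicit[param]       # given by the user
                else:
                    args[param] = system.provide(prop)  # given by the system
            self._value = self.build(**args)
            self._forced = True
        return self._value

# The module asks for a ".gz decompactor"; the system replies with gzip.
system = System()
system.offer(".gz decompactor", gzip.decompress)
blob = gzip.compress(b"hello")
greeting = Module("greeting", {"unpack": ".gz decompactor"},
                  lambda unpack: unpack(blob))
```

   Calling greeting.force(system) resolves "unpack" implicitly, while
greeting.force(system, unpack=some_function) would pass it explicitly.
Nothing is decompressed until force is first called.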
   Another basic constructor may be the (implicit or explicit) choice:
"choose whichever of these you please". So for example, some executable
game code may be given in multiple formats (i386, M68K, PPC, sun4, 6502 or any
assembly; or our LLL, or another LLL, or the new version of the LLL, or
TAOS' LLL, or ANDF, or even high-level language code !), and the system
will choose whichever fits best for speed and/or security. Explicit choice is
just the standard tuple/record construct.
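
   As a rough Python sketch of the two kinds of choice described above: an
implicit choice lets the system pick among alternatives, and an explicit
choice is ordinary field selection from a record. The function names, the
set of formats the host supports, and the speed scores are made up for the
example.

```python
def choose(alternatives, supported, score):
    """Implicit choice: the system picks the best alternative it can use.

    alternatives is a list of (format, payload) pairs, supported is the set
    of formats this host can run, and score ranks formats (higher is better).
    """
    usable = [(fmt, payload) for fmt, payload in alternatives
              if fmt in supported]
    if not usable:
        raise LookupError("no alternative fits this host")
    return max(usable, key=lambda alt: score(alt[0]))

def select(alternatives, fmt):
    """Explicit choice: plain record access by field name."""
    return dict(alternatives)[fmt]

# The same game shipped in several formats (payloads are placeholders).
game = [("i386", b"<i386 code>"), ("M68K", b"<M68K code>"),
        ("LLL", b"<LLL code>")]
host_supports = {"i386", "LLL"}  # assumed capabilities of this host
speed = {"i386": 10, "LLL": 5}   # assumed ranking: native code runs faster

best = choose(game, host_supports, lambda fmt: speed.get(fmt, 0))
```

   Here best is the i386 variant; a host that trusts only the LLL would pass
a different supported set or a score based on security rather than speed.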

   Here we see that the semantics of the Low-Level Object Encoding Format
(LLOEF -- please try to find a funnier acronym) are deeply related to those of
a high-level language for the system. Actually, the semantics should be
*the same*, and the LLOEF *is* the standard implementation of the HLL.

   Now, what about the LLL ?
Well, we saw that an LLL is only *one specific* way of encoding low-level code
in a portable manner; people may choose whichever available LLL they please;
but of course, we'll provide the best one ever to be (-8, won't we ?
The "same" LLL may come in multiple flavors (e.g. Mike's LLL with
16, 32, or 64 bit stacks).


   Ok. Whatdyathinkofit ?

--    ,        	                                ,           _ v    ~  ^  --
-- Fare -- rideau@clipper.ens.fr -- Francois-Rene Rideau -- +)ang-Vu Ban --
--                                      '                   / .          --
MOOSE project member. OSL developer.                      |   |   /
Dreams about The Universal (Distributed) Database.       --- --- //
Snail mail: 6, rue Augustin Thierry 75019 PARIS FRANCE   /|\ /|\ //
Phone: 033 1 42026735                                    /|\ /|\ /