Are you still there?


Sun, 7 Feb 93 19:17:53 MET


(now February the 7th, 1993)
--------------- zap zap zap ---------------

-- here's a mail  I should have sent you --
----------- at least one week ago ---------
------ no excuse; just playing NetHack ----

(Fare->Dennis #10, 28/1/93)

 Well, hello Dennis, and sorry for the (very) long delay between messages.
 These days, weeks seem to begin on Wednesday and end on Friday for me.
Here are my complete notes, zoo-fied then btoa-ed (most of which are in French;
ask what you want me to translate).
 I lost my week-end installing the OS/2 v2.1 beta at home (AMD386/33, 8-320MB);
(sorry for Intel). That was no fun, and it isn't very useful (except to run
Win* shit under WinOS2, and/or to have quick diskette I/O while working).


>>>  Well, my main problem is up to now I worked in french, and all my notes were
>>> in french. So now, here is part of it translated (my ARJfied notes are 200Kb
>>> long).

(that's mostly FRENCH)
(hope you have a >80-column display to read this mail, because of the >>>'s)
(btw, do you have something to automatically add or remove groups of
chars at the beginning of each line?) (I'm using lemacs at the moment)


>>>  Here are some features I expect the system to run at high logical level
>>> (standard language programming), and that the compiler and/or the kernel should
>>> support:

First, let me recall my philosophy about high-level languages & the kernel.
 At high level, you are concerned mostly with logical structures; you let the
system choose the physical implementation for you, and it is meant to do so
efficiently. You may choose to let the compiler work with as little additional
information as possible; the compiler itself will then evaluate the cost of
each known implementation technique. Explicit data is also accepted by the
language, and you can help the system with extra explanations.
 At a very high level, everything is virtual, even the simplest operations.


>>> * Objects are contained in Object Zones (OZ's); the simplest (logical) OZ is
>>> a set, which can be physically implemented by, say, a bitmap, or a list, etc.
>>> A more general OZ does not contain only objects, but also attributes of these
>>> objects (which depend in nature of the OZ and of the objects): for example, a
>>> room in a CAD program may contain objects, which each have its intrinsic shape;
>>> but each object has its position in the room for attribute.
>>
>> Please elaborate on this idea!  I need more details to understand exactly how
>> this might work, what it will be used for, and so on.  I think I understand,
>> but I'm not sure - maybe some examples will help.

OZs are a high-level construct, and if we want to build a single base physical
implementation for them in a first version of the kernel, the natural starting
point is the simplest OZ above: a set whose members may carry attributes.
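
(To make the CAD-room example above concrete, here is a rough C++ sketch of
what an OZ interface could look like; the ObjectZone/Position names and the
std::map layout are only one possible illustration, not a commitment to any
physical implementation:)

#include <cstdio>
#include <map>
#include <string>

// Per-object attribute for the CAD-room example: where the object sits.
struct Position { double x, y; };

// Logically a set whose members carry attributes; the std::map is only one
// possible physical implementation (a bitmap or a list would also do).
template <typename Object, typename Attribute>
class ObjectZone {
    std::map<Object, Attribute> members_;
public:
    void insert(const Object& o, const Attribute& a) { members_[o] = a; }
    bool contains(const Object& o) const { return members_.count(o) != 0; }
    const Attribute* attribute(const Object& o) const {
        auto it = members_.find(o);
        return it == members_.end() ? nullptr : &it->second;
    }
};

int main() {
    ObjectZone<std::string, Position> room;   // a room containing objects
    Position table_pos = {1.0, 2.5};
    room.insert("table", table_pos);
    if (const Position* p = room.attribute("table"))
        std::printf("table at (%g, %g)\n", p->x, p->y);
}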

>>> * standard attributes are printing/showing format (for example, for an integer,
>>> the base hexadecimal, decimal, octal, binary, n-ary, etc; for a complex
>>> structure involving many types of number, ...); other common attributes are
>>> names, debug info, comments, translation into different languages, etc.
>>
>> I like this idea, but I think it would be more efficient to implement these
>> as standard METHODS instead of standard ATTRIBUTES.  A method only adds an
>> entry to the virtual method table of an object, where an attribute would
>> add to every instance of that object.  Of course, the object would then be
>> able to choose whether or not it wishes to implement these methods.

 At high level, methods, attributes, functions, etc., are only different forms
(more or less elegant, depending on the context) of expressing the same
conceptual idea. Perhaps because today's languages force you to define
low-level data structures, we tend to assign each high-level concept its
common low-level representation.
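
(A tiny C++ illustration of the attribute-vs-method trade-off Dennis mentions:
a printing base stored as a per-instance attribute takes room in every object,
while a virtual print method only adds one entry to the class's table; the
IntWithAttribute/HexInt names are invented for the example:)

#include <cstdio>

// Attribute version: every instance carries its own printing base.
struct IntWithAttribute {
    int value;
    int base;
};

// Method version: the format lives in the class; each instance only pays for
// the usual vtable pointer, and the class adds one entry to that table.
struct Printable {
    virtual void print() const = 0;
    virtual ~Printable() {}
};

struct HexInt : Printable {
    int value;
    explicit HexInt(int v) : value(v) {}
    void print() const override { std::printf("0x%x\n", value); }
};

int main() {
    IntWithAttribute a = {255, 16};
    std::printf("%d (to be printed in base %d)\n", a.value, a.base);
    HexInt h(255);
    h.print();
}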

>> By different languages, do you mean spoken or computer languages?

Well, BOTH, of course !
 A "comment" will be a link to another definition of the same object; it can
redundant, equivalent, or concurrent to the "main" definition of the object
(which here is the raw binary representation). There even needn't always be a
main representation; in fact, between the different ones, the system will
choose which to use with those criteria: firstly, the computer must entirely
"understand" the definition, with respect to the features actually used;
secondly, the cumulated costs of the computation of the representation and of
its use must be as low as possible; if there is a doubt, choose the more
immediate representation and/or the easiest to edit/debug for the user.
 Thus, a beautiful functional-language definition may come with a portable C
version of it, plus some optimized assembler transcriptions for the most common
processors on which the code will be used, and/or for systems where this
critical code portion may be considerably optimized (in terms of speed and/or
size, etc) when hand-coded.
 With this point of view, a source program can be concurrently defined in more
than one language (and/or several times in the same language) with the same
(logical) interface; the system will choose the "best" among them. Among the
definitions, you can add more or less global/local equivalents in French (or
English). A global equivalent of a module may be its explanation/description;
local equivalents inside it may be comments about tricks used here and there.
Of course, there isn't any compiler from French (or English) to
computer-understandable code, so the French (or English) version will always
remain a comment.
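
(A rough sketch of the "several concurrent definitions, one logical interface"
idea, in C++ terms: a portable definition plus an optional optimized one behind
the same name, with the build choosing between them; the HAVE_FAST_SUM switch
and both routines are invented for illustration:)

#include <cstdio>
#include <cstddef>

// Portable "reference" definition: always understood by the compiler.
static long sum_portable(const int* v, std::size_t n) {
    long s = 0;
    for (std::size_t i = 0; i < n; ++i) s += v[i];
    return s;
}

#ifdef HAVE_FAST_SUM
// Concurrent definition for a particular target; in the scheme above this
// could be a hand-written assembler transcription for a common processor.
long sum_fast(const int* v, std::size_t n);
static long (*const sum)(const int*, std::size_t) = sum_fast;
#else
static long (*const sum)(const int*, std::size_t) = sum_portable;
#endif

int main() {
    int v[] = {1, 2, 3, 4};
    std::printf("%ld\n", sum(v, 4));   // same call site, whichever wins
}
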
 As the language is its own preprocessor, you can also build constructed and/or
embedded comments in the main code; to allow you to write code in the order you
would in the text version of the language, just use the language's linking
capabilities in the meta- part of it to describe the program as it should be
ordered to be understandable. You won't have entire copies of the same routines
in different languages awkwardly alternating, or multiple occurrences of a
(real) comment in each different version; but in the "good" chosen version,
everything will come out fine. A graphical interface for the language will
follow (and why keep basing languages upon linear text, not to mention linear
ASCII text, after all? If I chose to compile source texts for the language, it
was mostly because it would be too difficult to debug the language and the
graphical interface simultaneously; this means that FOR THE TIME BEING, the
language will have to be easily expressible as text; but the next revolution in
ways of thinking will be the end of the linearity of texts imposed by the paper
medium, and the use of a graphical, non-linear way of linking things to each
other. There is no total order on ideas; not even a partial order; there are
only ideas, each of which is uniquely and intricately interlinked with the
others, some of which can be constructed in a certain order from some base
ideas, or in another order from other base ideas.)

>>> * Existence range or compute range of any object is arbitrary: any object can
>>> be defined relatively to any other; it can be computed at any arbitrary time
>>> (preprocessing, compile, interpret, run, more than one object in parallel or
>>> in sequence at one time, etc).
>> 
>> Interesting idea...I wish you could expand more on this idea also so I can
>> better understand it.

 Here are some examples:
- imagine a 3D surface visualizer, which will compute the result of a run-time
user function many thousands of times. Up to now, you had to write your own
parser (or get and understand one) or call a VERY costly compiler session;
neither of these is a good solution for the developer (which does not mean it
is not interesting in itself). With our system, just define a function
variable; use the STANDARD system parser, with any of its PREDEFINED
configurations (modified as you like, if need be), to produce a function object
from user input. To just debug your program, use the virtual function class: it
will be slow, it will possibly re-interpret the function's value string each
time you call it, but from the high-level point of view it will be the same.
To optimize, ask the compiler to think about it (in the final version) and/or
tell it precisely what YOU think is best (the default, in the optimizing final
version, is to compare what you propose to what it computed, and warn/react
somehow if it disagrees and thinks it has found better than you). (See the
first sketch after this list.)
- in a generic Discrete Fourier Transform, you may want to use sine and
cosine tables (or even tables of complex numbers of modulus 1)
- in a large application, you may have a huge database, part of which is a kind
of index varying very slowly with respect to the rest of the db, but extremely
quickly with respect to the code; this can happen inductively with parts of the
object however small you look. But as current systems do not include data/code
embedding at the system level, it is up to the user not to modify the index
without updating the run-time organization of the db accordingly.
- in some numerical (or combinatorial) applications, you have heavy
computations in which constants appear that simplify the calculus considerably
without being themselves trivial to express generically. For example, C &
Pascal only understand as constants numbers or pointers well defined from
nothing using only standard algebraic operations; everything else must be
computed at runtime, or produced by another program; but sometimes the
simplifying constant is not of a base type, and is computed with slow routines
(which may nevertheless be quick compared to the run time saved); common
examples are a matrix appearing as the result of basic algebraic operations
which depend on the dimension of the matrix, or a complex precomputation
(sort/coding/compression/etc) of a data base. (See the second sketch after
this list.)
(exercise: find your own examples)
- in a simulation (I prefer games, but some earn more money with the military),
you may have landscapes calculated from raw data; you would like to embed the
calculation of the landscape and its use in the same program object (but you'll
put them in different modules, so as to optimize partial recompiles). For now,
the only solution is to have a makefile manage (at least) two distinct programs
possibly sharing common libraries, etc. The make system helps you manage
everything, but it does not eliminate all manual maintenance; it slices your
program into small files (which eat a lot of room on the file system); it
forces you to learn yet another language (however simple) that, not being
powerful enough, makes you use complex shell tools to do something which in the
end won't be as good as if the language were meta- itself; and it does not let
you manage interlinked programs, generic modules, etc., efficiently.
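
(First sketch, for the 3D-visualizer example above: the "virtual function
class" seen by the visualizer, with an interpreted expression tree or a native
routine behind the same interface; all class names are invented, and the
standard parser that would build the tree from user input is omitted:)

#include <cmath>
#include <cstdio>
#include <memory>
#include <utility>

// Abstract "virtual function" the visualizer calls thousands of times.
struct RealFn {
    virtual double operator()(double x, double y) const = 0;
    virtual ~RealFn() {}
};

// Interpreted form: a tiny expression tree such as a run-time parser could
// produce from user input.
struct Expr {
    virtual double eval(double x, double y) const = 0;
    virtual ~Expr() {}
};
struct Var : Expr {
    bool isX;
    double eval(double x, double y) const override { return isX ? x : y; }
};
struct Sin : Expr {
    std::unique_ptr<Expr> a;
    double eval(double x, double y) const override {
        return std::sin(a->eval(x, y));
    }
};
struct Mul : Expr {
    std::unique_ptr<Expr> a, b;
    double eval(double x, double y) const override {
        return a->eval(x, y) * b->eval(x, y);
    }
};

struct InterpretedFn : RealFn {              // slow, but logically the same
    std::unique_ptr<Expr> body;
    double operator()(double x, double y) const override {
        return body->eval(x, y);
    }
};

// Compiled form: same logical interface, optimized physical implementation.
struct NativeFn : RealFn {
    double (*f)(double, double);
    double operator()(double x, double y) const override { return f(x, y); }
};

static double sample(const RealFn& f) {      // "the visualizer"
    double total = 0;
    for (double x = 0; x < 1; x += 0.25)
        for (double y = 0; y < 1; y += 0.25)
            total += f(x, y);
    return total;
}

int main() {
    InterpretedFn g;                         // sin(x) * y, tree built by hand
    auto vx = std::make_unique<Var>(); vx->isX = true;
    auto sx = std::make_unique<Sin>(); sx->a = std::move(vx);
    auto vy = std::make_unique<Var>(); vy->isX = false;
    auto m  = std::make_unique<Mul>();
    m->a = std::move(sx);
    m->b = std::move(vy);
    g.body = std::move(m);

    NativeFn h;
    h.f = [](double x, double y) { return std::sin(x) * y; };
    std::printf("%g %g\n", sample(g), sample(h));   // same result either way
}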
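
(Second sketch, for the sine tables and the non-trivial constants: in C++
terms, a table computed once by the compiler instead of at every run; constexpr
stands in here for the more general "compute at any arbitrary stage" this list
asks for, and the 8-point size is arbitrary:)

#include <cstdio>

// A constant that is not of a base type and is not trivial to write by hand:
// the sine table of an 8-point DFT, produced by a slow routine that the
// compiler runs once, before run time.
constexpr int N = 8;
constexpr double PI = 3.14159265358979323846;

constexpr double sin_approx(double x) {      // Taylor series; plenty accurate
    double term = x, sum = x;                // for a table this small
    for (int k = 1; k < 10; ++k) {
        term *= -x * x / ((2 * k) * (2 * k + 1));
        sum += term;
    }
    return sum;
}

struct SineTable { double v[N]; };

constexpr SineTable make_sine_table() {
    SineTable t{};
    for (int i = 0; i < N; ++i)
        t.v[i] = sin_approx(2 * PI * i / N);
    return t;
}

constexpr SineTable sine_table = make_sine_table();  // done at compile time

int main() {
    for (double v : sine_table.v)
        std::printf("% .6f\n", v);
}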

* The key point is: there is no clear limit dividing using, modifying, and
programming. Compiling a program may involve running others, and vice versa.
You'll always need, one day, to adapt a little part of an object, which should
have been properly isolated in a module when conceived, but with that module
linked to others. If the language and the system are identically OO-ed, all
you'll need to do is cut links to partially or totally disconnect modules from
others, tie new links to filter between old modules, replace an obsolete (or
too buggy) one, add a new feature, etc. If the system is well done, this must
be efficient (quick transfer rate, partial local compilation possible for a
next version, etc). Standard libraries must help you find all known frames for
interlinking objects, and include for each frame enough ways of coding it so
as to get quick, compact code.

>>> * There is a virtual reference constructor which takes a pointer and returns
>>> the pointed object. If the pointer is in a constant variable, you have a
>>> virtual equivalent of physical C++ x& 's.
>> 
>> Does this mean to actually duplicate the object being pointed to?  If so, I
>> agree, this is important.  Each object would need to implement its own.
>> 
 Well, sometimes it may be useful and/or compulsory to actually duplicate an
object, and of course, there should always be a way to code an object. You may
also choose to duplicate an object in a shared system where data flows faster
or slower between different devices, to ensure the fastest transfer rate with
the shortest wait between refreshes of the different copies of a file, and
security with respect to hardware failures (cable problems between a computer
and a remote system, unexpected shutdown between the high-speed volatile RAM
and the slow permanent disks, etc). The most difficult parts are the scheduling
problem (whose place must be reserved for a further version of the system), and
the external link problem.
 I won't talk about scheduling this time (I think it can come later; perhaps
next time). But here is the linking problem: you must be able to consider a
copy of an object, the object being nothing by itself, but only by its links.
Then, when the copy uses its links, you must know whether to link to the
original linked object, or to a copy of it, etc.
  example: (originals in uppercase, copies in lowercase, second copies with two
	   letters, third copies in reversed video, fourth copies blinking ... )
	   (same name, same object)
		A--B	a--B	aa--b   ...
		|	|	|
		C	c	c
 We may also have logically different objects which, as long as they take the
same value, will not be duplicated.
 In fact, there are two associated problems: how to physically duplicate the
same logical object (that's distributed computing), and how to logically
duplicate a logical object (that's avoiding an exponential explosion of memory
allocation).

 However, what I originally meant is that even a basic logical constructor,
like the one which takes a type and returns a pointer type to it, is virtual.
But there's also an implicit reference constructor, as in C++: the virtual
logical type is the same, but you must use a pointer before accessing the
"real" object. For example, if there is a virtual integer A, you tell the
program how to access A rather than giving it a raw memory place to store it.
From the logical point of view, it behaves the same, but physically, you can
then link A to any set of routines providing the needed integer operations; in
particular, you can give A a physical pointer link to a variable in another
program which won't have been published following a standard method (i.e. the
variable is in registers and/or changes place during its life; another virtual
variable!).
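
(A minimal C++ sketch of such a "virtual integer": the program reads and
assigns it like an int, while the actual storage is whatever pair of access
routines you link in; VirtualInt and hidden_storage are invented names:)

#include <cstdio>
#include <functional>
#include <utility>

// A "virtual integer": logically an int, physically just a pair of routines.
class VirtualInt {
    std::function<int()>     get_;
    std::function<void(int)> set_;
public:
    VirtualInt(std::function<int()> g, std::function<void(int)> s)
        : get_(std::move(g)), set_(std::move(s)) {}
    operator int() const { return get_(); }         // read through the routine
    VirtualInt& operator=(int v) { set_(v); return *this; }
};

static int hidden_storage = 0;   // stands in for "a variable in another
                                 // program", a register, a remote word, etc.
int main() {
    VirtualInt a([]{ return hidden_storage; },
                 [](int v){ hidden_storage = v; });
    a = 41;
    a = a + 1;                   // behaves like an int at the logical level
    std::printf("%d\n", (int)a); // prints 42
}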


>>> * There is a virtual pointer class, and thus virtual reference with virtual
>>> pointers. You can by this mean address logically any type of variable anywhere
>>> in the computer, etc, without changing your source program (but a big change
>>> in a variable's nature may cause big change in optimised object code). There
>>> may be a virtual type of pointer where the data isn't really read and written,
>>> but given as parameter to look and modify procedures. The fact that what you
>>> read isn't what you wrote is similar to the fact a variable may be shared
>>> between procedures. (practical example: ln -s = symbolic link under u*ix).
>> 
>> Please elaborate on this one also - it sounds similar to what I've proposed
>> below....

   See previous paragraph.

>>> * There is the standard recursive question in OO environment:the class class's
>>> class is the class class (itself), unless it is only a subclass of class.
>>> Well, doesn't matter if the Kernel is well written.
>> 
>> And of course it will be!

   (I pray for it to be)

>> I hope I have not misunderstood any of your ideas - let me know if I have.
>> Also, I'd like to hear more of these ideas - it sounds as if we are thinking
>> along similar lines!

  Well, people with similar occupations in the same world are often prone to
have the same ideas, all the more so as these ideas come from the same serious
lack in their common environment.

>> I've been thinking about the more general sense as to what objects there would
>> be in this system.  At the very top, the 'object' class would provide a base
>> class for all objects (an implied ancestor when none is specified) which
>> would describe virtual functions for displaying, copying, writing to disk,
>> reading from disk, and so on.  It contains no attributes (data).  A direct
>> descendant from this would be the 'task' object, defining a program, its
>> starting point, and so on.  The 'task' object also defines how a program is
>> loaded.  Descendant from the 'task' would be the 'device' object, which is
>> used to deal with all devices, physical or logical.  For example, a disk
>> drive is a descendant from the 'device' object; an IDE disk drive is a
>> descendant of the disk drive object, and so on.  Included in this are
>> keyboards, video displays, disk drives, mice, and more.

 I agree, but I'd like to add that to me the "virtual" adjective should not
mean "with reference to a function table called \"virtual\"", but rather that
the compiler (or the top level) is able to retrieve it from compile-time info.
The machine code may be totally different, most table look-ups may be
short-circuited, etc.
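
(For concreteness, a C++ sketch of the hierarchy Dennis describes, Object ->
Task -> Device -> disk drive -> IDE disk drive; the class and method names are
only illustrative, and a smart compiler could of course short-circuit the
virtual dispatch whenever the concrete type is known, which is exactly the
point above:)

#include <cstdio>

// Root class: no data, only the virtual operations every object may support.
struct Object {
    virtual void display() const { std::printf("<object>\n"); }
    virtual ~Object() {}
    // copying, writing to disk, reading from disk, etc. would be further
    // virtuals here
};

// A program: its starting point, how it is loaded, and so on.
struct Task : Object {
    virtual void start() {}
};

// Anything the system talks to, physical or logical.
struct Device : Task {
    virtual void reset() {}
};

struct DiskDrive : Device {
    virtual void readSector(long lba, void* buf) { (void)lba; (void)buf; }
};

struct IdeDiskDrive : DiskDrive {
    void readSector(long lba, void* buf) override {
        (void)buf;
        std::printf("IDE read of sector %ld\n", lba);   // would do port I/O
    }
};

int main() {
    IdeDiskDrive d;
    DiskDrive& generic = d;          // callers only see the generic drive
    char sector[512];
    generic.readSector(0, sector);
}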

>> Objects could also be shared among processes.  The code needs only to be loaded
>> into memory once, and can be accessed by all processes.  Object instances are
>> a little more difficult, and must be dealt with by defining a 'sharedObject'
>> class which overrides the constructor and deals with multiple accesses.  The
>> concept of persistent objects (ones which may be accessed several times by the
>> same program) also follow this example.  At any one time, many parts of a
>> program (or even separate tasks) may use the same object, pointing to the same
>> instance of the object in memory.  Each constructs (instantiates) and destructs
>> the object individually, but the object does not actually get destroyed until
>> nobody references it anymore.

 Here, for example, for high-level users, sharedObject will be another
realisation of the virtual Object class; it will include new features, etc.
From high level to low level, there will be a meta-list of the named objects
and their types.
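
(A rough sketch of the shared/persistent object behaviour described above:
every user constructs and destructs its own handle, but the instance is only
really destroyed when nobody references it any more; reference counting via
shared_ptr is one obvious realisation, and the names are invented:)

#include <cstdio>
#include <memory>

// The shared instance: loaded/created once, used by many parts of a program.
struct SharedObject {
    SharedObject()  { std::printf("really constructed\n"); }
    ~SharedObject() { std::printf("really destroyed\n"); }
};

// A registry handing out handles; the object lives as long as any handle does.
static std::weak_ptr<SharedObject> registry;

std::shared_ptr<SharedObject> acquire() {
    std::shared_ptr<SharedObject> obj = registry.lock();
    if (!obj) {                      // the first user really instantiates it
        obj = std::make_shared<SharedObject>();
        registry = obj;
    }
    return obj;                      // later users share the same instance
}

int main() {
    std::shared_ptr<SharedObject> a = acquire();   // "really constructed"
    std::shared_ptr<SharedObject> b = acquire();   // same instance, no message
    a.reset();                                     // still referenced by b
    b.reset();                                     // "really destroyed"
}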

>> Files will be a use for persistent objects.  Many tasks may use the same file
>> at the same time, usually only reading from the file, needing only one copy of
>> the file (or one file buffer) in memory at one time.  Files will also be
>> memory-mapped for convenient access, being able to return a pointer as if the
>> file existed in a continuous chunk of memory.  This does not need a series of
>> reads & writes, but is accessed and written to using only pointers.  It's very
>> easy to implement on the 386!

 That's OK for big files and big memory swapping, but what to do when many,
many little ( <128 bytes ) objects begin to replicate in a program? If you let
them multiply, you will be forced to swap whole pages just for a small part of
code! Thus, you must separate little objects from big objects and/or institute
strict page alignment for all objects. AAARRGGGHHH !!!
 (But somehow, one cannot escape from this problem.)

 A first thing to do is to treat small and big memory allocations differently
(depending on the ratio of memory used to memory taken, when taken in 4Kb
blocks). What we may then do to lighten memory management is to use sub-memory
systems.
 Each program can allocate memory in different systems. There is a standard
virtual memory system type, highly interlinked in its implementation with the
ObjectZone class. The crucial difference between the two is that OZs take
objects as their unit, the objects being possibly of different lengths, while
memory systems take the byte (or word, or para, or block, etc) as their unit.
In fact, both are complementary ways of seeing the same phenomenon from
different but equally necessary points of view.

 Standard systems are available from the Kernel, to answer frequent uses; the
more a system is used, the more immediate it is to use; but standard system
makers are available at high level (notice that I never separate the
programming language from the system, as the two are embedded in each other;
the language syntax as well as the system's exact executing sequence may
change, but what is essential is the link between the two, and one is never
complete without the other; if one is inefficient at doing something, so will
be the other; they are two faces of the same thing). Those makers allow one to
make a mem.sys. inside another one, to match the offer to the demand.

 In the 386 version, the problem with memory allocation & tasking is mainly the
big number of very small independent items: how to manage quick allocation and
use of little bits of memory, while still keeping any one of them from killing
objects it is not logically linked to. There is also the eternal question:
should there be a global garbage-collecting (GC) system? If yes, how to
implement it?

- a basic system for 4Kb blocks using virtual memory and possibly segmentation
- a basic system transforming a big segment into usable get&dispose memory
- a system to transform get&dispose memory into a garbage-collecting system
- ...
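
(A C++ sketch of how such layered memory systems could stack: a raw 4Kb block
source at the bottom and a trivial get&dispose system carved out of it; a GC
layer would wrap the latter in the same way. All the interfaces are invented
for illustration, and the malloc call merely stands in for real 386 paging:)

#include <cstdio>
#include <cstdlib>
#include <cstddef>

static const std::size_t kBlockSize = 4096;   // one page on the 386

// Bottom layer: hands out raw 4Kb blocks.
struct BlockSystem {
    void* getBlock()        { return std::malloc(kBlockSize); }
    void  putBlock(void* p) { std::free(p); }
};

// Next layer: a (deliberately trivial) get&dispose system built inside one
// block. Only bump allocation here; a real layer would keep free lists, and
// a garbage-collecting layer would sit on top of this same interface.
class GetDispose {
    BlockSystem& src_;
    char*        block_;
    std::size_t  used_;
public:
    explicit GetDispose(BlockSystem& s)
        : src_(s), block_(static_cast<char*>(s.getBlock())), used_(0) {}
    ~GetDispose() { src_.putBlock(block_); }
    void* get(std::size_t n) {
        if (used_ + n > kBlockSize) return nullptr;   // out of this sub-system
        void* p = block_ + used_;
        used_ += n;
        return p;
    }
    void dispose(void*) { /* reclaimed when the whole block is released */ }
};

int main() {
    BlockSystem blocks;
    GetDispose small_objects(blocks);   // a sub-memory system inside another
    int* x = static_cast<int*>(small_objects.get(sizeof(int)));
    *x = 42;
    std::printf("%d\n", *x);
    small_objects.dispose(x);
}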

>> Well, that's a bunch of ideas...hope to hear your input as well, and further
>> explanation of your ideas.  Take care, and keep in touch!
>> 
>> 				Dennis
>>
				   ,
				Fare

(Well, good luck with MOOSE, and sorry for the delay between messages)
(Perhaps we should synchronise;


P.S.:
- If we are to use code from standard DOS or U*ix compilers/assemblers, we need
to know the format of the output of these programs (the different .OBJ, .TPU,
.DLL formats under DOS; what under Un*x? And what of the file systems used?)

- What we should do is quickly agree upon kernel features, then ask every
member of the project for agreement while pinning down the actual coding.

- Our current working mode is too slow. We should have an ftp site, and a
regular mailer daemon. I'm too slow myself.