dynamic linking

Francois-Rene Rideau rideau@clipper
Wed, 28 Dec 94 4:05:56 MET


> How do you uncompile, how long will it take, how do you know where a
> migratable point is, how big is the source code vs binary?
   This is implementation dependent, of course, and users shouldn't rely
on it, nor need to.
   For our first implementation, I'd suggest just keeping track of the
source, plus remembering whatever information is needed to recover the
object state (e.g. "this integer variable is at such location"). This is
easy to do, and not so inefficient: it doubles the size of needed
variables, but the source may be "swapped" out of physical memory, so this
is no problem. Besides, as you say, you should propose another working
solution before criticizing :). User-configurable decompiling should also
be available (e.g. a "call this routine to decompile" variable for big
objects where this overhead is negligible).
   Binary size vs. source size is about 1:1 to 2:1 (but those are only my
estimates); you may double to quadruple this ratio by using compressed
source code; or you may divide it by two to four if the source is
heavily annotated with proofs and/or efficiency measurements and
optimization advice. Moreover, for code that is seldom executed,
(uncompressed) source code may just be interpreted, so the ratio is 0:1.
Remember that we recommend using our LLL for such portable "source" code.
And again, unneeded source code/annotations may be swapped out to disk or
network or whatever, while binary code mostly lives in physical memory
and may mostly be forgotten and recompiled instead of stored or swapped.


> How do you keep track of who uses what?  For instance, is there a module
> that keeps a list of all data and who references it?  What is the
> granularity if so (integers, groups of data, all module data)?  If some
> data moves, do you have to recompile all the modules that access it?

   On whatever granularity you allow migration, such efficiency considerations
must be used. On a heavily parallel machine, you'll want finer granularity
than on isolated hosts. I recommend that modules contain enough info
(explicitly, or implicitly by allowing such info to be synthesized/decompiled
from the sources) that efficient fine granularity can be achieved; but nobody
should be forced to provide it, in which case modules will be the basic
system granularity.
   Then, according to the information in those modules, the system will
organize them into migration-(more-or-less)-dependent grains, between which
interactions are known or measured. The system will then decide to migrate
according to measured or assumed information; it should be built so that
only the migrated modules would suffer from mistaken assumptions.
User-definable modules in the system should contain the migration heuristics.
   As for our first implementation, I'd suggest just implementing
module-based granularity, with annotations in the modules saying which
links should be measured and what other efficiency assumptions hold for
the module. Efficiency annotations/measurements should typically include
how much each scope in the object affects imported resources, including
memory space and CPU time (these being standard, automatically imported
abstractions).
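   As an illustration, such per-module annotations could be sketched like
this in OCaml; again, the record fields are only hypothetical examples of
the kind of information involved:

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
(* per-module annotations: which links to measure, and what costs to
   assume, in terms of the standard imported abstractions *)
type resource = Memory_space | Cpu_time

type link_annotation = {
  imported : string;                       (* imported resource/module  *)
  measure : bool;                          (* measure this link at
                                              run-time?                 *)
  assumed_cost : (resource * float) list;  (* a priori cost estimates   *)
}

type module_info = {
  name : string;
  links : link_annotation list;  (* how much each scope affects imports *)
}
------8<------8<------8<------8<------8<------8<------8<------8<------8<------
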


> How do you segment an array?  Does my software segment or the OS/compiler?
   A compilation module should have done it (the user or a previous compiler
may leave annotations available for a further compile cycle to do so). The
execution modules won't decide any such thing. Segmenting is only useful
when the segmenting overhead plus the parallelization overhead is cheaper
than the parallelization gain (thus, only for objects shared in a parallel
system). Paging is also a cheap way to segment virtual memory into 4KB (or
hardware dependent -- 8KB on SPARCstations) blocks. This may be done directly
by some lower module in the system; but it may still be good for the higher
modules to inline their array segmenting directly into such paging.
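   For instance, here is a trivial OCaml sketch of how a compiler pass
might compute page-aligned segments for an array, assuming the 4KB page
size mentioned above; everything here is illustrative arithmetic:

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
(* split an array of n elements of elt_size bytes each into page-sized
   half-open slices [start, stop) *)
let page_size = 4096                      (* 8192 on SPARCstations *)

let segments ~elt_size ~n =
  let per_page = page_size / elt_size in  (* elements fitting in one page *)
  let rec go start acc =
    if start >= n then List.rev acc
    else
      let stop = min n (start + per_page) in
      go stop ((start, stop) :: acc)
  in
  go 0 []

(* segments ~elt_size:8 ~n:1500 = [(0, 512); (512, 1024); (1024, 1500)] *)
------8<------8<------8<------8<------8<------8<------8<------8<------8<------
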


> If you moved a data object that is used by everyone, does that mean the
> entire system would re-compile?  (say a crucial kernel-like module changed)
   Of course! How else could you achieve it? Again, please do not criticize
when you don't have any alternative to propose :).
   But also of course, if the changed object's resource accesses weren't
inlined and were already used through stubs (or intermediate objects), there
may be no need to recompile the stubs. The cost of a recompile is why you
won't change basic system modules every now and then (except during their
development, of course), but only when you need to, or when you expect some
non-negligible performance increase from a new version. Moreover, you need
not recompile everything just after the change, incurring some monstrous
lag; instead, lazy recompilation can happen, so that objects are actually
recompiled only when needed, plus when the system is idle (using some
user-configurable prefetch/precompile module).
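   A sketch of such lazy recompilation in OCaml, with stand-in types (a
real implementation would of course compile for real instead of copying
the source):

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
(* a changed module just invalidates its dependents; each is actually
   recompiled only when next needed, or when the system is idle *)
type module_state = {
  source : string;
  mutable binary : bytes option;           (* None = must recompile *)
  mutable dependents : module_state list;
}

let compile (src : string) : bytes =
  Bytes.of_string src                      (* stand-in for a real compiler *)

let rec invalidate m =
  if m.binary <> None then begin
    m.binary <- None;
    List.iter invalidate m.dependents      (* propagate; stops on cycles *)
  end

let get_binary m =
  match m.binary with
  | Some b -> b                            (* already compiled *)
  | None ->
      let b = compile m.source in          (* recompile lazily, on demand *)
      m.binary <- Some b;
      b
------8<------8<------8<------8<------8<------8<------8<------8<------8<------
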


>>    Note that all this {,un,re}inlining stuff is costly enough so that
>> real-time threads should avoid them by locking all necessary resources.
> 
> Who manages the locking/unlocking?  (I know it's a module under your
> scheme)  But how does it work?  Assembly pseudo-code would be great to
> demonstrate your ideas!
   Every object manages its own locking, and you must explicitly call a
lock-aware object and use its lock functions to achieve locking, if you
need it.
   Now, we'll provide *generic* lock managers, lock clients, and/or
lock-aware-ifying metafunctions (that automatically rewrite code so that
it becomes lock-aware). Moreover, locking is only needed when sharing
writeable objects; whereas what people usually need is getting an object's
current read-only value (e.g. when you compile a text source program, you
don't want to compile a file constituted of the beginning of an old version
followed by the desynchronized end of an updated version; you get the current
version, and use it whatever changes may occur before you finish).
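   To show the idea, here is a generic "lock-aware-ifying" wrapper sketched
in OCaml, with Mutex standing in for whatever lock manager the system will
actually provide; the names are mine, not a committed interface:

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
(* wrap any mutable state so that its operations become lock-aware,
   and provide the read-only snapshot most clients actually want *)
type 'a lock_aware = { lock : Mutex.t; mutable value : 'a }

let make_lock_aware v = { lock = Mutex.create (); value = v }

(* run an operation on the object with its lock held *)
let with_lock la f =
  Mutex.lock la.lock;
  Fun.protect
    ~finally:(fun () -> Mutex.unlock la.lock)
    (fun () -> f la)

(* get the object's current read-only value *)
let snapshot la = with_lock la (fun la -> la.value)
------8<------8<------8<------8<------8<------8<------8<------8<------8<------
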
   As for real-time procedures, they should be carefully written anyway, so
making all the locking explicit only makes the code clearer and eases bug
chasing. Whoever installs real-time code (again, generic procedures
will be provided) should previously reserve non-GC, physical-memory-mapped
resources, by calling providers of such resources (there will again be
standard ones; but not all objects will have access to them). Whoever
uninstalls the code should free those resources.
   Locks will most probably be inlined into disable/enable interrupt
instruction pairs (or nothing, if the instructions are atomic), when the
compiler detects that the process does not loop, sleep, halt, or escape
during the locked operations. For more complex operations, it will
automatically allocate a semaphore associated with the variable.
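   That compiler decision can be caricatured in a few lines of OCaml (the
types are of course hypothetical; actually detecting looping or escaping
is the compiler's job):

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
(* how a lock is lowered: nothing if the operation is atomic anyway,
   an interrupt disable/enable pair around short straight-line code,
   a semaphore associated with the variable otherwise *)
type lowering = Nothing | Interrupt_pair | Semaphore

let lower ~atomic ~may_block_or_escape =
  if atomic then Nothing
  else if not may_block_or_escape then Interrupt_pair
  else Semaphore
------8<------8<------8<------8<------8<------8<------8<------8<------8<------
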
   So really, I do not see what you mean by "assembly". Locks are some
abstract concept, not a low-level concept. An object that is modified but
restored before any access (e.g. an object swapped off physical memory)
may very well be considered having been locked. Locks mean something very
global about an object, while assembly is just local code.
   As for example pseudo-code, here is some generic real-time event manager,
with a normal part, and a real-time part. Don't focus on the syntax: it's
arbitrary, and we may agree to have any syntax; I just chose a syntax between
the one of the Coq proof system, that of the CAML language, that of LISP,
and that of C++.
   The rt- prefix stands for "real-time" and is conventionally used to denote
real-time aware objects; while "real-time" is a standard object that specifies
that the following object is real-time. I suggest that, by default, mutable
variables must explicitly be declared "escaping" (a prefix keyword) for a
reference to them to be able to escape the current scope; this way,
programmers would have more control over mutable variables. Escaping
variables may or may not be lockable; if lockable, accesses to the variable
may be guarded by a semaphore. All this is full of holes, which I invite you
to fill with your imagination.

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
let Install-Event-Handler
   [E:Type]
   [S: <E>rt-Event-Source]
   [SEM: <E>rt-Event-Manager] =

  make-initialized-variables
     (* This is a variable-making macro,
        that'll invoke the proper method for each class
        to create a new initialized object at run-time,
        and assign it to properly typed variables (mutable objects),
        that at compile-time will be bound to the given identifiers *)

    rt-queue : real-time <E> FIFO
     (* this one will be reserved in the real-time (no-GC) heap *)

    queue : escaping lockable <E> FIFO
     (* This one is GC-able *)

    events-were-lost : real-time semaphore
    events-received : real-time semaphore
     (* a semaphore may be signaled (provided) or waited for (required)
        by threads. semaphore-reader and semaphore-writer are read-only
        and write-only views of semaphores *)

     (* we relied on the default initialized values for queues being empty,
        and for booleans being false; the user may further specify the
        default values for real-time queue size, etc. *)

 
    (* Indentation dynamically determines the scope;
       thus the next line with the same indentation as make-... indicates
       that what follows is no longer a variable declaration, but is still
       in the scope of those declarations.
       When indentation is followed, no need for separators or terminators.
       But these are still accepted.
       If we wanted to have further objects in which the make-... variables
       were unbound, we'd have to indent less than the make-... statement,
       or to put the make-... statement in a {} isolator. *)

    (* Hey ! Who said I'd use a 100% standard syntax ? *)

    (* because of local scoping, a destructor is automatically generated,
       and *need* not be explicitly given (but still can be) *)

  let Normal-part () =
    loop
      events-received.require ()
        (* wait until the real-time part signals an event *)
      let e = rt-queue.pop ()
        (* all accesses are atomic, so this will automatically lock *)
      queue.push e
        (* move the event to the GC-able queue for normal processing *)

  let real-time Real-time-part [e:E] =
    if (rt-queue.is-full)
      then
        events-were-lost.provide ()
      else
        if not (SEM e) then rt-queue.push e
          (* if a special event is found, treat it, else just queue it *)
        events-received.provide ()
          (* will activate the thread *)

  S.set-event-manager Real-time-part
    (* install the real-time part as the manager for event source S *)
  new thread Normal-part ()
    (* run the normal part in a new thread *)
  queue
    (* return the queue, as last object evaluated *)

    (* also should be done:
      - a thread that catches the events-were-lost semaphore
      - allowing the special event handler access to all the local
        variables. Some meta-functional macro should do that.
      - telling the system in some way that the handler is no longer
        needed when the queue does not escape anymore. That is, the
        event source does not *need* the handler, even though it has a
        pointer to it somewhere; so that whenever the queue is lost, the
        handler has no more reason to be. This is quite tricky; it means
        that some pointers express a necessity, and others do not...
        Tell me if you know simpler semantics that allow the same effect...
        This complex solution seems to mean that there must be some double
        mark & sweep GC somewhere, or a stop, mark, and copy variant of the
        stop-and-copy GC. Or perhaps we may allow such non-necessity
        pointers only outside the GC system, and let destructor functions
        handle all that stuff. After all, perhaps we do not need
        non-necessity pointers in everyday life. I'm not so sure.
     *)

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
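
   To pin down the intended semantics of the pseudo-code above, here is a
rough OCaml rendition, using the threads library's Thread, Mutex and
Semaphore as stand-ins for the eventual real-time primitives (in
particular, the pseudo-code's atomic rt-queue accesses are merely assumed
here):

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
let install_event_handler
    (set_event_manager : ('e -> unit) -> unit)  (* stands for S   *)
    (special : 'e -> bool)                      (* stands for SEM *)
  : 'e Queue.t * Mutex.t =
  let rt_capacity = 64 in              (* default real-time queue size *)
  let rt_queue = Queue.create () in    (* stands for the no-GC heap    *)
  let queue = Queue.create () in       (* the GC-able, lockable FIFO   *)
  let queue_lock = Mutex.create () in
  let events_were_lost = Semaphore.Counting.make 0 in
  let events_received = Semaphore.Counting.make 0 in
  (* real-time part: runs at event time and never blocks *)
  let real_time_part e =
    if Queue.length rt_queue >= rt_capacity then
      Semaphore.Counting.release events_were_lost
    else begin
      if not (special e) then Queue.push e rt_queue;
      (* special events are treated by SEM, others just queued *)
      Semaphore.Counting.release events_received  (* activate the thread *)
    end
  in
  (* normal part: moves events to the GC-able queue *)
  let normal_part () =
    while true do
      Semaphore.Counting.acquire events_received;
      match Queue.take_opt rt_queue with
      | Some e ->
          Mutex.lock queue_lock;
          Queue.push e queue;
          Mutex.unlock queue_lock
      | None -> ()    (* a special event, already treated *)
    done
  in
  set_event_manager real_time_part;
  ignore (Thread.create normal_part ());
  (queue, queue_lock)                  (* return the queue (and its lock) *)
------8<------8<------8<------8<------8<------8<------8<------8<------8<------

   (A thread catching events-were-lost is still missing here too, as in
the pseudo-code's list above.)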


>>    As for RPC, we'll eventually provide a large span of RPC managers,
> I'm totally against RPC's.

   Hey! But RPC is how distribution will actually be implemented. RPC
only means that an object evaluated on a system will actually be an
object of a remote system, and that you must use some abstraction
layer (the RPC -- remote procedure call) to act *as if* it were a local
object. The RPC is only a mechanism to do that. What does "no RPC"
mean? I sure agree we do not want *explicit RPC* or even static RPC,
from the user (unless he really wants it); but there is still RPC in
some lower module of the system. Or are we using the same word for
different concepts? I think we actually agree... So what do *you* call
it, when an object that actually sits in a physically remote system is
called by a local object? Up to now, and until you give me a better
(and/or more standard, 'cause RPC is standard) name, I'm calling that RPC.
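   For illustration, here is the kind of proxy I mean, sketched in OCaml;
the transport module is a hypothetical stand-in for the network layer:

------8<------8<------8<------8<------8<------8<------8<------8<------8<------
(* a reference to an object that actually sits on a remote system *)
type remote_ref = { host : string; id : int }

module type TRANSPORT = sig
  (* send a request to a remote object and wait for its answer *)
  val call : host:string -> id:int -> bytes -> bytes
end

module Make_proxy (T : TRANSPORT) = struct
  (* present the remote object *as if* it were a local function;
     this abstraction layer is all that "RPC" means here *)
  let as_local (r : remote_ref) : bytes -> bytes =
    fun arg -> T.call ~host:r.host ~id:r.id arg
end
------8<------8<------8<------8<------8<------8<------8<------8<------8<------
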


   To all your questions, I see no problems, only solutions. The only
problem I see is not finding the solution, but choosing among all those
solutions. I do propose generic solutions, and if you agree, *then* we
can agree on the specifics.