Non-linear continuations considered harmful

fare@tunes.org
03 Aug 1999 18:48:45 +0200


Fernando Mato Mira <matomira@iname.com> writes:

> Man, is RScheme really the one for me.
> If only native threads were there already..

The problem is that RScheme, being Scheme, has a lot of problems ahead
with threads, GC, and first-class continuations.


Ever since I learnt Scheme, I have been seduced by its attempt at defining
a clean language that would do the "Right Thing(TM)",
yet be minimal, while trying to preserve the LISP way of providing
a single dynamic reflective environment.
There are still lots of things I dislike about Scheme
(guessing what I dislike about it is left as an exercise to the reader),
yet, until very recently, I thought it would be the best "Core Lisp" to use
while designing and implementing a distributed persistent system.
But as I tried to see how things could actually be done at the low level,
I was very disappointed with Scheme, and decided that overall,
Scheme was not remotely the Right Thing, and that Worse is Worse.


Once upon a time, after hearing about tail-call optimization,
about LAMBDA being the ultimate imperative, etc., and about Scheme
requiring implementations to be "properly tail-recursive"
(whatever that means), I thought that indeed, Scheme allowed one
to express inductive computation schemes cleanly and equivalently
as either iterative or (tail-)recursive idioms.

Consider the following constructs:
  (let loop ()
    ...
    (if (not (zero? n))
      (begin
        (set! n (- n 1))
        (loop))))

  (let loop ((n n))
    ...
    (if (not (zero? n))
      (loop (- n 1))))

I _naively_ believed that the above loops (where ... does not modify n)
were essentially equivalent, the former being a "lower-level" version
of the latter, into which the latter would actually compile, with the
(loop) being a mere goto, etc. I sincerely believed that Scheme made
iteration and tail-recursion equivalent. How stupid I was!

Then, I tried to actually implement a Scheme compiler,
and I was struck by reality:
Scheme makes the above looping constructs deeply different,
and it prevents the immediate reuse of stack/heap activation frames
with a simple goto, all because of the non-linear continuations
that may appear with call/cc: call/cc allows the capture of mutable
variables that may be "shared" among several instances of continuations.
As long as a continuation is linear (i.e. used at most once),
this doesn't make much difference, and indeed the two code snippets above
are equivalent; for variables are "shared" by only one continuation/thread,
so whether you modify an existing variable,
or create a new one and discard the old one,
the result is the same and the two techniques are indistinguishable,
so one may be transformed into the other, for the sake of optimization.
But as soon as you can reenter the continuation,
which you must assume is possible unless you can prove
that the continuation won't escape,
then havoc is wreaked and you can see the difference:
mutated variables are shared among continuations,
whereas new variables that mask previous variables of the same name
are not shared with other continuations.
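
To make the distinction concrete, here is a minimal sketch in plain
R5RS (the procedure names count-mutate and count-rebind are made up
for this illustration; each test is wrapped in a begin so that the
re-entry stays within one top-level form):

  (define k #f)
  (define done #f)

  ;; set! mutates the single location for n, and that location is
  ;; shared with the captured continuation: re-entry sees the mutation.
  (define (count-mutate)
    (let ((n 0))
      (call-with-current-continuation
        (lambda (c) (set! k c)))
      (set! n (+ n 1))
      n))

  (begin
    (display (count-mutate)) (newline)   ; prints 1
    (if (not done)
      (begin (set! done #t) (k #f))))    ; re-enter: prints 2

  ;; a fresh binding that masks n is NOT shared: re-entry rebinds
  ;; anew from the original, unmutated n.
  (define k2 #f)
  (define done2 #f)
  (define (count-rebind)
    (let ((n 0))
      (call-with-current-continuation
        (lambda (c) (set! k2 c)))
      (let ((n (+ n 1)))
        n)))

  (begin
    (display (count-rebind)) (newline)   ; prints 1
    (if (not done2)
      (begin (set! done2 #t) (k2 #f))))  ; re-enter: prints 1 again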

I wrote a small program to demonstrate the difference,
and to see whether actual Scheme implementations did the Right Thing(TM)
about the correct handling of non-linear continuations.
Source code is included below.
All Scheme implementations that were installed on my Debian machine
(guile, MIT Scheme, gambc, elk, RScheme, scheme->c) correctly implemented
the Scheme specification in the presence of non-linear continuations
(well, as far as RScheme and scheme->c are concerned, I did not manage
to get the compilers working, so I only tested the interpreters).
This means that Real Schemers(TM) (i.e. those having hacked an implementation)
are already well aware of the issue.


Now, what are the implications of this semantic issue for implementation?
It means that activation frames cannot in general be reused,
and that "proper tail-recursion" is a much trickier concept
than naive people like me thought it was,
so that the kind of tail-call optimization that we naive people
think makes tail-recursion the "same" as iteration
doesn't happen, or at least not in the simple way we thought.

People raised on C, Pascal, and other Algol descendants,
or perhaps even raised on LISP and ML,
may think it natural that a called function, at least conceptually,
allocate an "activation frame" (on the stack or on the heap),
where it stores its arguments and local variables
(possibly merged from several enclosed LET bindings),
as well as a "return continuation",
made of a return address and the activation frame of the calling function.
We think it natural that (mutable) variables
be stored directly in these frames.
Well, that can't be, because frames may be entered
many times, and may also be captured at many different points of execution
(at every uncontrolled function call), at times when different sets of
local variables have been activated or initialized.
Frames must thus contain only write-once data: not mutable variables
themselves, but only indirect pointers to variables allocated
and shared on the heap.
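
Concretely, a compiler may perform what is often called assignment
conversion; here is a sketch of the first loop above with the mutable n
boxed into a heap cell (countdown is a hypothetical name, and the
one-slot vector is just one possible representation of the cell):

  (define (countdown n)
    (let ((n-cell (vector n)))   ; the mutable variable lives in a heap cell
      (let loop ()               ; the frame holds only the pointer n-cell
        (if (not (zero? (vector-ref n-cell 0)))
          (begin
            (vector-set! n-cell 0 (- (vector-ref n-cell 0) 1))
            (loop))))))

The frame may then be copied or shared freely among captured
continuations: all of them see the same cell.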

Furthermore, capture of a continuation consists either
in pointing to a frame, or in copying frames.
In the former case, frames must be completely read-only (never reused)
after being atomically created (i.e. with no possibility of continuation
capture in between), which is quite tricky, since argument values need
to be accumulated somewhere before being sent together to the body
of a lambda or a let;
this makes closure sharing practically impossible,
although it is possible to store local variables in callee-save registers
(which also makes continuation capture expensive);
frames being read-only means a new frame must be created at every single
uncontrolled call point so as to allow capture of a distinct continuation.
In the latter case of continuation capture as frame copy,
we must distinguish two notions of frames:
"active" frames (maybe on a stack rather than the heap)
and "passive" frames (on the heap). Capture consists in
making a copy of the currently active frames into passive ones.
Active frames are recyclable during tail-recursion
(which can only be done either if the target frame
has the same size as the source frame, or if a stack is used),
whereas passive frames are strictly read-only.
Constantly creating new frames is so slow that people prefer
copying frames, especially since continuations are seldom captured,
at least in the "usual" programming style;
however, the former method does provide bounded-resource ("real-time")
guarantees, and some implementations get the "best of both worlds"
by having a local stack cache of active frames
that eventually gets flushed into passive heap frames
before it grows too large.

Having a "local cache" to accumulate values,
be it a stack and/or a body of callee-save registers, is a necessity:
according to the RnRS specification, multiple argument values to a lambda
(or a let) must be thus accumulated in an implementation-defined order,
before to be sent to the continuation;
now, if a continuation is captured in the middle of argument computation,
the remaining argument values must not be shared among multiple restarts
of the continuation, which means either "passive" frames a generated
at a monstruous rate, or we have at least one "active" frame.
Callee-save registers can be assimilated as a lazy variant
of the "save activation frame at every call site" technique,
for part of the frame's data.
Again, mutable variables cannot be stored directly in such registers,
least it be proven that they won't be captured in a non-linear continuation.
Another catch with capturing a frame in the middle of initializations
is that when recycling a frame, local variables from previous frame should
be reset to non-memory-consuming values (or popped off, if it's a stack)
before any possibility of capture, least they leak memory forever


All in all, there is much less freedom of implementation
than could naively be thought. With so little freedom,
Scheme _as a dynamic programming system_ must be considered
more of a low-level language than a high-level one;
just a low-level language for some virtual machine
much more elaborate (and in many respects more pleasing) than C or assembly.

For instance, although we could naively think that the imperative set!
style is lower-level than the pure tail-recurse-with-argument style,
it so happens that the latter is intrinsically faster to implement,
since it only involves messing with the current activation frame and jumping,
whereas the former involves not only activation frame handling,
but also modifying data on the heap through a read/write barrier.
There is an abstraction inversion,
whereby what is easier at the low level is made harder at the Scheme level,
whereas what is harder at the low level is made easier at the Scheme level.
This accounts for an irreducible performance hit
when implementing Scheme in a dynamic environment,
as compared to implementing Common LISP, or other languages.

The pervasive possibility of non-linear continuations through call/cc
makes every use of mutable variables much more costly,
and prevents many program transformations and the corresponding
optimizations, including various forms of lambda-lifting.
Since non-linear call/cc makes life so hard for mutable variables,
which must be boxed and kept off the stack, one could have expected
a language with call/cc to explicitly separate the concepts
of variable binding and of mutable reference, as SML/NJ does;
in this respect, this lack of orthogonality can be considered
a legacy mistake in the design of Scheme, one that ML did not make.
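
To illustrate that discipline in Scheme terms: bindings would always be
immutable, and mutation would go through an explicit reference type.
The names make-ref, deref and set-ref! below are made up for this
sketch; SML writes ref, ! and := respectively:

  (define (make-ref v) (vector v))             ; SML: ref v
  (define (deref r) (vector-ref r 0))          ; SML: !r
  (define (set-ref! r v) (vector-set! r 0 v))  ; SML: r := v

  (let ((n (make-ref 10)))  ; the binding n is immutable; its cell is not
    (let loop ()
      (if (not (zero? (deref n)))
        (begin
          (set-ref! n (- (deref n) 1))
          (loop)))))

With such a separation, the compiler could read off the program text
which variables need heap cells, instead of having to prove it.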

Because there is no way to declare special properties of a program,
Scheme forces the implementation to always pay the price
of the rare general case.
This makes it very important for optimizing implementations
to perform as good analyses as possible of variable mutability,
object escape, object linearity, etc.
Foremost, escaping continuations must be identified.
Linear continuations and linear objects must be identified
(an object can only be linear if so is the continuation that uses it).
Mutable variables can be optimized onto the stack
in the presence of a linear continuation.
Now, proving that a continuation will stay linear is difficult:
every use of a function stored in a shared mutable binding,
especially a global binding, and including the standard predefined functions,
is a "contamination" point where a continuation may be captured.
Contamination may only be avoided by "protecting" the use of the function,
binding its early value to a non-escaping variable,
assuming we can trust the "initial" values of such functions.
Higher-order functions that execute functions passed as arguments
or in data structures will also have a lot of trouble guaranteeing
the linearity of their execution.
All in all, optimizing code written in a "normal" style is impossible
in the presence of separate compilation, incremental compilation,
dynamic EVALuation, or any other capture of often-used global bindings.
In the absence of a way to explicitly declare properties of
type, mutability, linearity, escape, etc.,
only a STAtic Language ImplementatioN may prove that continuations
will stay linear and optimize Scheme in a correct way.


It is argued that call/cc can be used to implement
exceptions (catch and throw) and threads,
but these would have been more than satisfied with linear continuations
(also, space-safe threads want a way to reset the current continuation
to null).
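
For instance, a catch/throw-style escape uses its continuation
linearly: it is invoked at most once, and never re-entered.
A sketch (find-first is made up for the example):

  (define (find-first pred lst)
    (call-with-current-continuation
      (lambda (return)            ; `return' is a one-shot escape
        (for-each (lambda (x) (if (pred x) (return x)))
                  lst)
        #f)))

  (find-first even? '(1 3 5 4 7))   ; => 4

An implementation that knew `return' would never be re-entered could
compile this as a plain non-local exit, with no frame copying at all.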

The case for threads can be developed even further:
in the presence of preemptive multitasking and/or real multiprocessing
(didn't Fernando long for the use of "native threads"?),
how will the sharing of variables happen?
Can a foreign thread non-linearly capture
the continuation of another thread at any point,
breaking any attempt to optimize code in the presence of linear
continuations?
Must all variable accesses be done on the heap,
with locking done at every memory access,
and refetching from memory done at every variable reference
(C "volatile" variables)?
The Scheme way of dodging complexity would be that we indeed
pay such a worst-case price everywhere throughout the program.

It seems to me that we _must_ acknowledge continuations
as being essentially linear objects, although it might be possible
to explicitly duplicate a captured linear continuation.
Sharing and non-sharing are essential concepts in concurrent languages,
and should be revealed to the user,
instead of being mismanaged by an ignorant implementation,
especially in the presence of dynamic computation.
Knowing which mutable object is "owned" by which thread
and must be copied by value together with the thread,
and which object is actually an interface and should be copied by reference,
is an essential point of program design that cannot feasibly be decided
a posteriori.

How do existing "parallel/concurrent" dialects of Scheme and LISP do it?
Does there exist a system with linear typing for concurrent programming
(no, I don't mean Clean, unless it allows for communication)?
I know of work by H. G. Baker on adding linearity analyses to LISP compilers;
is there any work on adding linearity _declarations_ to LISP dialects?

[ "Faré" | VN: Уng-Vû Bân | Join the TUNES project!   http://www.tunes.org/  ]
[ FR: François-René Rideau | TUNES is a Useful, Nevertheless Expedient System ]
[ Reflection&Cybernethics  | Project for  a Free Reflective  Computing System ]
No man is an Iland, intire of it selfe; every man is a peece of the
Continent, a part of the maine; if a Clod bee washed away by the Sea,
Europe is the lesse, as well as if a Promontorie were, as well as if
a Mannor of thy friends or of thine owne were; any mans death diminishes
me, because I am involved in Mankinde; And therefore never send to know
for whom the bell tolls; It tolls for thee.
		-- John Donne, "No Man is an Iland"
------>8------>8------>8------>8------>8------>8------>8------>8------>8------
(define call/cc call-with-current-continuation)

(define (new-counter)
  (let ((count 0))
    (lambda ()
      (set! count (+ 1 count))
      count)))

(define count (new-counter))

(define (show . l)
  (for-each display l)
  (newline))

(define kk #f)
(define block? #f)

(define (e)
  ;; Print an escape count; on the first call only, capture
  ;; the current continuation into kk.
  (show "escape #" (count))
  (call/cc
    (lambda (k)
      (if (not kk) (set! kk k)))))

(define (init)
  (set! kk #f)
  (set! block? #f))

(define (l1 e n)
  (let loop ()
    (e)
    (show "count down: " n)
    (if (not (zero? n))
      (begin
	(set! n (- n 1))
	(loop)))))

(define (l2 e n)
  (let loop ((n n))
    (e)
    (show "count down: " n)
    (if (not (zero? n))
      (loop (- n 1)))))

(define (f l n)
  (init)
  (l e n))

(define (f1 n) (f l1 n))
(define (f2 n) (f l2 n))

(define (blockk)
  ;; Re-enter the captured continuation once; block further re-entries.
  (if block?
    (display "kontinuation blocked\n")
    (let ((k kk))
      (set! kk #f)
      (set! block? #t)
      (k #f))))

(display "\nside-effected variable, top-level continuation\n")
(f1 3)
(blockk)

(display "\npure recursion, top-level continuation\n")
(f2 3)
(blockk)

(display "\nside-effected variable, normal continuation\n")
(begin
  (f1 3)
  (blockk))

(display "\npure recursion variable, normal continuation\n")
(begin
  (f2 3)
  (blockk))