Language standards for LispOS

Marc Wachowitz mw@ipx2.rz.uni-mannheim.de
Wed, 21 May 97 20:11:56 +0200


hbaker@netcom.com (Henry G. Baker) wrote:
> If certain CL functions _should_ be returning immutable
> conses, and if some program depends upon the fact that they don't,
> then of course it should break.

Sure, that's exactly what I was suggesting. However, if we would make
CONS-IMMUTABLE a supertype of CONS, we would break any program which
assumes that what it gets is a CONS, even if it does only read it. Thus
we need either a subtype, or make mutability independent from typing.
Since we may want to specify non-homogenous nested mutability (e.g. a
mutable list of immutable strings, or an immutable vector of immutable
lists of mutable structure instances), I suggest it's more useful to
combine this with the existing type system. We might also define a
macro which creates an immutable type from an existing mutable type
(such subtypes should already be predefined for the standard types).

> Presumably the 'type' functions like CONSP continue to work, but we will
> also need functions like MUTABLEP to determine 'use'.

My proposal is to do this within the existing typing framework:

For some interesting type FOO, there exist subtypes FOO-MUTABLE and
FOO-IMMUTABLE, with the obvious semantics. To allow separate testing
of mutability, make FOO-[IM]MUTABLE subtypes of the abstract types
[IM]MUTABLE, respectively, with ([IM]MUTABLEP X) as the usual short
forms for (TYPEP X '[IM]MUTABLE).

Standard programs, which just work e.g. with CONSP or declare something
to be of type CONS, don't indicate whether the actual type is mutable;
programs knowing the extension could declare whether something is or
isn't mutable, or allocate their own immutable data with a more
specific type. This yields the required semantics without breaking
any standard code.

> Linear variables must have exactly one use, and this must be
> statically obvious.

I'd suggest introducing ordinary functions (and additional macros) for
the relevant manipulations, and then make the readtable expand a much
shorter notation into the long form, maybe somewhat like the following;
this keeps existing code walkers happy.

(LET ((#1=#:GENSYM10 (LISPOS:LINEAR-VARIABLE)))
  (DECLARE (DYNAMIC-EXTENT #1#))
  ...
  (LISPOS:GET-LINEAR-VALUE #1#)
  ...)

Since $ is classified as alphabetic character, which may occur in symbol
names (and the pound sign isn't in the standard character set - or even
on my keyboard), we might use either ~ (the tilde), which is explicitly
reserved to implementors, or introduce yet another sharp-sign reader
macro, e.g. #_X (not so good, since it's more likely to be in conflict
with existing code). ~ reminds somewhat of the fact that such values
"flow through" bindings, rather than sticking to them ;-)

Of course, before worrying too much about the syntax, we should better
define clear semantics - including interactions with various other
features of Common Lisp, like closures, non-local exits, and whatever
we define for threads.

> I would make the default type of argument lists
> to be _linear_, so &REST arguments that aren't marked with '$' are
> coerced to immutable lists.

Since we're introducing non-standard behaviour here, I'd rather use
a new lambda list keyword for that purpose, e.g. LISPOS:&LINEAR for a
linear rest variable, and LISPOS:&IMMUTABLE for a non-linear, but still
immutable, rest variable. That's permitted by the standard, and there's
no need to break existing code expecting a normal list from &REST.

In practice, you'll probably not find many cases of rest-mutation, but
it is allowed, as long as there's no problem due to sharing with the
last argument of APPLY. Since many typical uses are obvious for the
compiler (assuming some knowledge about functions on lists, and some
modest inlining, and perhaps teaching it some common idioms), I doubt
it's really worth making false assumptions about standard code. However,
if we want the most efficient access, and extend the standard anyway,
we might as well do something better:

LISPOS:&SEQUENCE would only promise an immutable sequence without any
required identity (same for any subsequence), and the compiler would
be free to optimize it, even differently for different access points.
For a lot of programs, it may be most efficient not to reify such a
"container" at all, but keep passing data around as several arguments
(probably in registers or on the stack) in the usual way. Few argument
lists are very long, as far as I can tell, and the compiler can usually
figure out what fits its code generation strategies best (if all else
fails, it can always fall back on allocating an ordinary lists; this
shouldn't happen in most cases, unless the user really wants to use a
true list, in which case we don't win anything by avoiding it).

-- Marc Wachowitz <mw@ipx2.rz.uni-mannheim.de>