POSITION STATEMENT: Specifications for a New Lisp Dialect (ARGOT

Alaric B. Williams alaric@abwillms.demon.co.uk
Fri, 16 May 1997 20:27:22 +0000


Here a few specifications for what I would want of a new Lisp dialect.
They do not cover everything, and may have some inconsistencies, to
boot!

PHILOSOPHY

Look to future extensions when defining the behaviours of functions;
be careful about what is "undefined". EG, a "subtype?" primitive
is required by spec V1.0 to answer a certain set of questions
correctly, but may return "don't know" for other cases. However,
this behaviour is not to be relied upon; a V1.0 program may
not rely on any particular behaviour. Future versions of the
standard may make use of advances in type science to nail
down some of those dodgy cases. ETC. This is a bad example,
but it's all I can think of right now :-)

LAYERING

The dialect should be layered, like EuLisp. Layer 0 should be
/very/ basic, the irreducible set of primitives (well,
irreducible without starting to sacrifice efficiency in
a big way; we won't just have a Turing machine in there)
from which everything else is derived.

All layers above 0 should be given as source code referring
to the layers beneath as part of the spec. THen we have
a definitive implementation. Smart hardware-using implementations
of layers 1+ must behave like the definitive implementation unless
the behaviour is marked "undefined".

LAYER 0

/data model/

A "value" is a linear type. It comes from one place and
goes to one place.

All functions return linear "values", and all function
arguments are given as values.

A value may be captured in a val-binding, where it is
given a name; it is semantically equivelant to the
argument names in a function body. It /must/ be
used /once/. This is statically checkable.

A definition is the other kind of binding, called
a def-binding for consistency. It is initialised
once, at it's definition, and can then be /read/
arbitarily. It's immutable.

If a function wants to arbitarily read something it gets 
as a value, then it must destroy that value in a
def-binding definition, where it is used to 
initialise the definition, which can be read arbitarily.
This works for the returned values of functions,
literal (quoted) values, and val-bindings such as
argument names.

A symbol val-bound evaluates to the value. A symbol
def-bound evaluates to a value cloned from the def-binding.

Mutation is not in level 0. It comes later.

As well as lists, there is a special type called a record, designed
to efficiently capture the essence of an alist. Each "cell" of a record
has a label, a head, and a tail. The tail must point to another record
cell or #[], the empty record value.

The standard syntax for a record is:

[ <label> : <value>  <label> : <value> ]

Lists are as you would expect.

/defined values/

#t        - logical truth. Anything apart from #f counts as #t.
#f        - logical falsehood (statistics? :-)
#?        - the unknown

#nil      - nothing

#()       - the empty list
#[]       - the empty record

#effect   - a side effect. Returned by functions like "print!" that
            exist only for side effect.

/syntax/

(if <condition> <this> <that>)

(lambda <flags> (<args>) <body>...)

Where flags are:

 "mutator" - it should be known that the function
             modifies global state. A smart compiler
             may detect violations of this, but
             not necessarily.

 "stateful" - the function may read global state;
             ie, say bye bye to referential transparency.

An arg list has two components, all of which are
optional, and those that are present are 
seperated by either the "#rest" or "#keywords" symbol.
If a #rest is present, there may be no #keywords.

EG, valid sequences would be:

(...) - all positional
(#keywords ...) - all keyword args
(#rest ...)
(... #rest ...)
(... #keywords ...)

The first part specifies positional parameters. If
an element is a symbol, that is the symbol val-bound
to the incoming value.

If it's a list, it's first element is the binding symbol,
and the second (and last) is a record of flags.

lazy : ([mutator] [stateful]) - the value is given, instead, as a function
of no arguments, that contains the un-evaluated argument, with the
flags given in the list. 

This is used for defining some types of syntax without needing
macros.

type : <type> - the value given to the function must be of
a specified interface type; violations of this must be either
statically detected at compile time, or thrown as exceptions
at run time. More on types later.

The meaning of any other labels is undefined.

The rest part may contain only one binding, which may be lazy
or typed as before, and is bound to a list of the given type
of value - if lazy is used, the list isn't lazy, but it's 
a list of functions of no arguments. Likewise, a type specification
is applied to each element, NOT the list.

The keyword part is keyword args. Each member of this bit is a
list of two or three arguments; the first is the val-binding
symbol to use in the function, the second is the keyword 
symbol (/without/ : pre/suffixes; see below). The optional
third is a record of flags, as before.

Inside the body of the function, there is an implicit continuation
val-bound parameter, which goes by the name of "continuation".
See the continuations section.

(<closure> <args>...)             

This is a function invocation. If the closure uses #rest, then
the first arguments are bound to the defined positional arguments;
there must be enough. Any further arguments are placed into a list
and passed as the last argument. Non-lazy arguments are evaluated
before the call; lazy arguments are not evaluated. Instead, they 
are wrapped in (lambda [flags] () ...). 

If the closure uses #keywords, then the required number of
positional parameters must be present, followed by a single
record, which associates labels with values.

The record may be ommitted, in which case #[] is assumed.

/types/

There are two kinds of types; interface types and implementation
types.

By convention, the names of interface types have *s around them,
and implementation types have +s.

An interface type is something like *number*, or even *type*,
or *interface* or *implementation*. Formally, the specification
of an interface type is a decleration of a set of methods the
type must supply. The programmer may define their own interface
types with a construct like:

(make-interface
       (<generic-function-name> <function-interface-type>)
        ...
)

EG,

(define *person* (make-interface
          (name (() *string*))
          (age* (stateful () *integer*))
)

A function interface type is fully specified with a function
returning an interface type; the above usage is an abbreviation of
the full syntax, which is like:

([mutator] [stateful] (<arglist>) <return-type>)

There is a root type *all*, of which all are subtypes.

Other interface types defined at level 0 include:

(*ranged-integer* <min> <max>)
(*modulo-integer* <modulus>)
*boolean*
*symbol*
(*member* <possible values...>)
*list*
*record*
(*matrix* <member-type> (<dimensions>))   - functional 
                                           (immutable) array
*character*

An implementation type specifies the nitty-gritty. They
are only really used (in good programming style) in
the methods defined on the actual implementation types,
to make them conform to an interface type.

The system provides implementation types, but you don't
specify them yourself. It chooses from +fixnum:16+ etc.
as it sees fit.

The standard way of defining one's own implementation type
is with a "make-class", which returns an implementation of
a class. A CLOS-like system applies. Generic functions may be
defined, methods added, etc...

/macros/

Something like Scheme's system? Research ongoing. Suggestions
WELCOME!

/modules/

All global bindings are available as <symbol> or
<module-name>::<symbol>. All the bindings in layers 0 and up are
grouped into suitable modules titles. 

Module importation is like so:

(require-package <package-name> <module-name>))

The names are symbols. This requests a module from a package
to be made available (the require may be at any point in
the code, even the end; it is looked for in an initial pass).

The layers are defined as standard packages, each split into
individual modules defining interface types and generic 
functions/methods, or macros, or whatever, except for layer 0,
which is so basic it all goes in one module, "all".

EG:

(require-package 'ARGOT-layer-0 'all)

Other types of "require" will be defined for my ARGON project,
which uses modules as part of it's persistence paradigm.

/continuations/

This is experimental.

A continuation is a list of "call frames"; the head of the list
is the continuation of the current function call, the next element is the
continuation of the function calling it, etc.

Each call frame is a list of one or more elements. The first is always
a continuation. For a normal call, that's it, but a special syntax:

(call (<closure> <args>...) <other>...)

produces a call with the "other..." as the remainder of the
call frame.

This allows the creation of dynamic bindings, and exciting control
structures, by leaving markers in the call frame (such as exception
catch requests) which functions lower down (like throw) can go
and look for.

Dynamic binding can be done by putting suitable markers in the
stack. A (dynamic-get <symbol>) function can then scan up the stack
for the first (innermost) match.

A special note on function types:

Binding a mutating or stateful function to a bind that
is typed as a function, but without enough flags, is an error.

IE, a flagless (pure) function is a subtype of a mutating
function, but not vice versa.

An untyped (*all* assumed) binding can take any kind of function,
EXCEPT if a function call is made from that binding, in which case
the compiler should sneakily insert a weak assertion that it
is neither stateful nor mutating.

IE, at the call site of an impure function, the compiler MUST have
been given the information that the function is impure.

This allows it to optimise more freely, and keeps the programmer thinking
about what kinds of state he is playing with.

A function defining keyword arguments can enter a binding typed for a function
without any keyword arguments as long as the other criteria are met, or
for one defining /less/ keyword arguments (subset); however, those keywords
won't be available from that binding. Likewise, #rest arguments can be
"masked off" like this. This allows functions to have more functionality
than they need to to fit a specification.

/let/

There is a nice let construct, with two flavours.

letseq is like let*
letpar is like letrec

(Do I really need a plain let?)

The syntax is:

(let[seq|par] [<name>] <bindings> <body>)

The let implicitly unwind-protects any val-bindings
made in it, to preserve linear types. Functions do
this for their arguments, too.

The <name> is for recursion - a named let.

The bindings come in a record, like so:

[ def : (<symbol> <value>) val : (<symbol> <value>) ... ]

There may be arbitary numbers of defs and vals, evaluated in
order for letseq, or "in parallel" (perhaps literally so)
for letpar, which allows mutual recursion if functions are
bound to names.

/quoting/

There is quoting and quasiquoting, as per Scheme.

LAYER 1

Layer one defines higher-level stuff. Lots of nice control structures;
exceptions and so on, cond, case, etc. 

(require-package 'ARGOT-layer-1 'types)

The full specifications of the interfaces of types are available
at layer 1, such as "sybtype?" and "type?", "typecase", etc.

Note that there is still no mutation.

I had a few more ideas for layer 1, and forgot them :-)

Oh, yes, the provision of a "future":

(require-package 'ARGOT-layer-1 'future)

(make-future <expression>)

This returns an argument-less function, which when evaluated, evaluates the
given expression. However, the expression may be eagerly evaluated by
a parallel processor, for example, before being invoked. Once performed,
it is memoized - hang on, that's a state change; this should be in LAYER 3,
above state changes. Whoops. Well, to cut a long story short, calling
it twice won't evaluate it twice; it'll just return the same value, again.

The expression is, of course, a lazy argument typed as neither mutating
nor stateful.

LAYER 2

Here we provide mutation.

(require-package 'ARGOT-layer-2 'impurity)

An interface type is defined:

(*variable* <type>)

A variable is considered to be a reference into
a mutable heap. IE, the variable is still immutable, so 
it can act as a value; but it "points" to something.

*variable* provides the following methods:

(set! <variable> <value>) - sets the value.
(get* <variable>)

Style convention:

! means the function is a mutator, and possibly stateful, too.
* means it's stateful.

*variable* is merely a special case of the more general
*array*, which is a mutable *matrix*; however, it
replaces the matrix's (get ...) with (get* ...),
so is not a subtype as such.

Matrices can initialise from the instantenous state of arrays,
and vice versa.

Now we reify the ARGOT virtual machine.

(require-package 'ARGOT-layer-2 'argot)

A place where code is executable and executing - a totally
active VM - is of interface type *vm*.

(eval! <vm> <expression>) - returns the value of the expression,
                            which may be stateful, mutating, you name it.

A *proto-vm* is not yet ready for executing code, since it's still
being "built".

(make-layer-0-proto-vm)

returns a new *proto-vm* containing the layer 0 language only.

(bind <proto-vm> <package-name> <module-name> <bindings>)

returns a new proto-vm with the given record full of global
def-bindings (the bindings record is <symbol> : <value> etc),
registered under the given module name. 

If a module does not have a require-package for those bindings,
they are not used. The construct above merely registers the
module as 'available'. If the code requests a module that is
not 'available' like so, it can try to find it externally,
in an implementation-dependent manner.

(insert <proto-vm> <package-name> <module-name> <module>)

Inserts a module into a proto-vm. The module source is
of one of two forms:

1) Boring. It is a list of define statements, creating top-level
def-bindings, and some kind of macro-defining construct.

2) Interesting. The native format for ARGON, my OS project.
An Interesting module source consists of an ARGON "document"
object with displays of type "ARGOT-definition" in them.
Anything outside these "definition" bits is ignored; the
definitions themselves are compiled.

When a proto-vm is full of the correct modules (cyclic dependencies
are possible; the proto-vm could be implemented so as not to compile
until it is "frozen" into a vm, so that all modules should be
reachable when needed), it is frozen, to create an executable
vm. The freezing process may be implemented as compilation to antive
code.

(compile <proto-vm>) - returns a *vm*

LAYER 3

Here is where the future should go. Sorry!

COMMENTS WELCOME!

ABW
--
Alaric B. Williams (alaric@abwillms.demon.co.uk)

   ---<## OpenDOS FAQ ##>---

Plain HTML: http://www.delorie.com/opendos/faq/
            http://www.deltasoft.com/faq.html

Fancy HTML: http://www.deltasoft.com/faq0000.html