Joy, Lambdas, Combinators, Procedures

iepos@tunes.org
Wed, 2 Feb 2000 15:56:05 -0800 (PST)


> Let me make clear my bias against a purely applicative language with a
> single disgusting analogy: I would no more consider using a purely
> applicative language for the sake of simplicity than I would consider using
> castration for the sake of moral purity.  I consider both goals to be noble
> (in their place), but I consider the means to be misguided.
> 
> All those features are in there for a reason: the reason is generally to
> make real programs simpler.

I agree that a "simplification" to a language's syntax is of little
use if it is artificial, making the language more difficult to
interpret. An example of this kind of simplification might be
helpful:

It has long been known that a system that uses many primitives
can be reformalized as a system that uses a single primitive;
for instance, consider a purely applicative system of arithmetic
that has the primitives "1", "+", and "-". An example
expression of the system might be:

  + 1 (- 1 1)

which would denote "1". Now, the system could be reformalized
using a single primitive, say "Z", such that

  ZZ = 1
  Z(ZZ) = +
  Z(Z(ZZ)) = -

Essentially, this "Z", given a primitive (or a thing that was
a primitive in the original system), yields the next primitive.
The expression "+ 1 (- 1 1)" in the reformalized system would
be rewritten:

  Z(ZZ) (ZZ) (Z(Z(ZZ)) (ZZ) (ZZ))
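
For concreteness, the whole reformalization fits in a few lines of
Haskell (I'll use Haskell for these side sketches; the representation
and names are my own):

  data Expr = Z | App Expr Expr

  one, plus, minus :: Expr
  one   = App Z Z                    -- ZZ
  plus  = App Z one                  -- Z(ZZ)
  minus = App Z plus                 -- Z(Z(ZZ))

  -- "+ 1 (- 1 1)" in the reformalized system:
  ex :: Expr
  ex = App (App plus one) (App (App minus one) one)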

This is a kind of "simplification" which I don't find to
be too impressive; although it is possible to eliminate
a lot of primitives, the new primitive "Z" is much more
intuitively complex than any of the original primitives.
A proof system based on the language using "Z" would probably
be no better than a similar proof system based on the original
language, and might well be worse.

Now, my view is that the simplification of a language to a
purely applicative one is not like the simplification
in this last example. I think that reducing a system
to a purely applicative one may be truly useful (some
reasons will be given at the end of the post). 

Still, it is not my view that non-purely-applicative systems
are innately inferior... there may be other well-designed
approaches that are also interesting (such as Joy's approach).
My point is only that purely applicative languages are
worth considering seriously; actually, the best way I
know of to found a system is with a purely applicative language.

> > > > In case this was not clear, by "purely applicative 
> > > > language", I mean
> > > > a language in which all expressions are built from a finite set
> > > > of primitives using a binary construct ("function application").
> > > > An expression in such a language is most naturally written
> > > > using a binary tree, with primitives as leafs.
> 
> Your purely applicative language has MANY other constructs, ranging from
> combinators to parenthesis to definitions.  So your language doesn't fit
> your reading of that definition either.

I suppose you are referring to the informal language I used
in my posts; strictly, this language is not purely applicative,
but is a sugared variant of a purely applicative system.
The specific "other constructs" you mentioned (combinators,
parentheses, definitions) as keeping the language from being
purely applicative are each worth a few words:

First of all, expressions that denote combinators (such as
"B", "C", "W", and "K") are primitives in the language.
Recall in my definition of "purely applicative" that
primitives (meaning atomic expressions that are assigned 
precise meanings) are allowed, as long as there are only a 
finite number of them. In fact, primitives are essential
in a purely applicative system (Joy also relies heavily
on primitives (like "dup", "swap", "dip", "map", etc.), rather than fancy
constructs; this is in contrast to languages like C and even
Haskell, which rely on myriads of fancy constructs).

Next, parentheses... Parentheses, as they are used in my informal
language, do indeed keep the system from being technically 
purely applicative. There are several ways to replace them;
one way is to require that the application of "f" to "x"
always be written as "(fx)", with omission of parentheses
never allowed; this approach still uses parentheses,
but in such a way that the binary construct of the language
(representing function application) is a precise textual operation
(specifically, concatenating "(" with the function expression,
parameter expression, and ")").

There is another way to eliminate parentheses which is much
better. Simply write the application of "f" to "x" as "@fx",
prefixing "@" to the function expression and parameter
expression. Of course, any atomic expression would work
in place of "@" (we chose "@" because it suggests "application").

In a system with primitives "B", "C", "W", and "K",
the expression that would be written with parentheses
as "B(BW)(BC(BB))" would instead be written:

  @@B@BW@@BC@BB

Noting that the rewritten version is not any larger than the
original, this kind of notation may be of practical use
not only in formal languages of systems, but in human-readable
languages as well. Note that in this notation, "@@fxy"
is the binary application (in a curried sense) of "f" to "x" and "y";
if this kind of notation is used in human-readable systems,
shorthands for "@@" and "@@@" would be nice. 
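
To make this concrete, here is a minimal Haskell sketch of reading
the "@" notation into a binary application tree and printing it back
in the parenthesized style (single-character primitives are an
assumption made for simplicity):

  data Term = Prim Char | App Term Term
    deriving Show

  -- read one term off the front of the input: "@" followed by
  -- two subterms, or else a single-character primitive
  readTerm :: String -> Maybe (Term, String)
  readTerm ('@':s) = do
    (f, s')  <- readTerm s
    (x, s'') <- readTerm s'
    Just (App f x, s'')
  readTerm (c:s)   = Just (Prim c, s)
  readTerm []      = Nothing

  -- print back in fully parenthesized "(fx)" style
  unparse :: Term -> String
  unparse (Prim c)  = [c]
  unparse (App f x) = "(" ++ unparse f ++ unparse x ++ ")"

  -- e.g. readTerm "@@B@BW@@BC@BB" recovers the tree of B(BW)(BC(BB));
  -- unparse on that tree yields "((B(BW))((BC)(BB)))"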

Third, definitions... Actually, I don't recall using any sort
of definition construct in my language, but it is something
that will be needed eventually... In any case, I don't
think a special construct is needed for definitions.
In some systems, it may be possible simply to use an equality 
primitive (a binary function yielding a proposition) instead.
In my toy system, I plan on using a slightly different approach:
a primitive "def" will be provided, which is a function
that takes two parameters (a name to define and a thing to
assign it to denote), yielding a procedure instructing
the system to modify the system with the definition
(and by "procedure", I still mean a function onto a
procedure completion, so that procedures can be sequenced
with composition). Also, another primitive "undef" will
be used to undefine terms... There may be other primitives
for inspecting the system's namespace; actually, these specific
procedures will probably not be taken as primitives...
rather, I'll probably have a "dict" primitive (a field storing
the current dictionary) which can be inspected and modified
with ordinary field primitives.
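
As a rough Haskell sketch of that plan (reading a procedure simply as
a state transformer over the dictionary is a simplification of the
"procedure completion" idea, and all the names here are tentative):

  import qualified Data.Map as M

  type Dict = M.Map String String   -- names mapped to (textual) meanings
  type Proc = Dict -> Dict          -- a procedure, read as a state transformer

  def :: String -> String -> Proc
  def name meaning = M.insert name meaning

  undef :: String -> Proc
  undef = M.delete

  -- procedures sequence by ordinary composition (rightmost runs first):
  session :: Proc
  session = undef "Tmp" . def "M" "@WI" . def "Tmp" "@KK"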

Anyway, definitions, in a way, are not really that fundamental a
part of a system (at least, not the system I'm working on),
although they are quite a visible part. Actually, 
a finished system would of course not have just one static 
system dictionary; each user of the system at least would 
have their own. The whole issue of how the dictionaries
(and thus definitions) will be handled will be taken care of by
the system's shell(s), which interact with the users, and should
(in the end) be quite flexible. 

Anyway, that takes care of those three obstacles... 

> > Once again, a purely applicative system (in the sense that I meant)
> > does not use names for its parameters. Purely applicative
> > systems instead tend to use combinators (primitive functions such 
> > as "B", which composes a function, and "C", which swaps its parameter 
> > order, and others) to form functions. With a handful of
> > such primitive functions, it is possible to form any function
> > that could have been formed with a lambda (named variable) construct.
> 
> This is true.  If you'd played with J or APL you'd realize that it quickly
> becomes FAR too complicated to track parameters that way -- but it's still
> possible.

I think this may be a good point. I admit that I've done very
little practical work with systems that use combinators to
form functions (rather than lambdas). But I'm hopeful...

> > > In addition, because function invocation is complicated by function
> > > application, proofs written in the language have to deal 
> > > with the double action.
> 
> > The system you're describing (C, I believe) is nothing like a
> > purely applicative system. 
> 
> It's precisely and exactly (in this quote at least) purely applicative.
> Note that I said nothing about names above -- the simple fact is that in an
> applicative language, function invocation is always accompanied by function
> application; in other words, the language can't emit code to run the
> function until it has the parameters which will be passed to the function.

I'm really not sure what you mean by "function invocation" as
distinguished from "function application", and I don't know
what you mean by "run the function"...

By "function", I usually mean an abstract mapping from things to other
things. The things that will be "ran" in the system will be
procedures, not generally functions.

> (Named variables actually don't eliminate applicative purity; they eliminate
> combinatoral purity, though, and I'm quite willing to agree that your
> language design can eliminate named variables.)

Hmm... named variables do destroy applicative purity, at least if
they are accompanied by a lambda construct to bind them. Anyhow,
I think you'll agree that, whether or not they destroy applicative
purity, they greatly clutter programs and make proofs about the
system much more difficult.

> > > > Anyway, one clear benefit of using a purely applicative language
> > > > is that it has such an amazingly simple syntax.
> 
> > > I strongly suspect that you don't have a referent for 
> > > "simple syntax".
> > > Currying (the simplest known way to define an applicative 
> > > language) is
> > > indeed simpler than some of the complex parsing tricks some 
> > > languages have
> > > to do; however, currying is not enough by itself.  You also 
> > > have to have
> > > four other syntactic elements: one to delimit functions, 
> > > one to define
> > > names, one to count how many parameters a function has (and 
> > > identify each
> > > one), and one to change the curry order.  That's a total of 
> > > five elements.
> 
> > Again, this is based on a misunderstanding... A purely applicative
> > system does not use named parameters, and does have a truly
> > simple syntax.
> 
> Read what I wrote -- I didn't mention named parameters.  I listed five
> essential elements of application; with all five of those you don't have a
> language, instead you have a spreadsheet macro (or something).

I'm not really sure what you mean by each of those syntactic elements
and certainly don't see that they are required in all purely
applicative systems. I've given the syntax of a small purely applicative
language above, with these being some expressions in it:

  @BW
  @@BCC
  @B@WB
  K
  @@B@BW@@BC@BB

How are your 5 syntactic elements manifest in this language?

> Your language, by the way, adds another element which isn't listed in my
> "must have"s: combinators.  Combinators seem to be special to your language,
> like IMMEDIATE words are to Forth.  That definitely adds another element to
> your language -- an element which Joy does not have.  (Joy includes
> functions which act as combinators, but they do so without any special
> language support.)

Combinators in my purely applicative language are treated as ordinary
primitives; they do not have special language support any more
than Joy has special support for "dip", "swap", "dup", and "pop".

> > In a Joy test library, I saw this definition of "factorial":
> 
> >   [ [pop 0 =] [pop succ] [[dup pred] dip i *] ifte ] y
> 
> Good choice.  (That's an ugly way to do it and I wouldn't dream of writing
> real code that way, but it uses the primitives so it's easy to translate and
> is somewhat non-trivial.)  Note that a harder challenge might have been
> appropriate for you: in Joy it's trivial to define new combinators; in fact,
> there's no distinction between a user-defined combinator and a user-defined
> function.  Is it equally easy in your language?

Yes, it will be just as easy, provided some syntactic sugar is
permitted in the language. Interestingly, Joy has a special
construct for definitions, which could have been avoided...
For example, one might define "popd" as "[dip] pop" in Joy by:

  popd == [dip] pop

Yet this could have been achieved without treating "==" specially,
if they had added a "define" primitive, in this sort of way:

  "popd" [[dip] pop] define

"define" could be treated as an atomic program (that has an effect
other than on the stack, like "put" and "get") that modifies
the interpreter's namespace... Of course, one would still like
to have "==" as sugar. This is the approach I plan on using for
definitions in my system.
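
A toy Haskell model of this, with the stack and namespace made
explicit (the representation is invented purely for illustration;
real Joy's internals surely differ):

  import qualified Data.Map as M

  data V = Str String | Quot [String]      -- toy stack values
  data Joy = Joy { stack :: [V], dict :: M.Map String [String] }

  -- "define" as an atomic program: pop a quotation and a name,
  -- extend the interpreter's namespace
  define :: Joy -> Joy
  define (Joy (Quot q : Str n : rest) d) = Joy rest (M.insert n q d)
  define st = st   -- underflow or type mismatch: ignored in this toy

  -- running:  "popd" [[dip] pop] define
  example :: Joy
  example = define (Joy [Quot ["[dip]", "pop"], Str "popd"] M.empty)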

As a side note, you may wonder what use there would be in
eliminating "==" as a special construct... If the system provided
a construct like "define" in place of "==", then it would
be possible for the system to talk about its own definitions;
for instance,
  
  ["popd" [[dip] pop] define]

would be a first-class expression in the system, a quotation
of a program that defines "popd" to mean "[dip] pop". One of
the main benefits the author claims for the system is its ability
to talk about its own programs directly, using quotation,
without the need of an encoding like "Goedel numbering". Adding
"define" as a first-class primitive would extend this ability
in a natural way.

> The definition of 'y' in Joy is:
> 
>  y == [dup cons] swap concat dup cons i;

Similarly, in my system, the Y combinator might be defined as:

  Y == BM(CBM)

where "M" is the combinator that applies a function to itself:

  M == WI
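
As a sanity check, such definitions can be mechanized; below is a
tiny leftmost-outermost reducer in Haskell (the data type, the ":@"
spelling for application, and the uninterpreted atom "F" are all my
own scaffolding), which shows "YF" unfolding to "F(F(...))":

  data T = B | C | W | K | I | M | F | T :@ T
    deriving Show
  infixl 9 :@

  -- one step of reduction, leftmost-outermost
  step :: T -> Maybe T
  step (B :@ f :@ g :@ x) = Just (f :@ (g :@ x))
  step (C :@ f :@ x :@ y) = Just (f :@ y :@ x)
  step (W :@ f :@ x)      = Just (f :@ x :@ x)
  step (K :@ x :@ _)      = Just x
  step (I :@ x)           = Just x
  step (M :@ x)           = Just (x :@ x)       -- M == WI
  step (f :@ x)           = case step f of      -- otherwise, reduce inside
                              Just f' -> Just (f' :@ x)
                              Nothing -> (f :@) <$> step x
  step _                  = Nothing

  y :: T
  y = B :@ M :@ (C :@ B :@ M)                   -- Y == BM(CBM)

  -- after a few printed steps the term reads F :@ (F :@ ...),
  -- i.e. "Y F" has unfolded to "F (F (...))"
  main :: IO ()
  main = mapM_ print (take 10 (run (y :@ F)))
    where run t = t : maybe [] run (step t)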

The reason that the definition in my system appears shorter
is, of course, only because of the convention of naming combinators
with a single letter (whereas Joy used longer identifiers);
this convention, when identifiers are run together without spaces,
may lead to some problems when the system also uses multiple-letter
identifiers (it may become impossible, given an expression, to determine
where one identifier ends and the next begins). This problem can be
solved with the convention that all identifiers begin with a capital letter
but contain no further capitals. But this convention would make parsing a
bit complicated; in a pure system that uses "@", a better
approach would be to terminate each primitive expression with
some terminator character that is not used anywhere else,
or to otherwise encode the primitive expressions so that
they are self-delimiting.

The two definitions above of "Y" and "M" are a bit sugared; in unsugared
form they might appear:

  @@Define "Y" @@BM@@CBM
  @@Define "M" @WI

This would still not be quite fully unsugared, because of the
string literals "Y" and "M"... Like Joy, my system will probably use
lists of characters for strings; so, string literals will
be sugar for a list of character primitives... Lists will probably
be taken as functions that pass their elements as parameters.
So, using the T combinator (which forms a unit list), the
two definitions above could be written in fully unsugared form
as:

  @@Define@TYlet@@BM@@CBM
  @@Define@TMlet@WI

These two expressions are each purely applicative (here "Ylet" and
"Mlet" name the character primitives for the letters "Y" and "M").
The first could be written using a binary tree as (warning: ASCII art)

                @
               / \
              /   \
             /     \
            /       \
           /         \
          /           \
         /             \
        /               \
       /                 \
      @                   @
     / \                 / \
    /   \               /   \
   /     \             /     \
 Define   @           @       @
         / \         / \     / \
        T  Ylet     B   M   @   M
                           / \
                          C   B


String literals would be a bit messier to unsugar if they were
more than one letter long (one would have to use a variation
of the "T" combinator, applying it to each of the character
primitives in turn, using currying).

Anyway, I'm still a bit undecided on how exactly to deal with
lists in the system; I may end up prefixing the size of the
list as the first element, or else may treat a list the
classical way, as a chain of pairs (made from "cons"), ending
in a "nil".

> > In a purely applicative language, a factorial function might 
> > be written as:
> >   Y (S3 (ifte =0) (K 1) . S times . CB (minus 1))

Oops... this is not quite right; should be:

  Y (S3 (ifte.=0) (K 1) . S times . CB (minus 1))
             ^

Also, "S3" will probably not really work because it would be
construed as the application of "S" to "3", but this is
purely a syntactic matter...

Of course, the factorial function could be written more concisely
using a function like Joy's "primrec" as:

  Primrec (= 0) (K 1) Times

The parentheses could probably be omitted, the spacing giving
the structure:

  Primrec Q0 K1 Times

where "Q" is the equality primitive (used out of reluctance to
use "=" in a syntax other than infix).

Anyway, once again, this would be sugar for:

  @@@Primrec@Q0@K1Times

Or, written using a binary tree:

               @
              / \
             /   \
            @   Times
           / \
          /   \
         /     \
        @       @
       / \     / \
      /   \   K   1
  Primrec  @
          / \
         Q   0
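
As a cross-check, here is the same factorial transliterated into
Haskell (assuming, as in Joy, that the recursion is on the
predecessor, and that the base-case function receives the argument,
which "K 1" then ignores):

  primrec :: (Integer -> Bool) -> (Integer -> Integer)
          -> (Integer -> Integer -> Integer) -> Integer -> Integer
  primrec p base comb n
    | p n       = base n
    | otherwise = comb n (primrec p base comb (n - 1))

  factorial :: Integer -> Integer
  factorial = primrec (== 0) (const 1) (*)   -- Primrec Q0 K1 Times
  -- factorial 5 == 120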
  

> > Hopefully you do not see a purely applicative approach as all
> > so bad now, seeing how similar in spirit it is to Joy.
> 
> Well, I'm not sure that a single example can show that much.  Can you refute
> my claim that Joy is simpler than even a truly pure applicative approach
> (and your language isn't pure, but rather adds the notion of combinators to
> an otherwise applicative base)?

Well, that depends on what precisely you mean by "simpler".
A purely applicative approach is simpler in the sense that
the only constructs needed are primitives and function-application,
while the Joy approach needs at least primitives, concatenation,
and quotation...

You seem hesitant to accept that quotation is a real construct of Joy...
Perhaps a quote from the manual may help (beginning of
"Mathematical foundations of Joy" in the synopsis):

  Joy programs are built from smaller programs by just two constructors:
  concatenation and quotation.

  Concatenation is a binary constructor, and since it is associative it
  is best written in infix notation and hence no parentheses are
  required. Since concatenation is the only binary constructor of its
  kind, in Joy it is best written without an explicit symbol.

  Quotation is a unary constructor which takes as its operand a program.
  In Joy the quotation of a program is written by enclosing it in square
  brackets. Ultimately all programs are built from atomic programs which
  do not have any parts.

In Joy, as the authors have formalized it, expressions are built
from primitives using two constructs: concatenation and quotation.
In purely applicative systems, only application is needed.
Parentheses often play the role of grouping that quotation does
in Joy, but parentheses can be eliminated as I've shown above.

> Furthermore, having looked at the proofs written in the Joy manuals, can you
> duplicate any of them as simply?

I believe so... I'm not sure exactly which proofs you're talking about...
One proof that stands out was the "naturality of 'cons'". Here was the
proof they gave in the manual (in "The Algebra of Joy"):
 
   Somewhat more difficult is the naturality of cons. In conventional
   notation this is expressed by
        map(f,cons(x,L))  =  cons(f(x),map(f,L))

   and in Joy notation by
        [cons]  dip  map   ==   dup  [dip]  dip  map  cons

   This is so complex that a step-by-step verification is called for. Let
   L and x be the list and the additional member. Let [F] be a program
   which computes the function f. Let x' be the result of applying f to
   x, and let L' be the result of applying f to all members of L. The
   proof of the equivalence of the LHS and the RHS consists in showing
   that both reduce to the same program. For the LHS we have:
        x  L  [F]  [cons]  dip  map                             LHS
    ==  x  L  cons  [F]  map                                    (dip)
    ==  [x L]  [F]  map                                         (cons)
    ==  [x' L']                                                 (map)

   For the RHS:
        x  L  [F]  dup  [dip]  dip  map  cons                   RHS
    ==  x  L  [F]  [F]  [dip]  dip  map  cons                   (dup)
    ==  x  L  [F]  dip  [F]  map  cons                          (dip)
    ==  x'  L  [F]  map  cons                                   (dip)
    ==  x'  L' cons                                             (map)
    ==  [x' L']                                                 (cons)

   The two sides reduce to the same program, so they denote the same
   function.


Note that they resorted to funny notations for x' and L'. Also
the rule referred to as "(map)" doesn't seem to be a precise
syntactic inference rule (it is based on their strange notation
of "x'" and "L'"). Actually, the rule seems to be a vague version
of the precise thing that they are trying to prove; so, they're
assuming a slightly weakened version of their own conclusion,
making the whole proof quite silly.

I'm not sure if it would be possible to restate the Joy proof using
more precise methods. Anyway, here is a proof of an analogous result
in a purely applicative system:

The fact to be shown is that

  MapF ConsXY

is replaceable in the system by
 
  Cons FX MapFY

for all functions "F", things "X", and lists "Y". A stronger
version of this result, that it holds for all "F", "X", and "Y"
(which may or may not be provable in a purely applicative system,
depending on its degree of "typedness"; for simplicity, I'll
assume here that we're dealing with a quite untyped system)
can be written using combinators by saying that

  CBCons.B.Map

is replaceable by

  S (Bo2 Cons) Map

Now, this property is almost the defining property of "Map",
but we won't just say that it holds "by definition", because "Map"
will probably not really be taken as a primitive of the system.
So, we'll give a recursive definition of "Map" in terms of
"Cons", "Nil", "Fst", and "Snd" and then show how the desired property
follows from the properties of these primitives.

Now, "Cons", "Nil", "Fst", and "Snd" will probably not be primitives
of the system either, but for now we'll take their characteristic
properties as axiom schemes (since the Joy proof did no better):

  (Fst): Fst (Cons X Y) = X
  (Snd): Snd (Cons X Y) = Y
  (Nil): (Cons X Y = Nil) = CK

This last property is a way of saying that no pair formed from
"Cons" can be equal to "Nil". This system uses the "K" and "CK"
combinators for the truth and false proposition values, respectively.
(This is a convenient representation because then "P X Y" results
in "X", if "P" is the truth proposition, and "Y" if "P" is the
false proposition; so, simply applying a proposition to two
parameters can be used as a conditional, without need
of a special "ifte" primitive).
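
In Haskell, this representation and its conditional-by-application
reading look like the following (lowercase names, since Haskell
reserves capitalized ones for constructors):

  k :: a -> b -> a          -- the truth value K
  k x _ = x

  ck :: a -> b -> b         -- the false value CK
  ck _ y = y

  -- applying a proposition to two parameters acts as the conditional:
  choose :: (String -> String -> String) -> String
  choose p = p "yes" "no"
  -- choose k == "yes", choose ck == "no"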

Also, there will be axiom schemes for some combinators:

  (B)  : BFGX = F(GX)
  (C)  : CFXY = FYX
  (S)  : SFGX = FX(GX)
  (Bo2): Bo2FGHXY = F(GX)(HY)
  (N)  : NFGHX = F(GX)(HX)
  (Linrec): LinrecFGXYZ = FZ(GZ)(XZ(LinrecFGXY(YZ)))

These combinators will probably not all be taken as primitive,
so these schemes may be derived rather than postulated as axioms.

Now, we can define "Map" using "Linrec" as:

  C (Linrec QNil KNil) Snd . BCons . CBFst
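
Before proving anything, this definition can be tested; here is a
direct Haskell rendering, with native lists standing in for "Cons",
"Nil", "Fst", and "Snd", and an if-then-else for the Nil test (these
stand-ins are my assumptions, chosen only so the sketch runs):

  b f g x = f (g x)                      -- B
  c f x y = f y x                        -- C

  qNil :: [a] -> (r -> r -> r)           -- plays the role of QNil
  qNil z = if null z then \x _ -> x else \_ y -> y

  kNil :: a -> [r]                       -- plays the role of KNil
  kNil _ = []

  linrec f g x y z = f z (g z) (x z (linrec f g x y (y z)))

  -- C (Linrec QNil KNil) Snd . BCons . CBFst, applied to f:
  mapC f = c (linrec qNil kNil) tail (b (:) (c b head f))
  -- mapC (+ 1) [1, 2, 3] == [2, 3, 4]

(Laziness is what lets "linrec" terminate here: when "qNil" selects
the base case, the recursive branch is never evaluated.)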

Now, finally, we prove the result, that "MapF ConsXY" is replaceable by:

  (C (Linrec QNil KNil) Snd . BCons . CBFst)F ConsXY     (definition of Map)
  C (Linrec QNil KNil) Snd (BCons (CBFst F)) ConsXY                      (B)
  Linrec QNil KNil (BCons (CBFst F)) Snd ConsXY                          (C)
  QNil ConsXY (KNil ConsXY) (BCons (CBFst F) ConsXY (...))          (Linrec)
  CK (KNil ConsXY) (BCons (CBFst F) ConsXY (...))                      (Nil)
  K (BCons (CBFst F) ConsXY (...)) (KNil ConsXY)                         (C)
  BCons (CBFst F) ConsXY (...)                                           (K)
  Cons (CBFst F ConsXY) (...)                                            (B)
  Cons (B F Fst ConsXY) (...)                                            (C)
  Cons (F (Fst ConsXY)) (...)                                            (B)
  Cons FX (...)                                                        (Fst)
  Cons FX (Linrec QNil KNil (BCons (CBFst F)) Snd (Snd ConsXY))   (see note)
  Cons FX (C (Linrec QNil KNil) Snd (BCons (CBFst F)) (Snd ConsXY))      (C)
  Cons FX ((C (Linrec QNil KNil) Snd . BCons . CBFst)F (Snd ConsXY))     (B)
  Cons FX (MapF (Snd ConsXY))                            (definition of Map)
  Cons FX MapFY                                                        (Snd)

Part of the result of the fourth step was abbreviated as "..." until
near the end... 

This proof was, of course, quite a bit messier than the Joy proof,
but then again, it actually proves something, unlike the Joy proof...
Perhaps this was not such a good choice of thing to prove, since
it is not really analogous to the proof in the Joy manual; if you
have a favorite proof in the Joy manual, maybe I'll try that one.
Anyway, that was probably not exactly the cleanest way to do it, either.

Anyway, both the proof I just gave and the Joy proof are informal
proofs about systems... One interesting thing is that both proofs
used variables (in the "epi-language", not in the formal language itself);
in the case of the purely applicative system at least, the proof could
be rewritten without variables at all using the Combinatory Axioms
(and abstracted versions of the Fst, Snd, and Nil axiom schemas).
This would have been a bit trickier, though, and I'm not
really too familiar with the combinatory axioms, so I didn't
attempt it. To give you a flavor of how this might work though,
here are a couple combinatory axioms:

  B(BB)B = C(BB(BBB))B
  BCC = I

The first one states the associativity of composition; for, from it,
it follows that

  B(BB)B F G H = C(BB(BBB))B F G H
  BB BF G H = BB(BBB)F B G H
  B(BFG)H = B(BBBF) B G H
  B(BFG)H = BBBF BG H
  B(BFG)H = B BF BG H
  B(BFG)H = BF(BGH)

or in other words, that
  
  (F.G).H = F.(G.H)

The other combinatory axiom mentioned (that "BCC = I") states
that taking the converse of a function twice is the same as
doing nothing at all (i.e., the composition of the "C" combinator
with itself is the identity combinator).
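
Both axioms are easy to spot-check in Haskell, under the obvious
reading of the combinators as curried functions:

  b f g x = f (g x)
  c f x y = f y x

  -- B(BB)B and C(BB(BBB))B should agree, both acting as \f g h -> f.(g.h):
  lhs, rhs :: (c' -> d) -> (b' -> c') -> (a' -> b') -> a' -> d
  lhs = b (b b) b
  rhs = c (b b (b b b)) b
  -- lhs (+1) (*2) (+3) 10 == 27 == rhs (+1) (*2) (+3) 10

  -- and BCC acts as the identity on (binary) functions:
  -- b c c subtract 3 10 == subtract 3 10   -- both 7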

These two axioms are not particularly special, but from a
handful of axioms like them, a sort of completeness follows;
it would be possible to directly show that

  CBCons.B.Map

is replaceable by

  S (Bo2 Cons) Map

without the use of variables like "F", "X", and "Y".

Anyway, hopefully this example has helped you gain a feel for
the elegance a purely applicative system can have.

Again, if you have a favorite proof from the Joy manual,
feel free to tell me what it is...

> I've recited already the five syntactic elements that every applicative
> language needs in order to produce code; I've also recited the three
> elements that a concatenative language needs.  I've also made the claim that
> your language needs one additional element (combinators) which is distinct
> from any of the other five needed by a truly pure applicative language.
> 
> Can you make any counterclaims?

This was mentioned above... I don't see that those five
"syntactic elements" are essential in every applicative language
or how they are evident in the one I've presented...
And, combinators are not particularly special elements in
the language I've given, but ordinary primitives.

Okay... Earlier I said that I'd give some reasons to like
purely applicative systems at the end of the post, so here are some
(these are reasons to prefer purely applicative systems over
similar systems that are mainly applicative, but with some extensions,
such as infix operators):

  1) The syntax is simpler. In a computing system, this may be
     beneficial because it will make parsers, compilers, 
     and other programs that deal with expressions (such as
     a program that transmits information between machines,
     over networks) easier to construct. 
       (this is quite a minor one, compared to the next).

  2) It ensures that the system can be made combinatorially complete
     with a finite set of combinators. If a system has an
     infix operator that cannot be used as a first-class function,
     then it is not generally possible to emulate functional
     abstraction with combinators (rather, one would likely
     need to resort to a lambda construct).

If the second reason does not make sense, consider a system that
is mostly applicative, with combinators "B", "C", "W", and "K",
but extended with a binary infix "+" operator. In this system it
would not be possible to form, from "+", the function that adds
a number to itself. One cannot throw combinators at it, because
"+" cannot be isolated as a first-class function.

One could, however, form such a doubling function in the system if it
was further extended with a lambda construct. A doubling
function would be expressed by "x\ x+x" (where "x\ Z" means
"the function that given 'x', yields 'Z'"). Yet, there are
disadvantages, I think you'll agree, of basing a system on
a lambda construct. The whole problem could be avoided if
one formalized "+" as a first-class function that takes
two parameters through currying, instead of as an infix operator.
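
In Haskell terms, where "+" is already a first-class curried
function, the doubling function indeed needs no lambda:

  w :: (a -> a -> b) -> a -> b     -- the W combinator
  w f x = f x x

  double :: Int -> Int
  double = w (+)                   -- the function "x\ x+x", lambda-free
  -- double 7 == 14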

> > - "iepos" (Brent Kerby)
> 
> -Billy

So, what do you think of purely applicative systems now?

- "iepos" (Brent Kerby)