Joy, Lambdas, Combinators, Procedures

iepos@tunes.org
Thu, 24 Feb 2000 14:15:02 -0800 (PST)


> The system already can't be implemented as a switch-case;
> its implementation is like Forth's:
> 
> while next_word(source):
>  if ( dictionary lookup ) then ( execute )
>  else if ( interpret as number ) then ( push on stack )
>  else error;

Ahh... that's right; I forgot that there was special syntax
for numbers. By the way, I'm guessing that a Joy-styled system
could do away with numbers as primitives, using instead something
like the Church numerals... Combinators analogous to Church numerals
in a Joy system I'll call "dozero", "doonce", "dotwice", etc.,
and in general "iterators". They would have these properties:

  [x] dozero   == []
  [x] doonce   == [x]
  [x] dotwice  == [x x]
  [x] dothrice == [x x x]

Essentially, each iterator takes a quoted program and yields another
quoted program that executes the original so many times. Each of
these could be defined in terms of more ordinary Joy primitives as:

  dozero   == pop []
  doonce   == id
  dotwice  == dup concat
  dothrice == dup dup concat concat
  dofour   == dup dup dup concat concat concat
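
To make these properties concrete, here is a tiny Python sketch
(modeling a quoted program as a list of tokens, and the name
"make_iterator", are my own choices, purely for illustration):

  # a quoted program modeled as a Python list of tokens;
  # an iterator is then a function that repeats the list
  def make_iterator(k):
      return lambda q: q * k

  dozero, doonce, dotwice = make_iterator(0), make_iterator(1), make_iterator(2)

  print(dozero(["x"]))    # []
  print(dotwice(["x"]))   # ['x', 'x']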

One interesting thing about these iterators is how easy it is
to do arithmetic with them. Multiplication, for instance, is
very simple; "dothrice dofour" is the same as "dotwelve", because

  [x] dothrice dofour  ==  [x x x] dofour == [x x x x x x x x x x x x]

You can multiply iterators simply by concatenating them...
And then, there is addition, which is a little bit trickier.
But, "dup [dothrice] dip dofour concat" is the same as "doseven", because

    [x] dup [dothrice] dip dofour concat
 == [x] [x] [dothrice] dip dofour concat
 == [x] dothrice [x] dofour concat
 == [x x x] [x] dofour concat
 == [x x x] [x x x x] concat
 == [x x x x x x x]

In general, it looks like "dup [m] dip n concat" is the sum of "m" and "n".
From this, it looks like a general addition program would be

  add == [[dup] dip] dip [dip] dip i concat

When this is run with quoted iterators "[m]" and "[n]" on the stack, we get
  
 [m] [n] [[dup] dip] dip [dip] dip i concat
 [m] [dup] dip [n] [dip] dip i concat
 dup [m] [n] [dip] dip i concat
 dup [m] dip [n] i concat
 dup [m] dip n concat

In turn, when this new iterator (the sum of "m" and "n") is run
with a program "[x]" on the stack, we get:

 [x] dup [m] dip n concat
 [x] [x] [m] dip n concat
 [x] m [x] n concat

These iterators seem quite interesting. Exponentiation among iterators
is simple... "dotwice dotwice dotwice dotwice" is clearly
the product of four copies of dotwice (since concatenation acts
as multiplication with iterators); thus, in general, "[m] n i"
is the iterator "m raised to the nth". Thus we can construct
an "exponent" program (taking two quoted iterators off the stack) as:

  exponent == i i
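
All three arithmetic facts are easy to check in the little Python
model from before (again my own modeling; "multiply", "add", and
"power" are names I made up):

  make_iterator = lambda k: lambda q: q * k

  def multiply(m, n):            # concatenating iterators multiplies them
      return lambda q: n(m(q))

  def add(m, n):                 # "dup [m] dip n concat": run both, concatenate
      return lambda q: m(q) + n(q)

  def power(m, n):               # "[m] n i": n copies of m, run in sequence
      def result(q):
          for f in n([m]):
              q = f(q)
          return q
      return result

  three, four = make_iterator(3), make_iterator(4)
  print(len(multiply(three, four)(["x"])))   # 12
  print(len(add(three, four)(["x"])))        # 7
  print(len(power(three, four)(["x"])))      # 81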

Then, one might wonder how one could achieve subtraction or division
with these iterators... My guess is that it is possible, but difficult,
as it is with the analogous Church numerals. In Joy, if the ordinary
numbers are still available, one can test the size of an iterator
by running it on a simple list with exactly one element in it
and then testing the size of the resulting list, which will tell how big
the original iterator was.
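
In the same Python modeling, that size test is a one-liner, and
subtraction can then be had by cheating through the sizes (again,
just my sketch):

  make_iterator = lambda k: lambda q: q * k

  def size(n):                   # run on a one-element list, measure the result
      return len(n(["x"]))

  def subtract(m, n):            # goes outside the model, via ordinary numbers
      return make_iterator(max(size(m) - size(n), 0))

  print(size(subtract(make_iterator(7), make_iterator(3))))   # 4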

Oh yeah... one other thing that one might do with iterators is test
whether they are zero. It might be possible just to use the "="
program to check, but this probably would not really work,
because when applied to quoted programs, "=" only checks to
see if they have the exact same structure (as lists), not whether they
are equivalent programs in a more abstract sense (for, it is possible
to write the zero iterator as several different quoted programs).
But, there does happen to be a way to test iterators for
zero-ness... consider:

  [x] [pop [y]] dofour i

When this is run, it gives

  [x] pop [y] pop [y] pop [y] pop [y]

and eventually just

  [y]

All iterators will end up giving simply "[y]" as a result, except
for zero, which will give "[x]":

     [x] [pop [y]] dozero i
 ==  [x] [] i
 ==  [x]
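
Here is the zero test simulated in Python, with the quoted program
"[pop [y]]" modeled as a list of two stack-mutating functions (the
modeling is mine):

  def pop(s): s.pop()
  def push_y(s): s.append("[y]")

  def zero_test(n):              # runs "[x] [pop [y]] n i"
      s = ["[x]"]
      for word in n([pop, push_y]):
          word(s)
      return s[-1]               # "[x]" exactly when n is dozero

  make_iterator = lambda k: lambda q: q * k
  print(zero_test(make_iterator(0)))   # [x]
  print(zero_test(make_iterator(4)))   # [y]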

Anyway, I'm not sure what the use of these iterators is... They
show what can be done with just the "pure" programs like iterators...
These iterators can be used to do arithmetic; but, there are
probably better ways, with cleaner solutions for subtraction
and division; for instance, one might use a list of bits
(perhaps using "[pop]" and "[[pop] dip]" for truth and falsehood),
in the ordinary base-2 way; one could then construct addition,
subtraction, multiplication, division, and comparison programs
in reasonably efficient ways.

> > In other words, whereas now Joy has specific support for just a few
> > stack-operators like "swap" and "dup", if this new "--" notation were
> > adopted in general, then it would not be enough for Joy to support
> > just a few of the expressions of the form "...--..."; it would have
> > to support them all.
> 
> Actually, it would have to support a finite-length subset of them all.  I'm
> not a stickler for impossible things.  The very fact that I'm only
> specifying lower-case alphabetic letters on the left hand side limits the
> complexity to 26 inputs, and I'm not interested in more (although I don't
> see a reason to limit it to fewer, even though I strongly hope nobody ever
> uses that many).  

Ahh.. You're right. It is not necessary for Joy to support
_all_ expressions of the form "...--...". Let me modify my
statement to this:

  "it would have to support so many of them that a switch-case
  implementation would no longer be practical".

But... as you pointed out, it seems that Joy already isn't implemented
using a switch on the next word... Anyway, I really don't have anything
against adding a "...--..." construct; all I've meant to say is that
adding it would be a fairly significant change to the system (it probably
would not extend the class of programs that could be written with
the system, but it would change the way that the system would need
to be implemented).

> > I'm still quite interested in knowing of an "abstraction algorithm"
> > that could be used to rewrite "...--..." expressions using
> > just "dip", "swap", and such. You've hinted that "dip" will
> > be important, although surely other operators will also be needed.
> 
> I don't have the algorithm written out, although I'm sure I can find it if I
> need to; I've seen it before.  I was actually planning on working it out by
> hand, since it's not immensely complicated as such algorithms go :-).  

I would guess that it probably will not turn out to be very complicated.
But, sometimes simple solutions can be very elusive (at least,
they can be when I'm trying to find them :-)). 

I bet it would be easier to find a general algorithm if we first
try to work out some examples...

How about "xyz--zyx"? Well... If I had x, y, and z on the stack,
I would probably proceed like this... First, looking at the right
side of the "--", I would realize that I need "z" on the top of
the stack (for a time). It is already on the top, so I would move
on. Then, I need "y" on the top of the stack. "y" is not on the
top of the stack, but could be moved there with "swap". So,
I would begin the program with

  swap

Then, I need to get "x" on the top of the stack. I've kept track
that it is still buried two items deep, so it could be moved
to the top with "dig2", which is a combinator I made up. I think
we may need a whole series of "dig" combinators, so let's see
if the series could be constructed from more ordinary combinators.
The "dig" combinators would have these properties:

  a b c d dig0 == a b c d
  a b c d dig1 == a b d c
  a b c d dig2 == a c d b
  a b c d dig3 == b c d a

Each "dig" combinator digs up a parameter that is buried so many
items deep in the stack. It occurs to me that we could get a stack
item to the top by using repeated swapping. Thus we can define the
dig combinators as:

  dig0 == id
  dig1 == swap
  dig2 == [swap] dip swap
  dig3 == [[swap] dip swap] dip swap

Each dig combinator can be constructed from the previous by
enclosing it in []s and then appending "dip swap".
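
These definitions can be checked with a small Python simulation
(stacks as lists with the top at the end; writing "dip" and "swap"
as stack-mutating functions is my own modeling choice):

  def swap(s): s[-1], s[-2] = s[-2], s[-1]

  def dip(s, quoted):            # set the top aside, run the program, put it back
      saved = s.pop()
      quoted(s)
      s.append(saved)

  def dig2(s):                   # dig2 == [swap] dip swap
      dip(s, swap); swap(s)

  def dig3(s):                   # dig3 == [[swap] dip swap] dip swap
      dip(s, dig2); swap(s)

  s = list("abcd"); dig2(s); print(s)   # ['a', 'c', 'd', 'b']
  s = list("abcd"); dig3(s); print(s)   # ['b', 'c', 'd', 'a']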

Using these dig combinators, I think I can see a general algorithm
for working out "...--..." for combinators that do not duplicate
or destroy their parameters (the situation may become trickier
for combinators that perform dupping or popping)... here is
the "algorithm":

  # the "algorithm", written out in Python
  # (stacks as lists, with the top of the stack at the end)
  def abstraction_algorithm(leftside, rightside):
      # leftside:  the list of letters on the left side
      # rightside: the list of letters on the right side
      mystack = list(leftside)        # push each left-side letter
      result = []
      for x in rightside:
          # where is the letter x on mystack now?  it is buried
          # n deep on the stack, so append "dig-n" to the result
          n = len(mystack) - 1 - mystack.index(x)
          result.append("dig%d" % n)
          # adjust mystack so that it reflects this change in stack
          mystack.remove(x)
          mystack.append(x)
      return " ".join(result)

Applying this algorithm to a few example combinators gives:

  xyz--zyx  ==  dig0 dig1 dig2
  xyz--yxz  ==  dig1 dig2 dig2
  xyz--xyz  ==  dig2 dig2 dig2
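
For what it's worth, the Python version above reproduces these results:

  print(abstraction_algorithm("xyz", "zyx"))   # dig0 dig1 dig2
  print(abstraction_algorithm("xyz", "xyz"))   # dig2 dig2 dig2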

I think this algorithm always gives correct results, although maybe
not good ones. As you can see, the result it gives in "xyz--xyz"
is not very nice; we would hope that it would have given "id"
(although actually "id" might not be considered a correct result
in this case, since "xyz--xyz" might fail when run on a stack without
enough parameters, while "id" would not). The results could
always be written in terms of just "swap", "dip", and "id"
using concatenation and quotation.

Ahh... a simple, general way to do abstraction now occurs to me...
It is similar to the last one, except that instead of moving the
parameters with "dig", you copy them with "select", and then at
the end you "pop" all the original parameters off. This is the way i guess
that most C compilers shuffle parameters around. It reminds of
the traditional "SKI" algorithm for combinators, with all its
unnecessary dupping and popping.

The question remains of how the "select" combinators can be constructed
from ordinary primitives like "dip", "swap", and "dup" (a similar,
but not identical, question was considered in my last post,
so this seems kind of strange). These "select" combinators would
have these properties (for example):

  a b c d select0 == a b c d d
  a b c d select1 == a b c d c
  a b c d select2 == a b c d b
  a b c d select3 == a b c d a

These first ones could be constructed in this way:

  select0 == dup
  select1 == [dup] dip dig1
  select2 == [[dup] dip] dip dig2
  select3 == [[[dup] dip] dip] dip dig3

It looks like we can always get a select by first making a copy
of the desired parameter (using a dipped dup) and then digging
the copy up. But, there is a more direct way to construct the selects
without using dig:

  select0 == dup
  select1 == [dup] dip swap
  select2 == [[dup] dip swap] dip swap
  select3 == [[[dup] dip swap] dip swap] dip swap
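
The same kind of Python simulation as before checks these (swap,
dip, and dup written as stack-mutating functions):

  def swap(s): s[-1], s[-2] = s[-2], s[-1]
  def dup(s): s.append(s[-1])

  def dip(s, quoted):
      saved = s.pop()
      quoted(s)
      s.append(saved)

  def select1(s):                # select1 == [dup] dip swap
      dip(s, dup); swap(s)

  def select2(s):                # select2 == [[dup] dip swap] dip swap
      dip(s, select1); swap(s)

  s = list("abcd"); select2(s); print(s)   # ['a', 'b', 'c', 'd', 'b']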

Since all these select combinators can be constructed from
"dup", "dip", and "swap", this shows that all of the Joy combinators
(meaning those "...--..." programs) can be constructed from
just "dip", "swap", "dup", and "pop".

I'm still curious to know if there is a single program that plays
the role of "dip", "swap" and "dup" (akin to the "S" of combinatory
logic) that, with "pop", can form all combinators.

Anyway, it is not likely that an efficient implementation would
try to use just "dip", "swap", "dup", and "pop" as primitives;
As far as I know, Intel processors do not even have special
instructions for "swap", "dup", and "dip"; basing a whole
implementation on these primitives would probably be a bad idea.
"select" and "pop" would probably be better choices.

> >   - "f\f f": a function that takes a parameter "f" and 
> >     results in "f f",
> >     the application of "f" to itself. (This is sometimes called
> >     the mockingbird combinator).
> >   - "x\y\x": a function that takes two parameters "x" and "y" (through
> >     currying) and results in "x". In other words, this combinator
> >     takes a parameter of "x", and results in a function that
> >     always returns "x". This is sometimes called the K combinator.
> >   - "x\y\z\ xz(yyy)(zx)". This is a weird combinator with no common
> >     name and seemingly little significance.
> 
> Cool.  So you also have a compact notation for your combinators (which you
> could add to a variant of your language in the same sense that I've
> considered adding the stack:() notation to mine).

Yes, this way of writing combinators is known as "lambda".
In general, "x\Z" means "the function that given 'x', yields 'Z'";
in these definitions of combinators, I am using nested lambdas...
For instance, "x\y\x" is the same as "x\ (y\x)", the function
that given 'x', yields the function that given 'y', yields 'x'.

By the way, although my CLIP system is an experiment to see
if systems without lambdas are practical, I really do not have
anything against using lambdas in a system, at least internally...
(wouldn't that be odd? lambdas used internally to implement
a combinatory system...)

But, it does look like writing functions in lambda form has
some advantages... in particular, when two combinators are
written in lambda form, it is very easy to tell if they are
equal, because in lambda form, there is only one way to write
each combinator. (In contrast, when one writes combinators in terms
of primitive combinators, using application, there are usually
several distinct ways that they can be written). For example,
consider the combinators "KI" and "CK"... These are actually
the same combinator, although this is not obvious from their
appearance (this is somewhat analogous to the "[pop] dip" and "swap pop"
of Joy, which are the same). However when we write the combinators
in lambda form we get

  KI == (x\y\x) (z\z)        == y\z\z
  CK == (x\y\z\ xzy) (a\b\a) == y\z\ (a\b\a)zy == y\z\ (b\z)y == y\z\z

Both reduce to "y\z\z" when they are written in lambda form
(actually, we got somewhat lucky that the same names for the
variables ended up on both of them; sometimes it may be necessary
to rename variables to make them match).
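
One can at least spot-check this with Python lambdas; note that this
only tests behavior on sample arguments, not the normal forms
themselves:

  I = lambda x: x
  K = lambda x: lambda y: x
  C = lambda f: lambda y: lambda x: f(x)(y)

  KI, CK = K(I), C(K)
  # both behave as y\z\z, discarding their first argument
  print(KI("y")("z"), CK("y")("z"))   # z z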

But, of course, basing a system on the lambda construct means that
the system cannot be purely applicative. This might be somewhat of a
disadvantage.

It still seems to be a possibility to me that there is some simple
algorithm for comparing combinators for equality, without resorting
to the use of variables, but I do not know of such an algorithm.
Perhaps even there is a certain base of combinators in which
there is only one beta-normal way to write each combinator.
(any base involving "C", "K", and "I" will not qualify for this,
as evidenced by the example of "KI" and "CK", which are in
beta-normal form, but which denote the same combinator;
a similar example could be used to rule out a system that
has both "B" and "C", and several other pairs of common combinators).
I don't know... this is actually a question that I suspect is
well-studied, so I wouldn't be surprised if the answer is out
there somewhere.

> > > so all functions can therefore be regarded as combinators in any
> > > system (including yours).
> 
> > Not so. But, this is true in a certain sense... there are interesting
> > ways in which pure functions can be used to emulate numbers,
> > propositions, and lists. It is conceivable that pure 
> > functions could be used to
> > represent even other things like people, pictures, sounds, 
> > and programs,
> > but this would be a bit weird, and I don't plan to do it in my
> > system.
> 
> Given the theory behind your system, I don't see how you have any choice.

"Output" is a function in my system which is not a combinator.

> You *are* using a finite number of things to represent all possible
> concepts, surely an infinite set.  You _have_ to permit your user to use the
> finite number of elements to represent whatever they want.

But these elements do not all need to be combinators...

> It's like Forth's distinction between words and numbers; numbers are handled
> by the compiler, and they're a completely seperate case from words.

This is a distinction that is practically convenient, I would guess,
but not absolutely necessary. It would be possible to form all
numbers from, say, just the primitives "0", "1", "+", "*", "-", "/", etc.
(actually, it may not be possible to form all numbers (speaking
of real numbers, which are non-enumerable I think, and thus cannot
all be represented in any ordinary kind of language), but it is
possible to form all numbers that could be formed otherwise
using special base-10 syntax, as in "1432" or "23.532").
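
For instance, sticking to just "0", "1", "+", and "*", "1432" could
be spelled out like this (ordinary Python arithmetic, purely as an
illustration):

  ten = 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1 + 1
  # ((1*10 + 4)*10 + 3)*10 + 2, the usual base-10 reading
  n = ((1 * ten + (1+1+1+1)) * ten + (1+1+1)) * ten + (1+1)
  print(n)   # 1432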

> > There's one thing somewhat relevant to this that I may not have made
> > clear yet... whereas many programming languages only provide
> > ways of representing programs, languages of the type I've described
> > will cleanly be able to represent other concepts, such as numbers,
> > propositions, lists, functions, as well as programs. I think
> > some enlightenment may result from comparing my language(s) to
> > human languages like English, rather than to programming languages.
> 
> You're 'accusing' other languages of faults which are your own constructs.
> Your language is not the first 'declarative' programming language ever
> developed; in fact, it's not even an especially good one, since its syntax
> is so focussed on your style of maps.

Okay, I can see that maybe this was not such a relevant comment
to insert after all... Anyway, here I have not really accused any
languages of any faults (I don't consider it absolutely a fault
not to be able to represent anything other than programs).
Other than that, I can buy your reply... Indeed my language
is not the first "declarative programming language" or an
especially good one.

> You've also invented a concept of "true representation", in which your
> language is gifted with the ability to truly represent things, while all
> other languages falsely represent them.  

Hmm... Where have I invented the concept of "true representation"?
I really have not said this.

> Since you've never defined your
> concept of representation, it's impossible to guess what this means to
> anyone.

Indeed, I probably should not have brought up the matter
of the "meaning" of expressions, since it doesn't really matter.
It would be possible to formally assign each expression in
a language a specific corresponding meaning, but I don't
really plan to do that in my system. Who cares what the meaning
is, as long as it works...

> > Neither do I have my mind set on any particular
> > technique for interpreting text that user's type in. Anyway, it
> > doesn't really take very long to parse text, so this doesn't
> > really seem to matter too much to me 
> 
> It's certainly much, much more than the 1*O(n) operation my language
> requires.

Hmm... I believe the order of my parser is also "O(n)", if "n" is the
number of characters in the program. I think this
could be proven by examining the fragments of C code in my last post.

> Justification: each word in the input stream has to be lexed, then each word
> has to be reduced to its internal representation (which is likely a matter
> of a dictionary lookup and replacement with the appropriate number of "S"es,
> "K"s, and other fundamentals).  Then you have to load the words into their
> tree structure, reading left to right and performing pinhole optimization.
> You use a stack to keep track of unresolved applications, since the most
> recent application "@" you reached is always the appropriate place to attach
> the next parameter (even if the parameter is itself an application).

Not sure what "pinhole optimization" is... Incidentally, I do use
a stack in my normalizing procedure, norm_head(), but I didn't
consider this part of the parser; there is actually no limit to
the time that it will take. My current normalizer is very slow;
I realized recently that there is no sharing of reductions
for duplicated parameters (this is lame); it will have to be fixed...
but there will still be no limit to its slowness. Normalizing
combinatory expressions is, I think, a task that is fundamentally
indefinitely slow (there exist fairly small expressions that
take massive amounts of reduction before they can be normalized);
it might not even terminate at all.

> Finally, you can optimize (a task which I'll assume takes zero time), and
> then evaluate the result, using at least depth first traversal (assuming
> that you only want to execute it as a program).
> 
> The biggest time-sink there is the depth-first traversal; it's n*log(n).
> That's not as bad as I'd feared.  Of course, adding the possibility of
> source modification we get a new worst case of n^2 log(n).  

I'm not sure where you are getting these numbers from...
I would be happy if in principle my normalizer terminated in about
"O(2^n)" time if "n" is the size of the expression fed to it; but this is not
so... I think there are series of expressions (with linear increase
in size) whose slowness in normalization grows even worse than
exponentially. Actually, such a series is

  WBII                  (4 steps)
  WB(WB)II              (12 steps)
  WB(WB)(WB)II          (60 steps)
  WB(WB)(WB)(WB)II      (258294 steps)
  WB(WB)(WB)(WB)(WB)II
  ...

In each case, the expression normalizes simply to "I". The slowness
can be expressed this way: if one expression takes "n" steps to
normalize, then the next takes approximately "2 ^ n" steps.
I am not sure how many steps that last expression would take to
reduce (my little normalizer segfaulted, but probably would
never have gotten finished in my lifetime anyway, even with
unlimited memory); there may be a simple formula stating
how many steps each takes to reduce, but I do not know what it is...
Anyhow, I know that the next expression in the series (not listed
above) would take more than "2 ^ 65536" steps.
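
For the curious, here is a little normal-order reducer in Python that
counts contraction steps for this series; terms are nested pairs, and
W, B, and I get their usual rules (Wfx = fxx, Bfgx = f(gx), Ix = x).
On this counting, the first two expressions take 4 and 12 steps,
matching the numbers above:

  # terms are "W", "B", "I", or a pair (function, argument)
  def contract(t):
      # contract the leftmost-outermost redex at the head, if any
      head, args = t, []
      while isinstance(head, tuple):
          args.insert(0, head[1])
          head = head[0]
      if head == "I" and len(args) >= 1:
          new, rest = args[0], args[1:]
      elif head == "W" and len(args) >= 2:
          new, rest = ((args[0], args[1]), args[1]), args[2:]
      elif head == "B" and len(args) >= 3:
          new, rest = (args[0], (args[1], args[2])), args[3:]
      else:
          return None
      for a in rest:
          new = (new, a)
      return new

  def normalize(t):
      # normal-order normalization, counting contraction steps
      steps = 0
      while True:
          r = contract(t)
          if r is None:
              break
          t, steps = r, steps + 1
      if isinstance(t, tuple):           # head is stuck: recurse into the parts
          f, s1 = normalize(t[0])
          x, s2 = normalize(t[1])
          return (f, x), steps + s1 + s2
      return t, steps

  for k in range(3):                     # WBII, WB(WB)II, WB(WB)(WB)II
      t = "W"
      for a in ["B"] + [("W", "B")] * k + ["I", "I"]:
          t = (t, a)
      print(normalize(t))                # 4 and 12 steps for the first two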

But, this is really only an indication of the power of a combinatory
normalizer. A combinatory normalizer can be used to solve any
computational problem, I think...

> The problem with your metaprogramming suggestion is twofold:
> 
>  - it requires knowledge of the source file's name and total overall
> structure, as well as the willingness to muck about with the original source
> code.

It requires knowledge of the source file's name, yes. Normally when you
want to refer to things, you have to know a name for them.

But, it is possible, of course, for a program to refer to its
own source code without using its name. It could be done
using Quining (an example of this will be given shortly).
Now, if the program wanted to refer just to
itself, rather than to its source code, then the problem
is very simple. Just use the "Y" combinator... this is
very useful anyway when constructing indefinitely repeated
procedures; for instance, as mentioned already, the program
that outputs "hello" and then goes back and "does itself" again
is "Y (Output 'hello')". It would of course be possible
for a program to do other things with itself than just
execute itself. For instance, the program,

  Y (Output 'hello' . C Output stop)

is a program that Outputs "hello" and then Outputs the program
itself (I'm supposing here that this "Output" takes an arbitrary
parameter, not necessarily a string, and outputs a representation
of it). By a few reductions (including "Y"'s reduction rule,
which says that "Yf = f(Yf)") this program is the same as

  Output 'hello' (Output (Y (Output 'hello' . C Output stop)) stop)

The interesting thing is that the Y combinator can actually be
constructed from other combinators (for instance, as "BM(CBM)",
or in other words, "B(WI)(CB(WI))"). An example application
of such a constructed Y:

  BM(CBM)f            
  M(CBMf)             [B]
  M(BfM)              [C]
  BfM(BfM)            [M]
  f(M(BfM))           [B]
  f(BfM(BfM))         [M]
  f(f(M(BfM)))        [B]
  f(f(BfM(BfM)))      [M]
  f(f(f(M(BfM))))     [B]
  f(f(f(BfM(BfM))))   [M]
  f(f(f(f(M(BfM)))))  [B]
  ...

The reduction goes on forever, of course. It reminds me
of a puffer from Life. Recursion is a weird thing.
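
As an aside, the same self-application trick can be written in an
ordinary strict language; here is the eta-expanded variant of Y
(often called Z) in Python, with a garden-variety recursive
definition as the example (my illustration, not part of the system
being described):

  # Y adapted to a strict language (the "Z" combinator)
  Y = lambda f: (lambda x: f(lambda v: x(x)(v)))(
                 lambda x: f(lambda v: x(x)(v)))

  fact = Y(lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1))
  print(fact(5))   # 120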

Ah... back to quining... one might construct a pseudo-quine in
my system as

  Y Output

But, this would not really be guaranteed to output its exact
source code; it may output some other representation of the program.
Also, it is sort of cheating to use such a complex primitive
as Output (which sort of has built-in stringification). But,
it is possible to construct a real quine using just "put"
(a function that takes a string and yields a procedure that
outputs that string)... here is a real quine:

  (put + K(put qmark) + put + K(put qmark)) "(put + K(put qmark) + put + K(put qmark)) "

This relies on sugar for quotation, but there are other (messier)
ways of doing it if that is not available.
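
The same trick carries over to everyday languages; for comparison,
here is the classic Python quine, where "%r" plays the role of the
quotation sugar:

  s = 's = %r\nprint(s %% s)'
  print(s % s)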

> Note the difference from how Forth does it:
> 
>  - Forth only allows metaprogramming in the part which hasn't been parsed
> yet.  This means that re-reading is never needed.
>  - Forth allows modification relative to the current location in the source
> file.
>  - Forth's modifications have zero effect on the source itself; they don't
> even affect the internal representation of the source.  Technically, all
> they do is discard source and/or provide alternate source.

Hmm... I can see that the way Forth does it is not the way I planned
on doing it. It seems that in Forth it is possible for a program
to open up a file (or some other field) containing its own source code
and change it so that when the parser gets to it, it will see something
different than it would have if it had not been changed.

This is really weird. I have no idea why I would want this ability.
On the other hand, in a finished system, I would hope that a
program could change its own internal representation in memory,
in a way that could make it go faster. This probably would not
be a thing that one would ordinarily do explicitly in my system
though. Such things would ordinarily be handled by the core of the system,
which handles memory management, compilation, and such, and which
moves things around and changes them whenever it wants.

> -Billy

- iepos ("Brent Kerby")