Joy, Lambdas, Combinators, Procedures
Tue, 8 Feb 2000 13:37:11 -0800 (PST)

> > Still, it is not my view that non-purely-applicative systems
> > are innately inferior... there may be other well-designed
> > approaches that are also interesting (such as Joy's approach).
> > My point is only that purely applicative languages are
> > worth considering seriously;

> The interesting thing to me about your language isn't really its
> applicativeness; other languages are at least as applicative.  The
> interesting thing is the strong presence of combinators.  Perhaps your
> language would be better called a combinative language -- most people, when
> discussing languages, assume applicative and lambda-based.  Your language is
> applicative and combinator-based; thus you say in one word 'combinator' the
> essence of your difference from most modern language designs.  

Indeed, saying that my system is "combinator-based" would probably
be the most concise way to distinguish it from traditional systems.
I've used "purely applicative" here to distinguish my system
from a Joy style ("concatenative") system, which is also based on
combinators in a way.

> > First of all, expressions that denote combinators (such as
> > "B", "C", "W", and "K") are primitives in the language.
> > Recall in my definition of "purely applicative" that
> > primitives (meaning atomic expressions that are assigned 
> > precise meanings) are allowed, as long as there are only a 
> > finite number of them. In fact, primitives are essential
> > in a purely applicative system (Joy also relies heavily
> > on primitives (like "dup", "swap", "dip", "map", etc.), 
> > rather than fancy
> > constructs; this is in contrast to languages like C and even
> > Haskell, which rely on myriads of fancy constructs).
> WHOAH!  Now I see what you meant.  You really threw me off by using the word
> "finite" there.  (I don't know of any languages which use an infinite number
> of primitives, by ANY definition of 'primitive'.)

Yeah... I'm glad you see what I mean now... one thought that might
be relevant: a system might like to formalize as primitive expressions
the number literals 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... 
strictly, this is not allowed in a purely applicative system since
it would require an infinite number of primitives. Instead, one
could take as primitives only the numbers 0 through 9 (and maybe
"ten"), and then refer to the other numbers using multiplication and
addition. For instance, the number 532 could be referred to as 

  (5 * ten * ten) + (3 * ten) + 2

Or without syntactic sugar for "+" and "*":

  + (* 5 (* ten ten)) (+ (* 3 ten) 2)
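
To make the encoding concrete, here is a small sketch of my own (in
Python, with curried functions standing in for the "+" and "*"
primitives) of how 532 is built by application alone from the finite
stock of primitives:

```python
# Sketch (mine, not from the post): the numeral 532 built purely by
# application from the primitives 0-9, "ten", "+", and "*".
# "+" and "*" are modeled as curried functions, as the notation suggests.

ten = 10

def add(x):          # curried "+"
    return lambda y: x + y

def mul(x):          # curried "*"
    return lambda y: x * y

# + (* 5 (* ten ten)) (+ (* 3 ten) 2)
n532 = add(mul(5)(mul(ten)(ten)))(add(mul(3)(ten))(2))
print(n532)  # 532
```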

> I suspect you'd dislike one of my potential changes to Joy: I wanted to
> remove all of the stack manipulation words and replace them with a single
> syntax convention:
>  dup == stack(a--aa);
>  drop == stack(a--);
>  swap == stack(ab--ba);
>  rot == stack(abc--bca);
>  4reverse == stack(abcd--dcba)
> IN
>  your code
> END.

Well, your "stack(--)" construct would then be much like a lambda construct
of applicative systems (note that it requires the use of variables
like "a", "b", and "c"). This would probably cause complications in proofs.

However, such a construct would still be quite interesting...
It could be generalized so that arbitrary Joy expressions were
permitted on the right of the "--"; then, for instance,
"stack(ab -- b a foo a)" would be the program that takes two
stack items, swaps them, and then runs the program "foo",
but leaving an occurrence of "a" on the stack;
it could also be written as "dupd swap foo", I think.
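
For what it's worth, the proposed notation is easy to model. In the
hypothetical Python sketch below, the `shuffle` helper and its spec
strings are my own modeling of the "stack(--)" idea (restricted to the
variables-only case), with the top of the stack at the right:

```python
# Hypothetical model of the proposed stack(--) notation: a spec string
# names the items popped from the stack and the arrangement pushed back.
# The top of the stack is the rightmost element of the list.

def shuffle(spec):
    """Build a stack transformer from a spec like 'ab--ba'."""
    inputs, outputs = spec.split('--')
    def run(stack):
        n = len(inputs)
        bound = dict(zip(inputs, stack[-n:]))   # bind a, b, ... to top items
        return stack[:-n] + [bound[name] for name in outputs]
    return run

dup  = shuffle('a--aa')
swap = shuffle('ab--ba')
rot  = shuffle('abc--bca')

print(rot([1, 2, 3]))  # [2, 3, 1]
```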

One interesting question to ask about Joy would be,
"Could all uses of 'stack(--)' be eliminated using standard
Joy programs like dup, pop, swap, and dip?" ...
I'm guessing the answer is "yes"; I'd be curious to
know of a specific algorithm that could be used to carry
out such eliminations.

> The number of 'primitives' is still finite, but is now not bounded by the
> ones I choose (plus, I don't have to think of names).  

I'm not sure what you mean by this. There are an infinite number
of expressions of the form "stack(...--...)", so you can't
take them all as primitives (in the ordinary sense of "primitive").
Of course, you could just take "stack(--)" as a new special
construct in addition to Joy's concatenation and quotation.

> Anyhow, you have an interesting approach to 'primitives'; the way I see it,
> each different type of primitive is a seperate category; I don't group them
> all simply by calling them 'primitives'.  

It is true; most systems would need quite a few primitives, which
one might want to group into categories when specifying the system.

> Joy's primitives are all functions
> (or all combinators, depending on how you view it; they're really the same
> thing), and everything you define is also a function (and a combinator).

Usually a distinction is made between a "function" and
a "combinator": a function is a mapping from parameters to results,
while a combinator is a special kind of function, a function
that can be expressed using only closed lambda expressions
(or, in Joy-style systems, a program that can be expressed
using a "stack(--)" expression with nothing occurring on the
right side except variables bound on the left; for instance,
"dup", "swap", "pop", "rot"). The precise meaning of "combinator"
may vary a bit among uses, but calling all functions "combinators"
is getting a bit nonstandard.

> Essentially, there's no distinction between primitive words and user words;

Also in my system, I don't plan to make a sharp distinction between
words that are "native" to the system (if there are such words)
and words whose meanings have been defined by the user. In general,
I've meant to call all words "primitives" (by "words", meaning
expressions that have no grammatical structure in the system),
although I guess maybe I've deviated a bit from this terminology...

> the only distinction which exists in the system is the distinction between
> syntax and words.  Joy has a lot of syntax; I personally suspect that I will
> keep the quotation syntax, and I'll add the concept of immediate execution
> and postponement, so that the coder can write words which act to modify the
> compilation process, and then I'll throw in user-definable lexing (as shown
> by the stack shufflers above).

I'm not sure what you mean by "immediate execution" and "postponement"...

> > Third, definitions... Actually, I don't recall using any sort
> > of definition construct in my language, but it is something
> > that will be needed eventually... In any case, I don't
> > think a special construct is needed for definitions.
> It actually is special -- [...]
> But I'm getting ahead of myself.
> In this case, the special action of the Define primitive is modification of
> the system dictionary, a dictionary which is specially distinguished because
> all word lookups start with it, even though it has no name of its own.

This is true. The system dictionaries would be a bit special
in that the system will need to refer to them whenever
it needs to interpret something the user has written. However,
the fact that there may be more than one system dictionary
(for different users) tends to make them seem less special.
But anyhow, that definitions and the system dictionary are "not special" was
not my main point; my main point (which you have not contradicted)
was that a special language construct is not needed for definitions.

> -- you're disregarding the fact that any or all of
> your 'primitives' could actually be 'special constructs'.  

I'm not quite sure what you mean by this... I usually do not call
expressions "primitives" if they have a grammatical structure
(i.e., if they are formed from parts that independently have meaning)... 

> You imply that by
> calling a language feature a "primitive" and giving it a name, you evade any
> complications it causes.

I don't recall saying that; but, it is true that replacing a fancy language
feature with a primitive or two can avoid some (but usually not all)
complications associated with the feature.

> Speaking of immediate execution, how would your system handle Forth-like
> metaprogramming?  Does the code have full access to all of the unparsed
> source?

This is an interesting question...

Probably yes... As an example, if a user has been working on writing
a complicated program, he would probably have given a name to
a field for the program's text. Conceivably, this name could
occur within the text itself, so that the program could refer
to its own text cleanly. 

Of course, this is not to say that users will need to write
complicated programs very often; in more cases than with
traditional systems hopefully, a user would be able to get what he
needs done by issuing a few commands to a shell, without resorting
to a full-screen editor, and without naming his "program".

On the other side of metaprogramming, languages in the system will
probably provide ways of referring to the system's own
low-level structures, including, for instance, procedures for
peeking and poking at the system's raw memory, or doing raw
I/O. Using these procedures, the system could change the
way it works (it could even change itself so that it did not
work at all).

It would be hoped that the system's activity could be formalized
well enough so that the system could safely change itself
with these procedures. 

> > Anyway, definitions, in a way, are not really that a fundamental
> > part of a system (at least, not the system I'm working on),
> > although they are quite a visible part. Actually, 
> They actually are fundamental -- if they weren't the compiler wouldn't know
> how to look up the words you type.  As I mentioned, the System Dictionary is
> a distinguished element in language theory.

Yes, a system dictionary would be very important to a system's shell,
which needs it to look up the words users type, but the shell
isn't really that fundamental a part of the system; not nearly
so fundamental as, say, the system's "compiler", which
translates the system's high-level procedures (expressed
in an internal (nearly)-purely-applicative language) into
low-level instructions, or the system's memory manager...

Anyway, this isn't that important of a point (that definitions
aren't a fundamental part of the system; incidentally, it
may be argued that this sentence, that that point is not important,
is not that important either). I concede that definitions are
a fundamental part of languages and the system's shell(s), although
not of the system as a whole.

> > I'm really not sure what you mean by "function invocation" as
> > distinguished from "function application", and I don't know
> > what you mean by "run the function"...
> > By "function", I usually mean an abstract mapping from things to other
> > things. The things that will be "ran" in the system will be
> > procedures, not generally functions.
> This is not something you mentioned before.  So your language distinguishes
> between procedures and functions?  How?  Why?  Doesn't that have negative
> implications as far as proofs go?  Wouldn't it make metaprogramming harder?

This is quite a few questions... I'll see if I can explain...

In my system, I might like to talk about the function of squaring
(multiplying a number by itself); this is a function and
not a procedure. By "procedure", I mean a specific command
that something should be done, or the idea of that specific thing
getting done. In this sense, here are some examples of some procedures
(written in English):

  - Go home.
  - Reboot yourself.
  - Output "Hello World" to the user.

These are procedures and not functions (except perhaps in
the sense of a function taking another procedure as the next thing
to do, yielding a completion, if one thinks of procedures that way).

And, in this sense, the idea of "Output" (like the "printf" of "C")
is not itself a procedure, but a function that takes a string
and then yields a procedure.
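
The distinction can be sketched in code; in this hypothetical Python
modeling (the name `output` and the zero-argument closures are mine), a
procedure is a value that can be referred to without being carried out:

```python
# Sketch of the function/procedure distinction (modeling and names are
# mine): a procedure is represented as a zero-argument closure, inert
# until explicitly run.

def output(s):
    """A *function* from strings to procedures, like the post's "Output"."""
    return lambda: print(s)

hello = output("Hello World")  # refers to the procedure; nothing is printed
hello()                        # only now is the procedure carried out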

I view this as a crisp view of functions and procedures; I view
the muddling of functions and procedures as evident in, say, the
"C" language to be very irritating. It makes it impossible to
refer to a procedure without the system actually carrying it out.

Joy's setup is cleaner than C's of course; functions are emulated
using procedures that pop and push things off the stack. A procedure
could be talked about without being executed by using []s.

But, in a purely applicative system, this setup seems unnatural;
functions are primitive in the system (having connections with
the system's application construct) and do not need to be
emulated using procedures.

An example may help illustrate... In a C-like system, the expression
"square(7)" might represent a procedure to find the square of 7
(i.e., push the normal representation of such a number onto
the stack); in my system, it would represent the square of 7 itself,
or in other words, 49.

> Is this distinction a necessary element of applicative languages of the type
> you describe?  It's not present in any way in any currently existing
> concatenative language.

It seems like quite a natural distinction to make; I don't know
if it is necessary. Concatenative languages like Joy use procedures
that are like functions, in that they pop things off the stack
(like parameters), and push things back on. In the kind of language
I describe, however, it is not necessary to talk about "the stack",
and functions are thought of as abstract mappings (with a connection
to the system's application construct), rather than programs
that push and pop expressions off stacks.

> > I'm not really sure what you mean by each of those syntactic elements
> > and certainly don't see that they are required in all purely
> > applicative systems. I've given the syntax of a small purely 
> > applicative language above, with these being some expressions in it:
> >   @BW
> >   @@BCC
> >   @B@WB
> >   K
> >   @@B@BW@@BC@BB
> > How are your 5 syntactic elements manifest in this language?
> 1) Currying is explicit.
> 2) Function delimitation is implied by a newline.
> 3) Name definition is not present (but you cover it later, and you don't
> deny that it's needed); note that in the examples you give, the functions
> are simply thrown away, no possibility of reuse.
> 4) Curry order modification is carried out by the absence of @ (as I
> mentioned, you've only made grouping implicit).
> 5) Parameter counts are needed to actually run/use any of your functions;
> your procedural sublanguage will need to understand them even if your
> applicative language doesn't recognise them.  Also, your language will have
> to keep count just in case the programmer tries to apply too many arguments
> to a function.
> I'm seeing something I didn't notice up till now, though.  I seem to see
> that function delimitation isn't needed in either of our languages --
> grouping/quotation is sufficient (and grouping in your language carries out
> curry order modification).  This leaves your language with 4 outstanding
> features and mine with 3; the missing element is parameter counts.

I still do not quite get exactly what you mean by "parameter counts".

> In your language parameter counts are very interesting and complex; a
> parameter can be shunted through a _lot_ of combinators, some of them with a
> runtime action (such as the 'select' combinator); and the end result has to
> be known before the result can be run.  I'm not sure whether this means that
> runtime combinators are impossible for you; and if it does mean that, I'm
> not sure whether that denies you completeness or makes the programmer's job
> harder.

Not sure what you mean by "select" combinator or "runtime" combinators.

There is one issue that I think may be underlying a lot of your questions...

In a Joy system, it is easy for an interpreter to tell
what to do just by looking at an expression; for, an expression
will consist of a sequence of atomic programs which the interpreter
knows how to do (well, the sequence may also include quotations, which
the interpreter can execute easily just by pushing the structure
onto the stack).

In contrast, in the type of system I'm describing, it may not always
be easy for an interpreter to tell what to do just by looking
at an expression. An example may help... consider the expression

  put (intToString 37)

This expression would represent a procedure to output the string "37".
Without sugar it might be written:

  put (intToString (+ (* 3 ten) 7))

There is not one clear way that an interpreter should proceed
in order to get this procedure executed. It might likely
recognize that it is an expression of the form "put x";
thus it would like to know "x" well enough that it could
actually output it. Yet, "intToString (+ (* 3 ten) 7)" is probably
not immediately meaningful to the interpreter; thus, it will
probably need to rewrite it into a more normal form.
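
One way to picture this normalization is the toy sketch below (my own
modeling, in Python, with applications as nested pairs): an evaluator
reduces the argument of "put" to a plain string before the resulting
procedure can be performed.

```python
# Toy sketch (mine) of normalize-then-execute: an applicative expression
# is a nested pair (function, argument); primitives are Python values.
# The interpreter must reduce "intToString (+ (* 3 ten) 7)" to "37"
# before the "put" procedure can actually be carried out.

ten = 10
add = lambda x: lambda y: x + y
mul = lambda x: lambda y: x * y
int_to_string = str
put = lambda s: lambda: print(s)   # a function yielding a procedure

def normalize(e):
    """Reduce an expression tree to a value by repeated application."""
    if isinstance(e, tuple):
        f, x = e
        return normalize(f)(normalize(x))
    return e

# put (intToString (+ (* 3 ten) 7))
expr = (put, (int_to_string, ((add, ((mul, 3), ten)), 7)))

proc = normalize(expr)   # normalization alone performs no output
proc()                   # executing the procedure prints "37"
```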

Exactly how this "normalization" will take place may vary from
system to system. In more complicated examples, the issue
of evaluation order might come up... a non-toy system will
probably need to compile procedures in some way if they are used

The point I should stress, I guess, is that a language (such as
the purely applicative language I've been describing) is only
a small part of the design of a computing system; the system will
also need facilities for actually carrying out procedures written
in the language; these facilities are quite important to
consider when designing the system, although I neglected
to say much about them; they may be a bit trickier to set up in a system
that uses a purely applicative language than in a system that
uses a concatenative one.

Also, a system might conceivably use more than one language.
My prototype system will probably use two languages initially:

  - one internal language that the system uses. An expression
    in this language will be a pointer to a C structure,
    with perhaps an "int" at the beginning indicating
    what primitive the expression represents, or a "0" if
    the expression is an application, in which case the
    following two fields (function and parameter) of the
    structure will be used.
  - one external language that the user uses. An expression
    in this language will be a sequence of characters.

Both languages will be nearly purely-applicative, and they
will probably be quite similar.
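
The internal representation described above can be sketched as follows;
this is a Python analog (mine) of the C structure the post describes,
and the particular tag values and combinator names are illustrative
assumptions:

```python
# Python analog (mine) of the internal representation sketched above:
# a node is an application when its tag is 0, and a primitive otherwise.

from dataclasses import dataclass
from typing import Optional

APPLICATION = 0   # the tag reserved for applications

@dataclass
class Node:
    tag: int                           # 0 = application, else a primitive id
    function: Optional["Node"] = None  # used only when tag == 0
    parameter: Optional["Node"] = None

W = Node(tag=1)   # suppose primitive 1 is the combinator W
K = Node(tag=2)   # and primitive 2 is K

# The expression @WK (W applied to K):
expr = Node(tag=APPLICATION, function=W, parameter=K)
print(expr.tag, expr.function.tag, expr.parameter.tag)  # 0 1 2
```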

> > Combinators in my purely applicative language are treated as ordinary
> > primitives; they do not have special language support any more
> > than Joy has special support for "dip", "swap", "dup", and "pop".
> Not true.  Combinators in your language change the order of execution; they
> completely change the emitted code.  You don't seem to see a distinction
> between primitives which drive the compiler and primitives which simply have
> runtime behavior.  All of Joy's primitives have only runtime behavior; most
> of your combinators require compile-time behavior.

Again, my language provides a way to express abstract functions,
procedures, and other things. "compile-time behavior" and
"runtime behavior" really have nothing to do with the language.
So, I reaffirm that combinator primitives are ordinary primitives
and do not have special language support; however, it is true
that they will probably need to be dealt with specially by an
interpreter or compiler.

> This is the problem I was hinting about earlier, by the way.  You define
> "primitive" such that it includes many very different words, including many
> words which are actually syntactic in their effect.

Not sure what you mean by this... perhaps you are referring to
the way I referred to the "S3" combinator (as "S3"), which
has a sort of structure (as one might also want to refer to "S4"
and so forth). In a pure system I would probably not call "S3" a
primitive, but would probably rather take as primitive the
series of S's. We might call this series "So", and then "S3" could
be referred to as "So 3" (the application of "So" to "3"; by
"series" we mean a function taking a natural number).

> So although Joy's primitives all include compiled support, none of them
> require compile-time behavior.  They all do only one job: juggle the stack
> at runtime.  

Well, some Joy primitives do other things than juggle the stack
(such as "put", "get", and "gc"), but I see your main point, that
every Joy primitive represents a program that can be easily
compiled, while the same is not true for my system.

> If someone invents a new combinator which nobody in the past
> ever imagined, it'll be a simple thing to add it to Joy (witness my little
> stack-shuffler, which essentially allows the trivial definition of new
> combinators)

"my little stack-shuffler", i.e., the dreaded lambda construct in disguise ;-)

> but to add it to a syntax-based language will be a headache.

I'm not sure what you mean by this. I guess by "syntax-based language"
you mean a non-concatenative one (for instance, my system); I don't
see how it would be such a headache to add a new combinator in
my type of system... all combinators can be constructed from just
a few (for instance, "B", "C", "W", and "K"), so if the system could
handle procedures written using those few combinators, then it could
handle procedures written using other ones, by just replacing
occurrences of the new ones by their definitions in terms of the old ones.
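
This elimination can be checked concretely. In the Python sketch below
(my own modeling, with B, C, W, and K as curried closures), the identity
and S are defined purely in terms of the four base combinators and then
behave as expected:

```python
# Sketch (mine): B, C, W, K as curried closures; "new" combinators such
# as I and S are then mere definitions, needing no special support.

B = lambda f: lambda g: lambda x: f(g(x))   # composition
C = lambda f: lambda x: lambda y: f(y)(x)   # argument swap
W = lambda f: lambda x: f(x)(x)             # duplication
K = lambda x: lambda y: x                   # constant

I = W(K)                                    # identity: W K x = K x x = x
S = B(B(W))(B(B)(C))                        # S f g x = f x (g x)

add = lambda x: lambda y: x + y
print(I(42))                        # 42
print(S(add)(lambda x: x * 10)(3))  # 3 + 30 = 33
```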

> > Interestingly, Joy has a special
> > construct for definitions, which could have been avoided...
> > For example, one might define "popd" as "[dip] pop" in Joy by:
> >   popd == [dip] pop
> > Yet this could have achieved without treating "==" specially,
> > if they had added a "define" primitive, in this sort of way:
> >   "popd" [[dip] pop] define
> > "define" could be treated as an atomic program (that has an effect
> > other than on the stack, like "put" and "get") that modifies
> > the interpreter's namespace... Of course, one would still like
> > to have "==" as sugar. This is the approach I plan on using for
> > definitions in my system.
> But in Joy "==" is more than sugar -- it's a fundamental part of the
> notation.  The model of Joy is that functions take a single parameter (the
> stack) and return a single parameter (another stack); there's not supposed
> to be a special dictionary available to them; the dictionary is supposed to
> be immutable.  The '==' syntax is intended to make Joy programs look like
> proofs; notice how the language syntax is set up so that it's not possible
> to define words at arbitrary points?  That's to keep the fiction that the
> programmer never actually defines words, but rather states equalities.
> The only odd part here is that while a real proof could have arbitrary text
> on either side of the equal sign, a Joy definition must always have a single
> word on the left side.  Thus, a real equality could say:
>   dup drop == id
> While in Joy you can only say
>   id == dup drop

Yes, the "==" construct in Joy is really only a definition construct,
not a general equality.

> Hmm, that might be an interesting extension ;-).  

Yeah, there are several ways the identity combinator can be defined
in terms of other combinators, for instance

  I = WK
  I = SKK
  I = CKK

This probably doesn't make sense... It's not too important
though, so I won't elaborate unless you ask...

> Good one.  Thus I see that any definition in your language which includes a
> combinator without all of its 'inputs' applied is itself a combinator. 


> > Of course, the factorial function could be written more concisely
> > using a function like Joy's "primrec" as:
> >   Primrec (= 0) (K 1) Times
> > The parentheses could probably be omitted, the spacing giving
> > the structure:
> >   Primrec Q0 K1 Times
> That's not a good idea -- spacing has no intrinsic superiority over parens,
> but parens make the grouping patterns a lot more flexible.  Plus, using
> spacing would destroy the ability to use named combinators in those places.

Hmm... well, I'll avoid using that kind of notation then...

> BTW, is the following correct?
> @@@Primrec @= 0 @K 1 Times


> > Well, that depends on what precisely you mean by "simpler".
> > A purely applicative approach is simpler in the sense that
> > the only constructs needed are primitives and function-application,
> > while the Joy approach needs at least primitives, concatenation,
> > and quotation...
> You're ignoring your 'default' action, which is grouping -- what your
> language does when there's no @ sign.  At the same time you're paying
> attention to my default action, concatenation.

I'm not sure what you mean by "grouping" in a system based on "@".
There is no "grouping" construct... The only construct is application;
there is no default, no action. This is only a language. But maybe I've
misunderstood you.

> > You seem hesitant to accept that quotation is a real 
> > construct of Joy...
> Why do you say that?  I've listed it as such in every list I've given.  The
> only time I acted suprised was when you claimed that it was "the
> fundamental" construct of Joy.  It's clear that if any single concept has to
> be identified with Joy, it's concatenation.

I guess maybe I misinterpreted what you said... anyway, it is true that
concatenation is Joy's most fundamental construct. One thing I said
earlier was motivated by the thought that Joy's quotation and
concatenation constructs could be replaced by a single construct,
a list construct. This was the construct which I mentioned was
most fundamental to Joy, although that did not make sense,
because the author did not formalize Joy using a list construct
at all.

> > In Joy, as the authors have formalized it, expressions are built
> > from primitives using two constructs: concatenation and quotation.
> > In purely applicative systems, only application is needed.
> > Parentheses often play the role of grouping that quotation does
> > in Joy, but parentheses can be eliminated as I've shown above.
> Very good point; I hadn't noticed that grouping was analogous to quotation.
> Of course, this is as much a statement about Joy as a statement about purely
> applicative languages.  It's clear that making concatenation explicit in a
> purely concatenative language would also result in allowing quotation to be
> implicit.  
> In your applicative language, you implicitly group and explicitly curry.  In
> other applicative languages they explicitly group and implicitly curry.
> Thus application requires grouping.  In Joy a programmer implicitly
> concatenates and explicitly quotes; making concatenation explicit would
> allow grouping to be implicit.
> As an example, the Y combinator would (possibly) be written thusly in an
> explicitly concatentive dialect of Joy:
>  dup.cons swap.concat.dup.cons.i;
> I haven't worked out whether more complicated forms also work this way;
> would have to do some thinking, and I don't feel like putting effort into
> something so completely silly ;-).  I suspect that I'd have to use a prefix
> operator rather than an infix one.

I don't think it would work. How would you represent "[]" or "[dup]"?
But, then again, it would be completely silly, so it doesn't really matter.

> It's interesting to note that by and large, quotation is rare in Joy (and
> Forth), but grouping is very common in the applicative languages I've seen.
> This makes the choice of which operation to make implicit easy in a
> concatenative language, whereas it's much harder with an applicative
> language.  In some problems grouping is common enough to make your @
> notation look quite attractive; in other problems, implicit currying is
> easier.

Yes... this is all about external syntax, of course. Internally,
a system would probably use a structure of two pointers (to other
structures) to represent application.

> Of course, there are counterexamples for every so-called general rule; have
> I simply mistaken the counterexamples for the rule?

I'm not sure... I would guess that the approach with "implicit currying"
would probably turn out to be the most convenient, especially
if parentheses were supplemented with separators like ";" and "," so that

  f; g, h x; z

is shorthand for

  f (g (h x)) z

Of course, internally, a system will probably not use either "@" or
implicit currying, but a binary tree of pointers.

> > > Furthermore, having looked at the proofs written in the Joy 
> > > manuals, can you
> > > duplicate any of them as simply?
> > I believe so... I'm not sure exactly which proofs you're 
> > talking about...
> > One proof that stands out was the "naturality of 'cons'". Here was the
> > proof they gave in the manual (in "The Algebra of Joy"):
> As you note, that's a poor proof on their part -- not a very good choice,
> unless your point was that the author of Joy has made mistakes.

That was not my intent... :-)

> > Anyway, both the proof I just gave and the Joy proof are informal
> > proofs about systems...
> All proofs are informal from some point of view.  You just have to match
> your assumptions up with your audience.  At least, that's what my books say.

Well, there are such things as formal proof systems, with rigorous
syntactic axioms and inference rules. By saying that my proof was
"informal", I meant to ensure you knew that I was not using such
a system (because it could almost look like I was, the way I presented it),
so that you would not pick at my use of "..." or anything :-)

(of course, what you said, that "proofs are informal from some point
of view", I'll agree with. "Formal proofs" that result from textual inference
systems are not really "proofs" at all in the psychological sense
of the word that I think you were using).

> I postulate that the purely applicative system is equivalent to the
> concatenative system.  I won't try to prove that, formally or otherwise.

I also suspect that the two approaches may be equivalent, meaning
that every Joy-style concatenative system has an associated
purely applicative system with analogous primitives and vice versa;
I'm probably not familiar enough with Joy to figure the exact
relationship out... it is surely more complex than just
"flip the expression around backwards and change []s to ()s";
this does not quite cut it...

Joy requires []s in cases where parentheses are not necessary in
purely-applicative systems. For instance, "[foo] y" could
be written as just "Y foo". Joy seems to require functions to
be quoted when they are used as parameters to other functions,
while this is not the case in purely-applicative systems.

On the other hand, purely-applicative systems require parentheses
in some cases that Joy does not need []s. For instance, the
expression "+ (* 2 3) (* 3 4)" can be written in Joy as
"4 3 * 3 2 * +".
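
Both spellings of the same value can be modeled side by side; in this
Python sketch of my own, the little `run_joy` interpreter handles just
enough of Joy for the example:

```python
# Sketch (mine): the applicative "+ (* 2 3) (* 3 4)" next to the
# Joy-style "4 3 * 3 2 * +"; both denote 18.

add = lambda x: lambda y: x + y
mul = lambda x: lambda y: x * y

applicative = add(mul(2)(3))(mul(3)(4))     # + (* 2 3) (* 3 4)

def run_joy(program):
    """Execute a minimal Joy-style word list against an empty stack."""
    stack = []
    ops = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
    for word in program:
        if word in ops:
            b, a = stack.pop(), stack.pop()
            stack.append(ops[word](a, b))
        else:
            stack.append(word)   # literals are pushed
    return stack

concatenative = run_joy([4, 3, '*', 3, 2, '*', '+'])[-1]
print(applicative, concatenative)  # 18 18
```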

> Thanks to your explanation I've clarified and refined my views.  One of the
> reasons I thought concatenative logic was better was that I didn't know that
> applicative logic could handle unspecified variables; it's clear now that it
> can.  It merely does so differently, by looking forward in the text of the
> function rather than back into the data of the program.

Similarly, before I heard of Joy, I hadn't realized that an imperative
stack-based approach could entirely eliminate variables.

Also, thanks for your comments... I've come to see Joy and similar
systems as worth considering (more than as just above-average
imperative flukes).

> At this point I'm going to go out on a limb again and try to make a
> distinction between the two systems: 
> The applicative system requires full information about the complete
> "program" (I define a program as a complete set of functions and data) in
> order to run any part of it.

Not really... a purely-applicative system could conceivably execute
the first part of a procedure without examining the trailing part;
I plan on making my prototype system work this way. It may
need to _parse_ the whole textual program before beginning execution,
but it will not need to normalize the whole thing.

> The concatenative system does not require this much information; all we have
> to know is the portion of the program prior to the part you wish to execute.
> In other words, the applicative system is in line with the ideals of Tunes.
> The concatenative one doesn't contradict Tunes, but does allow execution
> without complete understanding.

It seems to me that both systems could be used to execute programs
without complete understanding of them.

> An applicative system only makes sense as a declarative system.  A
> concatenative system also makes sense as an imperative one.

I'm not sure what precisely you mean by "declarative" and "imperative",
but this may be an apt description. An applicative system like
the one I've described can be used to represent all kinds of
things, including numbers, functions, and procedures. A Joy-style
system can only be used to represent procedures, but procedures
that push expressions representing, say, numbers onto the stack
can be used to emulate numbers.

> I think this last feature is what attracts me to concatenative logic.
> Modern computers are by their nature imperative, and many of the problems
> solvable by them are well-described by imperative processes ("programs").

I think you have again touched on one of Joy-style systems' best
advantages: they are more straightforwardly executed on
modern machines. I do not doubt this; but, I think programs
expressed in purely-applicative languages may be easier to
reason about (reasons for this will be given at the end of
the post). With a nice metasystem, however, I suspect that programs
expressed with a purely-applicative language could be executed
just as efficiently as Joy-style programs. Likewise, with
a bit of overhead, I suspect that one could reason well about Joy-style
programs.

So, the question of which system is superior is a matter of taste...
Joy programs are more straightforwardly executed but less
straightforwardly understood. But, maybe this is a naive conclusion.

> > This would have been a bit trickier, though, and I'm not
> > really too familiar with the combinatory axioms, so I didn't
> > attempt it. To give you a flavor of how this might work though,
> > here are a couple combinatory axioms:
> >   B(BB)B = C(BB(BBB))B
> It's going to take me a LONG time to see that as more than noise.  (Not a
> problem -- consider that most concatenative systems have "stack noise".)
> >   BCC = I
> This one makes sense, once I read below to recall what the single-letters
> stand for.  (BTW, is it really _that_ important to use only a single letter
> for combinators?  If so, there's a REAL advantage that concatenation
> posesses over application. ;-)
>   converse converse == id.
>   (boolean) not not == id.
> > The first one states the associativity of composition;
> Ah.  In that case, would it be true that:
>  concat concat == stack:(abc--cab) concat swap concat;
> expresses the same thought in a concatenative form? 

Yes... this states the associativity of "concat". This is similar to
composition, I suppose.

> I find that to be easier to read; is that just me?  

Well... you have cheated by using "stack:(abc--cab)"
(although I see your note below that Joy has a primitive for it).

The combinatory axiom could be made more readable if one
used combinators other than "B" and "C" to express it;
for instance,

  foldl 2 B = foldr 2 B

expresses the same thought, where "foldl" and "foldr" are a certain
series of combinators, such that "foldl 2" and "foldr 2" have these
rewrite rules:

  foldl 2 f x y z = f(fxy)z
  foldr 2 f x y z = fx(fyz)
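
The equation "foldl 2 B = foldr 2 B" can be checked concretely with a
Python sketch (the names "foldl2" and "foldr2" here are hypothetical
helpers that just implement the two rewrite rules above):

```python
B = lambda f: lambda g: lambda x: f(g(x))   # Bfgx = f(gx): composition

# Hypothetical helpers matching the rewrite rules quoted above:
foldl2 = lambda f: lambda x: lambda y: lambda z: f(f(x)(y))(z)  # f(fxy)z
foldr2 = lambda f: lambda x: lambda y: lambda z: f(x)(f(y)(z))  # fx(fyz)

inc    = lambda n: n + 1
double = lambda n: 2 * n
square = lambda n: n * n

left  = foldl2(B)(inc)(double)(square)   # (inc . double) . square
right = foldr2(B)(inc)(double)(square)   # inc . (double . square)
```

Applying "left" and "right" to any number gives the same result, which
is just the associativity of composition.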

> > for, from it it follows that
> >   B(BB)B F G H = C(BB(BBB))B F G H
> >   BB BF G H = BB(BBB)F B G H
> >   B(BFG)H = B(BBBF) B G H
> >   B(BFG)H = BBBF BG H
> >   B(BFG)H = B BF BG H
> >   B(BFG)H = BF(BGH)
> Sorry, I'm lost.  I don't know most of the combinators, certainly not by
> their ISO standard one-letter names.  

Hmm... perhaps my spacing was confusing... anyway, the only combinators
I used were "B" and "C":

  B's rewrite rule: Bfgx = f(gx)
  C's rewrite rule: Cfxy = fyx

The "F", "G", and "H" were variables... I'll use "x", "y", and "z"
this time... Remember that "B" acts as a function that splits
a function's first parameter, so to speak, into two (function
and parameter). Applied to two functions, it yields their composition.
"C" takes a function's converse.

First see that "B(BB)B" is a function that takes three parameters,
"x", "y", and "z", yielding "(x . y) . z":

  B(BB)Bxyz = (B (B B) B x) y z  [just introduced extra ()s and spaces]
            = ((B B) (B x)) y z  [by B's rewrite rule, on the leading "B"]
            = (B B (B x)) y z    [removed extra ()s]
            = B B (B x) y z      [removed more extra ()s]
            = (B B (B x) y) z    [added extra ()s]
            = (B (B x y)) z      [by B's rewrite rule, on the leading "B"]
            = B (B x y) z        [removed extra ()s]
            = (x . y) . z

Now see that "C(BB(BBB))B" is a function that takes three parameters,
"x", "y", and "z", yielding "x . (y . z)":

  C(BB(BBB))Bxyz = (C (B B (B B B)) B x) y z  [added extra ()s and spaces]
                 = (B B (B B B) x B) y z      [C's rewrite rule]
                 = (B (B B B x) B) y z        [B's rewrite rule]
                 = B B B x (B y) z            [B's rewrite rule]
                 = B (B x) (B y) z            [B's rewrite rule]
                 = B x (B y z)                [B's rewrite rule]
                 = x . (y . z)

Now do you see how from "B(BB)B = C(BB(BBB))B" it follows
that "(x.y).z = x.(y.z)" for all "x", "y", and "z"?
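
If the rewriting above is hard to follow, the same fact can be checked
mechanically; here is a Python sketch with "B" and "C" as curried
lambdas, building both sides of the axiom and applying them to sample
functions:

```python
B = lambda f: lambda g: lambda x: f(g(x))   # Bfgx = f(gx)
C = lambda f: lambda x: lambda y: f(y)(x)   # Cfxy = fyx

lhs = B(B(B))(B)            # B(BB)B       -- yields (x . y) . z
rhs = C(B(B)(B(B)(B)))(B)   # C(BB(BBB))B  -- yields x . (y . z)

inc    = lambda n: n + 1
double = lambda n: 2 * n
square = lambda n: n * n
```

Applied to "inc", "double", "square", and an argument, both sides give
the same answer, as extensionality would lead one to expect.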

> > These two axioms are not particularly special, but with a
> > handful of ones like them, a sort of completeness follows;
> > it would be possible to directly show that
> This completeness is what you're talking about when you refer to a "finite"
> number of primitives, right?

I'm not sure how this completeness has to do with
"finite number of primitives", except that the completeness could
be established with a finite number of axioms.

The precise completeness I'm talking about is the so-called
"principle of extensionality" which says that if "Fx = Gx" is
provable for an arbitrary "x" in a system, then "F = G" is also
provable; in other words, if two functions behave the same way,
then they are the same.

It would be possible to explicitly postulate the principle of
extensionality, of course. Using it, one could show that
"f.(g.h)" is the same as "(f.g).h" by arguing that both,
when applied to an "x", give "f(g(hx))".

> > > I've recited already the five syntactic elements that every 
> > > applicative
> > > language needs in order to produce code; I've also recited the three
> > > elements that a concatenative language needs.  I've also 
> > > made the claim that
> > > your language needs one additional element (combinators) 
> > > which is distinct
> > > from any of the other five needed by a truly pure 
> > > applicative language.
> > > Can you make any counterclaims?
> > This was mentioned above... I don't see that those five
> > "syntactic elements" are essential in every applicative language
> > or how they are evident in the one I've presented...
> They were certainly present in the language you _had_ presented.  Four of
> them are still present in the language you've created for this post;
> Concatenative languages still only need three.

Let's see... here were the four you listed:

> 1) Currying is explicit.
> 3) Name definition is not present (but you cover it later, and you don't
> deny that it's needed); note that in the examples you give, the functions
> are simply thrown away, no possibility of reuse.
> 4) Curry order modification is carried out by the absence of @ (as I
> mentioned, you've only made grouping implicit).
> 5) Parameter counts are needed to actually run/use any of your functions;

This does not really make sense to me. These seem to be just
observations about my system, rather than "syntactic elements".
In particular, how can #3, "Name definition is _not_ present"
(emphasis added) be a syntactic element?

> > And, combinators are not particularly special elements in
> > the language I've given, but ordinary primitives.
> But not all of your 'ordinary primitives' are especially ordinary.  It seems
> to me that you're bundling a lot of very different functions under the name
> of 'primitives', and claiming that because you can call them by a single
> name they're therefore simple.

I do bundle a lot of very different functions under the name of
'primitives', but I have not claimed that they are simple just because
they can be called by simple names. Although I usually aim to take
only fairly simple things as primitive, it would be possible
to take complicated things as primitives (Fare calls this
"abstraction inversion").

Anyway, what I said (and which you have not contradicted) was that
combinators "are not particularly special elements in the language",
meaning that the language does not have complicated constructs
for referring to them. However, even though the language does
not have special constructs for dealing with them, they might
be treated quite specially by the execution system; in this
sense, as you have said, they are not so "ordinary".

> It seems to me that the compiler will have to be specially written in order
> to merely use many of the combinators.

Exactly. Of course, the Joy interpreter also needed special support
for its combinators "swap", "dup", "dip", and such. As you've commented,
the only support it needed to give them is the ability to execute
them (as atomic programs). My system might require more elaborate
support for the analogous combinators; but, then again, perhaps
it won't.

> Here's two reasons to prefer a concatenative system (this adds to your
> reasons):
> 3)  Translation to machine code is simpler than translation of an
> applicative system.  This is minor; Tunes doesn't want to use something
> simply because it's simpler.  I personally do want that, but I'm influenced
> by Chuck Moore here.

This is a good point, and may be a true advantage of Joy-style systems.
But, I suspect that there may be clever ways to compile procedures
written as purely-applicative expressions.

> 4)  The most commonly used metaprogramming systems currently are
> concatenative, although almost no languages now are concatenative.  I'm
> talking about Postscript -- probably the most commonly used language (since
> it's used almost every time people print), and almost never used in any way
> other than metaprogrammed/machine generated.  Thus, practical experience
> seems to indicate that people writing metaprograms are comfortable using
> concatenative languages.

Well... I don't really know much about Postscript, but I don't
see how purely-applicative systems would have particular difficulty
with metaprogramming. Of course, I don't know of any practical
purely-applicative systems with flexible metaprogramming,
since I don't know of any practical purely-applicative systems
at all.

> > So, what do you think of purely applicative systems now?
> I'm happier with your ideas than I was before -- but I'm still skeptical.
> You see, all the languages developed thus far (with the exception of Forth
> and Postscript) have been applicative languages, and I don't really like the
> results.
> But don't let my skepticism slow you -- one problem I have is an
> unwillingness to work on something which isn't new to me.  Innovation is my
> goal, but it's not the only good goal -- you're working on refinement, and
> that's also a noble goal.

I guess it could be called "refinement", although there aren't really
any particular systems that I'm starting with to refine. Anyway,
this kind of system does seem like an interesting thing to
work on, whether it is new or not...

> > - "iepos" (Brent Kerby)
> -Billy

Let's see... I mentioned that I'd give some reasons why programs
written as purely-applicative expressions might be easier to
reason about than similar ones written in a Joy-style system,
so now I'll give a couple:

  1) In a purely-applicative system of the kind I've described,
     it is always possible to replace a sub-expression with
     one of equivalent meaning, while this is not the case
     in the Joy system, because of Joy's heavy dependence on
     a quotation construct. To illustrate, suppose in Joy one
     had a program

       [0 +] foo

     where "[0 +]" is used as a quoted program by "foo". Upon
     realizing that "0 +" is a program that doesn't do anything
     and is the same as "id", one might like to simplify the previous
     program to

       [] foo

     However, this change cannot be made without restriction;
     one must first check to make sure that "foo" does not
     use the program as a list (if it does, then the change may
     alter the behavior of the program). 

     Such a complication does not occur in an analogous
     purely-applicative system. The procedure

       foo (plus 0)

     could be freely rewritten as

       foo I

     regardless of the structure of "foo", if the system had
     realized that the expression "plus 0" means the same thing as "I".

     This problem with Joy was mentioned in the manual
     (in the section "A rewriting system for Joy"); they presented
     some possible remedies, but none which they really liked.
     Anyhow, this is not a fatal problem of Joy, only
     a slight impediment to reasoning about programs (since
     often in reasoning about programs, one might like to
     rewrite a program by replacing a sub-expression with
     one that is known to have equivalent meaning, but this
     does not always preserve the meaning of the whole program
     in Joy). One thing to note is that such replacement
     works fine as long as the sub-expression to replace
     is not enclosed by []s. For instance, it would be
     okay to replace

       2 3 + foo

     with

       5 foo

     regardless of the structure of "foo".

  2) Many fundamental equivalences of Joy programs that one
     might like to show require restrictions, while
     similar equivalences of purely-applicative expressions
     can be taken unrestrained. Specifically, one might
     like to think upon seeing a Joy program of the form

       x y swap

     that it could be rewritten

       y x

     But, this will not hold, for instance, if "x" is "2" and
     "y" is "sign". For

       2 sign swap

     is a program that inserts "1" under the top item of the stack, while

       sign 2

     is a program that takes a number off the stack, replaces
     it with its sign, and then pushes "2".

     The restriction that one must take (and they mentioned this
     in the manual also) is that 

       x y swap

     can only be rewritten as "y x" if "x" and "y" are programs
     that push one item onto the stack but have no other effects.
     In the last example, "sign" does not satisfy this, since
     it also pops an item off the stack.

     That this restriction is necessary seems to be quite a
     stumbling block for systems reasoning on Joy programs;
     it is not always immediately visible, upon looking at
     a program, how many items it pushes onto the stack,
     and whether it has any other effects.

     The analogous equivalence in a purely-applicative system
     is that

       @@@C f x y

     can be rewritten as
       @@f y x

     This can be accepted without restraint for all expressions
     "f", "x", and "y" of the system.

     A similar problem occurs with the "dup" and "pop" of Joy,
     but not with the analogous applicative combinators "W" and "K".
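
To see why the applicative analogues rewrite without restrictions,
here is a Python sketch (an illustration only, with curried lambdas):
"C", "W", and "K" play the roles of "swap", "dup", and "pop", and
their rewrite rules hold for *any* arguments whatsoever.

```python
C = lambda f: lambda x: lambda y: f(y)(x)   # swap analogue: Cfxy = fyx
W = lambda f: lambda x: f(x)(x)             # dup analogue:  Wfx  = fxx
K = lambda x: lambda y: x                   # pop analogue:  Kxy  = x

minus = lambda a: lambda b: a - b
times = lambda a: lambda b: a * b
```

No side conditions about "how many items x pushes" ever arise, since
the arguments are expressions, not stack effects.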

These two problems are fairly serious, it seems, and make it more difficult
for reasoning systems (and people as well, I would suppose) to
understand Joy programs. They both seem to be fairly fundamental
to the imperative approach, and cannot, to my knowledge, be
easily remedied.
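
The first of the two problems, the quotation one, can also be made
concrete with a hypothetical mini-Joy sketch in Python: a quotation is
just a list of tokens, and a "foo" may either run it or inspect it as
a list.

```python
# Execute a quotation (a token list) against a stack.
def run(quot, stack):
    for tok in quot:
        if tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        else:
            stack.append(tok)
    return stack

quoted = [0, "+"]   # "[0 +]" -- behaves as the identity program
empty  = []         # "[]"

# A foo that *executes* its quotation cannot tell them apart...
same_as_program = run(quoted, [5]) == run(empty, [5])
# ...but a foo that uses the quotation as a *list* certainly can.
same_as_list = len(quoted) == len(empty)
```

So "[0 +]" and "[]" are interchangeable as programs but not as lists,
which is exactly why the replacement needs a restriction in Joy.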

However, similar purely-applicative systems seem to be free of
these problems... Of course, purely-applicative systems do have
problems of their own, particularly in the implementation,
as was mentioned earlier...

But, overall, the purely applicative approach still seems to me
to be the best way to found a system. But, if you can see
any awful problems with the approach, I'd still like to
hear, of course.

Of course, just saying that my approach is to use a purely
applicative language based on combinators does not really
say too much about how the system will work; there are
many other important aspects of how a system might work
which I've not really thought much about.

(at this point, I'm going to begin rambling on some
thoughts about *implementing* the system)

One particular issue which needs to be decided is:
"How will expressions be normalized so that the interpreter
can understand them?"

In my prototype system, I generally plan to represent expressions
using a binary tree; the nodes will be scattered all over memory,
and the parent nodes (i.e., expressions representing an application) will
have pointers to the two children.

Before it can understand an expression, the interpreter will
in almost all cases need to rewrite it so that there are no
combinator redexes (i.e., an expression with a combinator applied to enough
parameters that a "beta-reduction" is possible) within.
So, one thing that the interpreter might do first when
interpreting an expression is to reduce all its combinator redexes
(i.e., change things like "Bfgx" into "f(gx)"), at least the
ones that are at the head of the expression; sometimes, this process
may never end; if this happens, it is the programmer's fault. (:-))
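
As a sketch of what I have in mind (hypothetical Python, not the
actual prototype): applications are binary tree nodes, combinators are
leaves, and head redexes are reduced by rewriting the tree.

```python
# Applications as binary tree nodes; combinators as leaf strings.
class App:
    def __init__(self, left, right):
        self.left, self.right = left, right

def spine(e):
    """Unwind the left spine: return (head, [arg1, arg2, ...])."""
    args = []
    while isinstance(e, App):
        args.insert(0, e.right)
        e = e.left
    return e, args

def apply_all(e, args):
    for a in args:
        e = App(e, a)
    return e

def reduce_head(e):
    """Rewrite head combinator redexes: Bfgx -> f(gx), Cfxy -> fyx.
    (May loop forever on some inputs -- the programmer's fault. :-))"""
    while True:
        head, args = spine(e)
        if head == "B" and len(args) >= 3:
            e = apply_all(App(args[0], App(args[1], args[2])), args[3:])
        elif head == "C" and len(args) >= 3:
            e = apply_all(App(App(args[0], args[2]), args[1]), args[3:])
        else:
            return e
```

For instance, reducing the tree for "B f g x" yields the tree for
"f (g x)", as B's rewrite rule demands.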

However, sometimes this kind of rewrite is not enough. For instance,
if the interpreter sees an expression like "7 + 7", it would
probably not immediately understand it, even though it contains
no combinator redexes, and would need to normalize it to "4 + ten",
or more likely to something like "2^1 + (2^(1+1)) + (2^(1+1+1))".

This next thing was mentioned, I think, in my original post...
I've planned having the prototype system use ncurses for its
output and input, probably IRC style (with a one-line buffer
at the bottom for input). However, rather than build in
the stuff for input and output, I thought it would be an
interesting test for the system to provide the high-level
system with a few raw primitives for input and output and
see if it could manage its own buffering, parsing, scrolling,
and such.

And of course, the first hurdle I've come to (and this
is rather silly) is in making the interpreter execute the
raw primitive for writing a character to the screen, which
requires three parameters (two integer coordinates, and an
integer telling what character to place). The interpreter
needs to be able to take the three expressions representing
the numbers and extract 32-bit ints from them, so that it
can actually put the character on the screen.

This really would not be such a difficult task, if the numbers
were formed from, say, just "0" and "1", and addition,
subtraction, multiplication, division, and exponentiation.
It would be possible to rewrite each number expression into
a base 2 format without too much trouble (one requirement
is that the result be correct, even if an intermediate
result might have overflowed 32 bits; so, the intermediate
results will generally need to be stored in a more flexible
form than "int", and wider-than-int additions and multiplications
will be needed).
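
One way to meet that requirement (a sketch only, leaving division
aside, and assuming number expressions are nested tuples like
("+", 7, 7)): evaluate with unbounded integers, and truncate to 32
bits exactly once, at the end.

```python
# Evaluate with Python's unbounded ints, so intermediate results can
# exceed 32 bits without corrupting the final answer.
def evaluate(expr):
    if isinstance(expr, int):
        return expr
    op, a, b = expr
    x, y = evaluate(a), evaluate(b)
    if op == "+": return x + y
    if op == "-": return x - y
    if op == "*": return x * y
    if op == "^": return x ** y
    raise ValueError("unknown operator: " + op)

def to_int32(expr):
    return evaluate(expr) & 0xFFFFFFFF   # truncate once, at the end
```

So "2^40 - 2^40" comes out correctly as 0, even though the
intermediate "2^40" overflows 32 bits.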

I still have not implemented that yet. But, this has started
me thinking about some other issues... In the future, I'd
like for the system to be able to handle negative numbers
and fractional numbers (and possibly "imaginary").
On a high level, it should be possible to refer to these numbers
in a uniform way; i.e., the functions "+", "*", should
be meaningful when applied to all kinds of numbers, regardless of
how they are represented. And, it would be nice if numbers
like the square root of two and pi could be precisely represented.

One important thing about a system of normal form for numbers
is that one must be able to easily tell which of two numbers
are larger (or if they are same), just by looking at the
normal representation of each number.

Here are a couple ways one might represent numbers:

  1) Using a floating-point type representation. The advantage
     of this is, of course, that one would be able to do
     rapid calculations using them since most processors have
     special support.

     The downside is that there is the possibility of error.
     Error may be acceptable in some cases in my system, as
     long as it can be precisely formalized and proven
     not to exceed a certain level.
  2) A rational number could be represented precisely using
     a bit for sign and two integers, one for numerator and
     the other for divisor (the two integers being represented
     in some other format, probably the ordinary base 2).
     The downside (other than that it would be a bit slow) is that
     it is still not possible to precisely represent numbers
     like the square root of two and pi.
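
Option 2 might be sketched like this (hypothetical Python): keeping
the pair in lowest terms, with one canonical form for zero, gives each
rational a unique representation, so comparing two numbers is just
comparing structure.

```python
from math import gcd

def make_rat(num, den):
    """Normal form: (sign, numerator, denominator) in lowest terms."""
    if den == 0:
        raise ZeroDivisionError
    if num == 0:
        return (1, 0, 1)             # one canonical zero
    sign = -1 if (num < 0) != (den < 0) else 1
    num, den = abs(num), abs(den)
    g = gcd(num, den)
    return (sign, num // g, den // g)

def rat_add(a, b):
    (sa, na, da), (sb, nb, db) = a, b
    return make_rat(sa * na * db + sb * nb * da, da * db)
```

Since make_rat always renormalizes, "2/4" and "1/2" get the very same
representation; equality of the representations decides equality of
the numbers.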

Of course, there are probably other better ways of doing it
that I don't know of. One thing that I've considered (this
was mentioned in my original post) was using the
Church Numerals (a very interesting class of combinators that
are very similar to numbers; let me know if you are not familiar
with them), but with an extension for negative numbers.
In fact, I'm pretty sure that the church numerals would be
nice to have at a high level; but I'm not sure that it
will be any easier to invent a normal-form system for church numerals
than for numbers in the ordinary sense. But, then again,
maybe using church numbers would give some insight...
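
For readers not familiar with them, a quick Python sketch of the
church numerals (an illustration, not the system itself): the numeral
n is the combinator that applies a function n times.

```python
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))

def church(k):                # ordinary int -> church numeral
    n = zero
    for _ in range(k):
        n = succ(n)
    return n

def unchurch(n):              # church numeral -> ordinary int
    return n(lambda k: k + 1)(0)

add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))   # just composition, i.e. B
```

Note how multiplication is nothing but "B": composing "n applications
of f" m times applies f m*n times.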

Another aspect of normalization... the system, given two
expressions, generally needs to be able to determine if
they denote the same thing. In particular, the system
will have an equality primitive "Q", and it needs to
will have an equality primitive "Q", and it needs to
be able to reduce expressions like "Q x y" to a normal
form, namely to an expression like "True" or an expression
like "False" (actually, the system will probably use
the combinators "K" and "KI" in place of propositions;
these combinators have the property that "(K)xy" is "x"
and "(KI)xy" is "y"; thus, they are their own if-then-else
construct, so to speak).

The obvious approach to normalizing "Q x y" to a proposition
is to normalize "x" and then normalize "y", and then see
if they have the same structure (if they do, then the
whole thing normalizes to "True", otherwise, to "False").
This approach will only work if the system always normalizes
expressions to the same normal form if they denote the same thing.
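
The obvious approach might be sketched like this (hypothetical Python;
the normalizer here is only a stand-in that evaluates arithmetic, and
"K" and "KI" serve as the propositions, as above):

```python
K  = lambda x: lambda y: x     # plays the role of True
KI = lambda x: lambda y: y     # plays the role of False

def normalize(e):
    # Stand-in normalizer: evaluate "+" and "*" expressions to ints.
    if isinstance(e, int):
        return e
    op, a, b = e
    x, y = normalize(a), normalize(b)
    return x + y if op == "+" else x * y

def Q(x, y):
    # Normalize both sides; compare normal forms; answer with a selector.
    return K if normalize(x) == normalize(y) else KI

# The result is its own if-then-else construct:
answer = Q(("+", 7, 7), ("*", 2, 7))("equal")("unequal")
```

This only works, as said above, if the normalizer always sends
expressions denoting the same thing to the same normal form.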

Anyway, there would probably be use in some cases in the
system keeping track of more than one name for the same thing
(for instance, with numbers, some calculations may be easier
if the number is in one form than if it is in another),
although there does also seem to be some use in the
system assigning every expression an associated
"normal" expression (denoting the same thing), for the
purpose of equality-testing...

The issue of how the system will normalize things seems
to be quite a complicated issue, and one that will grow
in complexity as the system's language grows in
complexity. Thus, it might be a good thing if the
normalization system was a bit flexible...

Anyway, this post has reached an absurd length...
If you reply, feel free of course not to reply to
every little comment, or don't reply to two comments
in different places that say the same thing
(I think I may have said the same things several times :-))

- "iepos" (Brent Kerby)