Type-checking, GC, Forth,
rewriting.. (was: garbage collected forths)
Massimo Dentico
m.dentico at virgilio.it
Thu Apr 28 22:13:39 PDT 2005
Hendrik Boom wrote:
> [..]
> > - Joy, http://cliki.tunes.org/Joy
> > somewhat the "pure" concatenative language, much more
> > experimental. GCed.
>
> Ugh. Rewriting is slow.
Well, I'm not familiar with its implementation (an interpreter
written in C), but consider that:
a) Joy semantics is about pure functions from stack to stack;
rewriting rules are used only to clarify this semantics
and to reason about programs; there is nothing in the
language which prevents the use of Forth-like implementation
techniques, both for interpretation - with various
threaded code techniques - and for compilation.
See by Anton Ertl et al.:
Interpreter Research at Institut für Computersprachen
http://www.complang.tuwien.ac.at/projects/interpreters.html
Forth Research at Institut für Computersprachen
http://www.complang.tuwien.ac.at/projects/forth.html
Threaded Code,
http://www.complang.tuwien.ac.at/forth/threaded-code.html
Compilation of Stack-Based Languages,
http://www.complang.tuwien.ac.at/projects/rafts.html
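Point a) can be made concrete with a minimal sketch (hypothetical, in
Python; not Joy's actual C interpreter): each word is a pure function
from stack to stack, and a program is just their left-to-right
composition, so no rewriting engine is needed at run time.

```python
# Each Joy-like word is a pure function from stack to stack.
def dup(s):  return s + [s[-1]]            # DUP (x -- x x)
def mul(s):  return s[:-2] + [s[-2] * s[-1]]   # *  (a b -- a*b)
def swap(s): return s[:-2] + [s[-1], s[-2]]    # SWAP (a b -- b a)

def run(program, stack):
    """A program is a sequence of words; running it composes them."""
    for word in program:
        stack = word(stack)
    return stack

# "3 dup *" squares the top of the stack:
print(run([dup, mul], [3]))  # [9]
```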
b) rewriting system implementations are no longer so slow;
for example, the authors of Maude (http://cliki.tunes.org/Maude)
claim (Maude v 2.1.1 Manual):
http://maude.cs.uiuc.edu/maude2-manual/html/node12.html
Performance
[..]
Generally and roughly speaking, the current Maude
implementation can execute syntactic rewriting with typical
speeds from half a million to several million rewrites per
second, depending on the particular application and the given
machine (the above rough figures assume a 900 Mhz Pentium).
Similarly, associative and associative-commutative equational
rewriting with term patterns used in practice can be
performed at the typical rate of one to several hundred
thousand rewrites per second.
I also have some ideas to speed up rewriting, which I think are
confirmed by recent research (ask if you are interested).
> > Tell me if you are interested in type checking stack-based
> > languages, I have some references to give you.
>
> Yes, please. I consider strong typing to be essential
> for the programmer's sanity, even if no attempt is made
> to use it for greater run-time efficiency.
Various work by Jaanus Pöial (http://www.kodu.ee/%7Ejpoial/),
for example:
"Typing Tools for Typeless Stack Languages"
http://www.kodu.ee/%7Ejpoial/teadus/icalp04.pdf
"Stack effect calculus with typed wildcards, polymorphism and
inheritance"
http://www.kodu.ee/%7Ejpoial/teadus/jpef2002.txt
http://www.kodu.ee/%7Ejpoial/teadus/jpef2002.ps
"Program Analysis for Stack Based Languages"
http://www.kodu.ee/%7Ejpoial/teadus/ef2003new.pdf
"Forth and Formal Language Theory"
http://www.kodu.ee/%7Ejpoial/teadus/euro94.ps
"Alternative Syntactic Methods for Defining Stack Based Languages"
http://www.kodu.ee/%7Ejpoial/teadus/nw98-1.ps
And so on (see http://www.kodu.ee/%7Ejpoial/teadus/)
By Peter J. Knaggs
http://dec.bournemouth.ac.uk/staff/pknaggs/papers/index.html
and Bill Stoddart
http://www-scm.tees.ac.uk/users/w.j.stoddart/
"Type Inference in Stack Based Languages"
http://citeseer.ist.psu.edu/stoddart93type.html
"Towards a Formal FORTH"
http://citeseer.ist.psu.edu/knaggs93towards.html
"Practical and Theoretical Aspects of Forth Software Development"
Knaggs's Ph.D. thesis (about types, see Ch. 6 and 7 in particular)
http://dec.bournemouth.ac.uk/staff/pknaggs/thesis/index.html
Also see Ch. 5, Par. 6 for a technique to map stack items
to registers, simpler than the one used by Anton Ertl.
By Robin Popplestone, author of POP-11:
"Specifying Types by Grammars: A Polymorphic Static Type Checker
Operating at a Stack Machine Level"
http://citeseer.ist.psu.edu/73738.html
"A Type Operational Semantics Based on Grammatical Characterisation
of an Abstract Machine"
http://citeseer.ist.psu.edu/popplestone94type.html
"Static Analysis of PostScript Code",
by R. Nigel Horspool, Jan Vitek
http://citeseer.ist.psu.edu/horspool92static.html
About type inference in Factor, see the weblog of Slava Pestov
(its author):
"Type inference"
http://www.jroller.com/page/slava/20041028#type_inference
"Type inference part 2"
http://www.jroller.com/page/slava/20041106#type_inference_part_2
There was also strongForth, on-line here
http://home.t-online.de/home/s.becher/forth/
which added strong static typing to Forth, but it seems to have
disappeared from the 'net. Here is a copy at the Internet Archive
Wayback Machine:
http://web.archive.org/web/20021203052508/http://home.t-online.de/home/s.becher/forth/
I think it used a technique similar to the one described below.
"Polymorphic Type Checking by Interpretation of Code",
by Stefan Kahrs
http://citeseer.ist.psu.edu/kahrs92polymorphic.html
In this paper an (emulated) stack is used to type-check
Miranda programs by abstractly interpreting them
(the type-checker itself is written in Miranda, a predecessor
of Haskell).
I think this is the same technique as the one referred to above
(stack effect calculus); but it suggests a *simple*
implementation that I will *sketch* below (I assume you are
familiar with the Forth lexicon):
1. type-checking is started before compilation (of a word),
and the idea is to "abstractly interpret" your word
definition (or program fragment)
: word w1 w2 ... wn ;
to infer the types and number of parameters on the stack
(the status of the stack before and after the run-time
execution of a word);
2. of course, user-defined words are composed from
other words (w1, w2, ..., wn), so you recurse until
you arrive at system-defined words ("primitives");
3. you need to annotate each *primitive* word with its
signature: how do you do this?
4. there is in the system a type checker/inferer
*vocabulary*; so, you switch to this vocabulary and
start interpreting your word definition;
5. in it, each primitive has a "directly executable"
signature for its parameters: when invoked, a primitive
word consumes type ids (constants) from a stack of types
(which, with caution, can be the same as the normal stack)
and leaves some other type ids on the stack of types;
6. where is the check? When a word consumes its parameters
on the type stack, it checks that they are *compatible*§
with what it expects; if the translator (interpreter
/compiler) requires explicit type annotations for each new
user-defined word, then the type checker compares these
type annotations with what it inferred; otherwise the
inferred types are shown to the user.
§ The more complex the compatibility rules are, the more
"precise" the type system is (it is able to identify a
stricter subset of allowed run-time values).
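Steps 1-6 above can be sketched very roughly in Python (a hypothetical
illustration, not any real Forth system; the primitives, type ids and
their signatures here are invented for the example): the "checker
vocabulary" maps each primitive to a directly executable signature that
pops type ids from a stack of types, checks compatibility, and pushes
the result types.

```python
# Hypothetical type ids (constants) pushed/popped on the stack of types.
INT, FLT = "int", "float"

def check_plus(types):
    """+ (n1 n2 -- n3): here, for simplicity, integers only."""
    b, a = types.pop(), types.pop()
    if a != INT or b != INT:
        raise TypeError(f"+ expects two ints, got {a} {b}")
    types.append(INT)

def check_dup(types):
    """DUP (x -- x x): polymorphic, no check needed."""
    types.append(types[-1])

# The type checker/inferer *vocabulary* (step 4).
CHECKER_VOCAB = {"+": check_plus, "DUP": check_dup}

def infer(definition, input_types):
    """Abstractly interpret a word definition over the stack of types
    (steps 1-2): execute each word's signature instead of the word."""
    types = list(input_types)
    for word in definition:
        CHECKER_VOCAB[word](types)
    return types

# : double  DUP + ;   infers (int -- int)
print(infer(["DUP", "+"], [INT]))  # ['int']
```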
Note that in papers about the stack effect calculus,
type variables are introduced to allow polymorphic
words; I question their utility: I guess that one can
achieve polymorphism by manipulating the stack of types
(without much complexity, most of the time).
I have no formal proof of this, but I think that it's trivially
true: during checking/inference, you compute types; now,
if a pure concatenative (stack-based) language such as Joy is
Turing-equivalent, then you can compute whatever types you
like without using type variables (note that doing so
without restriction gives you a type system that is not
decidable).
For example, consider the polymorphic word SWAP: it exchanges
the top 2 elements on the stack, whatever their types;
expressed with a "stack picture":
SWAP (e1 e2 -- e2 e1)
where the notation (before -- after) describes the *data* stack
before and after the execution of SWAP; e2 is Top Of Stack (TOS)
before and e1 is TOS after.
Now, let's look at the effect of SWAP on the stack of *types*:
SWAP (t1 t2 -- t2 t1)
where t1, t2 are type ids; the two pictures are exactly the same
(modulo the variables introduced by the notation).
But not only are the stack pictures the same: the
"interpretation semantics" of SWAP and what I will call its
"abstract interpretation semantics (for type-checking)"
are the same; the latter does not even need a type-checking
step (during abstract interpretation, SWAP only exchanges the
top 2 type ids).
So, it is convenient to implement the stack of types
using the usual data stack.
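A sketch of this observation (hypothetical, in Python): the very same
SWAP code serves both as the run-time semantics on the data stack and
as the "abstract interpretation semantics" on the stack of types, with
no type-checking logic at all.

```python
def swap(stack):
    """SWAP (e1 e2 -- e2 e1): one definition for both uses."""
    stack[-1], stack[-2] = stack[-2], stack[-1]

# Run time: exchange the top 2 *values* on the data stack.
data = [1, "hello"]
swap(data)                      # ['hello', 1]

# Checking time: exchange the top 2 *type ids* on the stack of types.
types = ["int", "string"]
swap(types)                     # ['string', 'int']

print(data, types)
```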
Not related to type checking, but probably interesting for you:
"Generational Stack Collection and Profile-Driven Pretenuring",
by Perry Cheng, Robert Harper, Peter Lee
http://citeseer.ist.psu.edu/cheng98generational.html
About GC and stacks.
But I think this is even more interesting, about GC:
"Compile-time garbage collection for the declarative language
Mercury", the Ph.D. thesis of Nancy Mazur
http://www.cs.mu.oz.au/research/mercury/information/papers.html#mazur-thesis
> > > If so, it could be an ideal platform to which to port some
> > > of the work I'm doing on program transformation and verification.
> > >
> > > -- hendrik
> >
> > Interesting. Do you have a web page about your work?
> >
>
> No. I should construct one. Let me try to remember to write
> you when I have. Or remind me if I forget.
Well, I can periodically check the web site of your company,
if you intend to post your work there.
(It is at http://www.pooq.com/ , right?)
> I wrote a verifier using intuitionistic type theory about
> twenty years ago, on a combination of hardware and software that's
> no longer available. I'd like to redo it now that machines are a
> thousand times faster.
>
> But in the meantime, I've noticed that the notation in which
> coq (another ITT-based theorem proof-checker) archives its
> proofs and theories is a stack language. And I've got the
> idea that it makes sense to use Forth-like tools as intermediaries
> between high-level languages and machine code.
Well, if I understood your "intentions"(§), I suggest using
Threaded Code as the intermediate data structure, with a concatenative
syntax as a simple human-readable representation; this choice has
the nice side effect that you get a simple, *fast* interpreter
for your intermediate representation nearly for free (good for
bootstrapping and debugging).
§ pun intended, if you are *that* Hendrik Boom cited by Charles
Simonyi in "Intentional Programming Innovation in the Legacy Age"
(http://citeseer.ist.psu.edu/simonyi96intentional.html)
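To show how cheap such an interpreter is, here is a toy sketch
(hypothetical, in Python rather than machine code, so the "addresses"
of threaded code become function references): a compiled word is just a
sequence of addresses, and the inner interpreter is a two-line loop.

```python
stack = []

def lit(ip, code):
    """Push the literal stored inline right after this cell."""
    stack.append(code[ip])
    return ip + 1                      # skip over the inline literal

def add(ip, code):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return ip

def dot(ip, code):
    print(stack.pop())
    return ip

def execute(code):
    """The whole inner interpreter: fetch a cell, jump through it."""
    ip = 0
    while ip < len(code):
        op = code[ip]
        ip = op(ip + 1, code)

# The concatenative source "2 3 + ." compiles to this thread:
execute([lit, 2, lit, 3, add, dot])    # prints 5
```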
Threaded Code is also similar to an Abstract Syntax Tree (AST);
so, the first time I encountered the concept of Slim Binaries
http://www.ics.uci.edu/~franz/SlimBinaries.html
by Michael Franz
http://www.ics.uci.edu/~franz/Site/publications.html
I immediately connected the two. Slim Binaries are, in substance,
*compressed* ASTs. A Slim Binary is obtained by a process of
1) identification of common code patterns in the program source
code, and 2) substitution of these common code patterns with a
reference to a single copy of each pattern (going from multiple
copies to sharing of the same subtree in the AST).
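Steps 1) and 2) amount to what compiler people call hash-consing; a
minimal hypothetical sketch (not Franz's actual algorithm): every
constructed subtree is looked up in a pool, so repeated patterns are
stored once and referred to by a shared reference.

```python
# Pool of already-built subtrees, keyed by operator and child identity.
pool = {}

def node(op, *children):
    """Build an AST node, sharing any subtree already in the pool."""
    key = (op, *(id(c) for c in children))  # children are already shared
    if key not in pool:
        pool[key] = (op, children)
    return pool[key]

# "x*x + x*x": the common pattern x*x is identified and stored once.
x  = node("x")
sq = node("*", x, x)
e  = node("+", sq, sq)
print(node("*", x, x) is sq)   # True: multiple copies became one subtree
```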
In practice, Franz automated a process that a good Forth programmer
does manually: *factoring* (with the advantage that the Forth
programmer directly sees a much more manageable, succinct program,
compressed in a way meaningful to *him/her*).
Slim Binaries are a good migration format for programs: they
compile faster at the destination site than textual source code,
and they migrate faster than both pure binary code and the source
code of a language with a verbose syntax, because they are smaller
than both.
It is also a good starting point for profile-directed incremental
(re-)compilation (but Franz probably overlooked the possible connection
between Threaded Code and Slim Binaries: he cited as a drawback of
Slim Binaries the fact that they are *not* immediately interpretable
at the destination site, which is not true if they are disguised as
Threaded Code).
See "Adaptive Compression of Syntax Trees and Iterative Dynamic
Code Optimization: Two Basic Technologies for Mobile-Object Systems"
http://www.ics.uci.edu/~franz/Site/pubs-pdf/BC01.pdf
Sorry, I'm digressing (so I will refrain from speaking about possible
compression techniques for source code).
> And somehow I think that all these ideas can come together
> into a coherent, usable system -- taking my language from
> twenty years ago, synthesising proofs of programs, converting
> to low-level Forthish notation, and checking these proofs
> while generating machine code.
In a way you are working toward a practical implementation
of Proof-Carrying Code (http://raw.cs.berkeley.edu/pcc.html ;
see also George Necula's home page, http://www.cs.berkeley.edu/~necula/)
I think you should speak with Slava Pestov: his Factor seems
a nice candidate. I don't know what he thinks about proofs
of programs and formal methods in general, but he seems
not allergic to theory (a "disease" so common these days).
> None of these things by themselves seems too big to tackle,
> except they've gotten my mind tangled into one metarecursive
> lump.
>
> Yes, I plan to go all the way from formal theory to machine code.
This is good news for me. Apparently there is no longer any interest
in formal methods in this project (Tunes). I hope someone will
promptly and strongly correct me.
Recently, however, a new effort has started to produce an Operating
System with formally verified security and correctness properties:
The Coyotos Secure Operating System
http://www.coyotos.org/
It will be written in BitC, their language, and thus be amenable
to proofs of program properties.
I'm curious: what will be the status of your code? Proprietary,
GPL, BSD, other license?
Regards.
P.S.: sorry for my horrible English.
> -- hendrik
--
Massimo Dentico