Type-checking, GC, Forth,
rewriting.. (was: garbage collected forths)
Massimo Dentico
m.dentico at virgilio.it
Thu Apr 28 22:13:39 PDT 2005
Hendrik Boom wrote:
> [..]
> > - Joy, http://cliki.tunes.org/Joy
> > somewhat the "pure" concatenative language, much more
> > experimental. GCed.
>
> Ugh. Rewriting is slow.
Well, I'm not familiar with its implementation (an interpreter
written in C), but consider that:
a) Joy semantics is about pure functions from stack to stack;
rewriting rules are used only to clarify this semantics
and to reason about programs; there is nothing in the
language which prevents the use of Forth-like implementation
techniques, both for interpretation - with various
threaded code techniques - and for compilation.
See by Anton Ertl et al.:
Interpreter Research at Institut für Computersprachen
http://www.complang.tuwien.ac.at/projects/interpreters.html
Forth Research at Institut für Computersprachen
http://www.complang.tuwien.ac.at/projects/forth.html
Threaded Code,
http://www.complang.tuwien.ac.at/forth/threaded-code.html
Compilation of Stack-Based Languages,
http://www.complang.tuwien.ac.at/projects/rafts.html
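Point a) can be made concrete with a minimal sketch (hypothetical, in
Python; not Joy's actual C interpreter): each word is a pure function
from stack to stack, and a program is just their left-to-right
composition, so no rewriting engine is needed at run time.

```python
# Each Joy-like word is a pure function from stack to stack.
def dup(s):  return s + [s[-1]]            # DUP (x -- x x)
def mul(s):  return s[:-2] + [s[-2] * s[-1]]   # *  (a b -- a*b)
def swap(s): return s[:-2] + [s[-1], s[-2]]    # SWAP (a b -- b a)

def run(program, stack):
    """A program is a sequence of words; running it composes them."""
    for word in program:
        stack = word(stack)
    return stack

# "3 dup *" squares the top of the stack:
print(run([dup, mul], [3]))  # [9]
```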
b) rewriting system implementations are no longer so slow;
for example, the authors of Maude (http://cliki.tunes.org/Maude)
claim (Maude v 2.1.1 Manual):
http://maude.cs.uiuc.edu/maude2-manual/html/node12.html
Performance
[..]
Generally and roughly speaking, the current Maude
implementation can execute syntactic rewriting with typical
speeds from half a million to several million rewrites per
second, depending on the particular application and the given
machine (the above rough figures assume a 900 Mhz Pentium).
Similarly, associative and associative-commutative equational
rewriting with term patterns used in practice can be
performed at the typical rate of one to several hundred
thousand rewrites per second.
I also have some ideas to speed up rewriting, which I think are
confirmed by recent research (ask if you are interested).
> > Tell me if you are interested in type checking stack-based
> > languages, I have some references to give you.
>
> Yes, please. I consider strong typing to be essential
> for the programmer's sanity, even if no attempt is made
> to use it for greater run-time efficiency.
Various work by Jaanus Pöial (http://www.kodu.ee/%7Ejpoial/),
for example:
"Typing Tools for Typeless Stack Languages"
http://www.kodu.ee/%7Ejpoial/teadus/icalp04.pdf
"Stack effect calculus with typed wildcards, polymorphism and
inheritance"
http://www.kodu.ee/%7Ejpoial/teadus/jpef2002.txt
http://www.kodu.ee/%7Ejpoial/teadus/jpef2002.ps
"Program Analysis for Stack Based Languages"
http://www.kodu.ee/%7Ejpoial/teadus/ef2003new.pdf
"Forth and Formal Language Theory"
http://www.kodu.ee/%7Ejpoial/teadus/euro94.ps
"Alternative Syntactic Methods for Defining Stack Based Languages"
http://www.kodu.ee/%7Ejpoial/teadus/nw98-1.ps
And so on (see http://www.kodu.ee/%7Ejpoial/teadus/)
By Peter J. Knaggs
http://dec.bournemouth.ac.uk/staff/pknaggs/papers/index.html
and Bill Stoddart
http://www-scm.tees.ac.uk/users/w.j.stoddart/
"Type Inference in Stack Based Languages"
http://citeseer.ist.psu.edu/stoddart93type.html
"Towards a Formal FORTH"
http://citeseer.ist.psu.edu/knaggs93towards.html
"Practical and Theoretical Aspects of Forth Software Development"
Knaggs's Ph.D. thesis (about types, see Ch. 6 and 7 in particular)
http://dec.bournemouth.ac.uk/staff/pknaggs/thesis/index.html
Also see Ch. 5, Par. 6 for a technique to map stack items
to registers, simpler than the one used by Anton Ertl.
By Robin Popplestone, author of POP-11:
"Specifying Types by Grammars: A Polymorphic Static Type Checker
Operating at a Stack Machine Level"
http://citeseer.ist.psu.edu/73738.html
"A Type Operational Semantics Based on Grammatical Characterisation
of an Abstract Machine"
http://citeseer.ist.psu.edu/popplestone94type.html
"Static Analysis of PostScript Code",
by R. Nigel Horspool, Jan Vitek
http://citeseer.ist.psu.edu/horspool92static.html
About type inference in Factor, see the weblog of Slava Pestov
(its author):
"Type inference"
http://www.jroller.com/page/slava/20041028#type_inference
"Type inference part 2"
http://www.jroller.com/page/slava/20041106#type_inference_part_2
There was also strongForth, on-line here
http://home.t-online.de/home/s.becher/forth/
which added strong static typing to Forth, but it seems to have
disappeared from the 'net. Here is a copy at the Internet Archive
Wayback Machine:
http://web.archive.org/web/20021203052508/http://home.t-online.de/home/s.becher/forth/
I think it used a technique similar to the one described below.
"Polymorphic Type Checking by Interpretation of Code",
by Stefan Kahrs
http://citeseer.ist.psu.edu/kahrs92polymorphic.html
In this paper an (emulated) stack is used to type-check
Miranda programs by abstractly interpreting them
(the type-checker itself is written in Miranda, a predecessor
of Haskell).
I think this is the same technique as the one referred to above
(stack effect calculus); but it suggests a *simple*
implementation that I will *sketch* below (I assume you are
familiar with the Forth lexicon):
1. type-checking is started before compilation (of a word),
and the idea is to "abstractly interpret" your word
definition (or program fragment)
: word w1 w2 ... wn ;
to infer the types and number of parameters on the stack
(the status of the stack before and after the run-time
execution of a word);
2. of course, user-defined words are composed from
other words (w1, w2, ..., wn), so you recurse until
you arrive at system-defined words ("primitives");
3. you need to annotate each *primitive* word with its
signature: how do you do this?
4. there is in the system a type checker/inferer
*vocabulary*; so, you switch to this vocabulary and
start interpreting your word definition;
5. in it, each primitive has a "directly executable"
signature for its parameters: when invoked, a primitive
word consumes type ids (constants) from a stack of types
(which, with caution, can be the same as the normal stack)
and leaves some other type ids on the stack of types;
6. where is the check? When a word consumes its parameters
on the type stack, it checks that they are *compatible*§
with what it expects; if the translator (interpreter
/compiler) requires explicit type annotations for each new
user-defined word, then the type checker compares these
type annotations with what it inferred; otherwise the
inferred types are shown to the user.
§ The more complex the compatibility rules are, the more
"precise" the type system is (it is able to identify a
stricter subset of allowed run-time values).
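Steps 1-6 above can be sketched very roughly in Python (a hypothetical
illustration, not any real Forth system; the primitives, type ids and
their signatures here are invented for the example): the "checker
vocabulary" maps each primitive to a directly executable signature that
pops type ids from a stack of types, checks compatibility, and pushes
the result types.

```python
# Hypothetical type ids (constants) pushed/popped on the stack of types.
INT, FLT = "int", "float"

def check_plus(types):
    """+ (n1 n2 -- n3): here, for simplicity, integers only."""
    b, a = types.pop(), types.pop()
    if a != INT or b != INT:
        raise TypeError(f"+ expects two ints, got {a} {b}")
    types.append(INT)

def check_dup(types):
    """DUP (x -- x x): polymorphic, no check needed."""
    types.append(types[-1])

# The type checker/inferer *vocabulary* (step 4).
CHECKER_VOCAB = {"+": check_plus, "DUP": check_dup}

def infer(definition, input_types):
    """Abstractly interpret a word definition over the stack of types
    (steps 1-2): execute each word's signature instead of the word."""
    types = list(input_types)
    for word in definition:
        CHECKER_VOCAB[word](types)
    return types

# : double  DUP + ;   infers (int -- int)
print(infer(["DUP", "+"], [INT]))  # ['int']
```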
Note that in papers about the stack effect calculus,
type variables are introduced to allow polymorphic
words; I question their utility: I guess that one can
achieve polymorphism by manipulating the stack of types
(without much complexity, most of the time).
I have no formal proof of this, but I think that it's trivially
true: during checking/inference, you compute types; now,
if a pure concatenative (stack-based) language such as Joy is
Turing-equivalent, then you can compute whatever types you
like without using type variables (note that doing so
without restriction gives you a type system that is not
decidable).
For example, consider the polymorphic word SWAP: it exchanges
the top 2 elements on the stack, whatever their types;
expressed with a "stack picture":
SWAP (e1 e2 -- e2 e1)
where the notation (before -- after) describes the *data* stack
before and after the execution of SWAP; e2 is Top Of Stack (TOS)
before and e1 is TOS after.
Now, let's look at the effect of SWAP on the stack of *types*:
SWAP (t1 t2 -- t2 t1)
where t1, t2 are type ids; the two pictures are exactly the same
(modulo the variables introduced by the notation).
But not only are the stack pictures the same: the
"interpretation semantics" of SWAP and what I will call its
"abstract interpretation semantics (for type-checking)"
are the same; the latter does not even need a type-checking
step (during abstract interpretation, SWAP only exchanges the
top 2 type ids).
So, it is convenient to implement the stack of types
using the usual data stack.
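A sketch of this observation (hypothetical, in Python): the very same
SWAP code serves both as the run-time semantics on the data stack and
as the "abstract interpretation semantics" on the stack of types, with
no type-checking logic at all.

```python
def swap(stack):
    """SWAP (e1 e2 -- e2 e1): one definition for both uses."""
    stack[-1], stack[-2] = stack[-2], stack[-1]

# Run time: exchange the top 2 *values* on the data stack.
data = [1, "hello"]
swap(data)                      # ['hello', 1]

# Checking time: exchange the top 2 *type ids* on the stack of types.
types = ["int", "string"]
swap(types)                     # ['string', 'int']

print(data, types)
```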
Not related to type checking, but probably interesting for you:
"Generational Stack Collection and Profile-Driven Pretenuring",
by Perry Cheng, Robert Harper, Peter Lee
http://citeseer.ist.psu.edu/cheng98generational.html
About GC and stacks.
But I think this is even more interesting, about GC:
"Compile-time garbage collection for the declarative language
Mercury", the Ph.D. thesis of Nancy Mazur
http://www.cs.mu.oz.au/research/mercury/information/papers.html#mazur-thesis
> > > If so, it could be an ideal platform to which to port some
> > > of the work I'm doing on program transformation and verification.
> > >
> > > -- hendrik
> >
> > Interesting. Do you have a web page about your work?
> >
>
> No. I should construct one. Let me try to remember to write
> you when I have. Or remind me if I forget.
Well, I can periodically check the web site of your company,
if you intend to post your work there.
(It is at http://www.pooq.com/ , right?)
> I wrote a verifier using intuitionistic type theory about
> twenty years ago, on a combination of hardware and software that's
> no longer available. I'd like to redo it now that machines are a
> thousand times faster.
>
> But in the meantime, I've noticed that the notation in which
> coq (another ITT-based theorem proof-checker) archives its
> proofs and theories is a stack language. And I've got the
> idea that it makes sense to use Forth-like tools as intermediaries
> between high-level languages and machine code.
Well, if I understood your "intentions"(§), I suggest using
Threaded Code as the intermediate data structure, with a concatenative
syntax as a simple human-readable representation; this choice has
the nice side effect that you get a simple, *fast* interpreter
for your intermediate representation nearly for free (good for
bootstrapping and debugging).
§ pun intended, if you are *that* Hendrik Boom cited by Charles
Simonyi in "Intentional Programming Innovation in the Legacy Age"
(http://citeseer.ist.psu.edu/simonyi96intentional.html)
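To show how cheap such an interpreter is, here is a toy sketch
(hypothetical, in Python rather than machine code, so the "addresses"
of threaded code become function references): a compiled word is just a
sequence of addresses, and the inner interpreter is a two-line loop.

```python
stack = []

def lit(ip, code):
    """Push the literal stored inline right after this cell."""
    stack.append(code[ip])
    return ip + 1                      # skip over the inline literal

def add(ip, code):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)
    return ip

def dot(ip, code):
    print(stack.pop())
    return ip

def execute(code):
    """The whole inner interpreter: fetch a cell, jump through it."""
    ip = 0
    while ip < len(code):
        op = code[ip]
        ip = op(ip + 1, code)

# The concatenative source "2 3 + ." compiles to this thread:
execute([lit, 2, lit, 3, add, dot])    # prints 5
```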
Threaded Code is also similar to an Abstract Syntax Tree (AST);
so, the first time I encountered the concept of Slim Binaries
http://www.ics.uci.edu/~franz/SlimBinaries.html
by Michael Franz
http://www.ics.uci.edu/~franz/Site/publications.html
I immediately connected the two. Slim Binaries are, in substance,
*compressed* ASTs. A Slim Binary is obtained by a process of
1) identification of common code patterns in the program source
code, and 2) substitution of these common code patterns with a
reference to a single copy of each pattern (going from multiple
copies to sharing of the same subtree in the AST).
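Steps 1) and 2) amount to what compiler people call hash-consing; a
minimal hypothetical sketch (not Franz's actual algorithm): every
constructed subtree is looked up in a pool, so repeated patterns are
stored once and referred to by a shared reference.

```python
# Pool of already-built subtrees, keyed by operator and child identity.
pool = {}

def node(op, *children):
    """Build an AST node, sharing any subtree already in the pool."""
    key = (op, *(id(c) for c in children))  # children are already shared
    if key not in pool:
        pool[key] = (op, children)
    return pool[key]

# "x*x + x*x": the common pattern x*x is identified and stored once.
x  = node("x")
sq = node("*", x, x)
e  = node("+", sq, sq)
print(node("*", x, x) is sq)   # True: multiple copies became one subtree
```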
In practice, Franz automated a process that a good Forth programmer
does manually: *factoring* (with the advantage that the Forth
programmer directly sees a much more manageable, succinct program,
compressed in a way meaningful to *him/her*).
Slim Binaries are a good migration format for programs: they
compile faster at the destination site than textual source code,
and they migrate faster than both pure binary code and the source
code of a language with a verbose syntax, because they are smaller
than both.
It is also a good starting point for profile-directed incremental
(re-)compilation (but Franz probably overlooked the possible connection
between Threaded Code and Slim Binaries: he cited as a drawback of
Slim Binaries the fact that they are *not* immediately interpretable
at the destination site, which is not true if they are disguised as
Threaded Code).
See "Adaptive Compression of Syntax Trees and Iterative Dynamic
Code Optimization: Two Basic Technologies for Mobile-Object Systems"
http://www.ics.uci.edu/~franz/Site/pubs-pdf/BC01.pdf
Sorry, I'm digressing (so I will refrain from speaking about possible
compression techniques for source code).
> And somehow I think that all these ideas can come together
> into a coherent, usable system -- taking my language from
> twenty years ago, synthesising proofs of programs, converting
> to low-level Forthish notation, and checking these proofs
> while generating machine code.
In a way you are working toward a practical implementation
of Proof-Carrying Code (http://raw.cs.berkeley.edu/pcc.html ;
see also George Necula's home page, http://www.cs.berkeley.edu/~necula/)
I think you should speak with Slava Pestov: his Factor seems
a nice candidate. I don't know what he thinks about proofs
of programs and formal methods in general, but he seems
not allergic to theory (a "disease" so common these days).
> None of these things by themselves seems too big to tackle,
> except they've gotten my mind tangled into one metarecursive
> lump.
>
> Yes, I plan to go all the way from formal theory to machine code.
This is good news for me. Apparently there is no longer any interest
in formal methods in this project (Tunes). I hope someone will
promptly and strongly correct me.
Recently, however, a new effort has started to produce an Operating
System with formally verified security and correctness properties:
The Coyotos Secure Operating System
http://www.coyotos.org/
It will be written in BitC, their language, and thus be amenable
to proofs of program properties.
I'm curious: what will be the status of your code? Proprietary,
GPL, BSD, other license?
Regards.
P.S.: sorry for my horrible English.
> -- hendrik
--
Massimo Dentico