Progress on a Python compiler

Armin Rigo arigo@ulb.ac.be
Thu Sep 20 09:30:03 2001


Hello Brian, hello everybody,

A short note about my current work, not directly related to Tunes. I am
progressing on a kind of compiler for the Python language, "Psyco".

It is actually a kind of just-in-time compiler, working at run-time. It
consists of two parts: a language-independent "glue" part (which I more or
less finished), and the inner loop of the Python interpreter rewritten in
a special, resticted language. The "glue" part executes Python programs by
running the interpreter, but not in the common way. The run-time data are
split into what must be known to produce good processor code, and what
need not be. Typically, we need to know at least which Python code to
interpret before we can make good processor code that performs similarily;
in the case of Python we will probably also need to know the type fields
of the actual Python objects fed to the Python code; the processor code
can be very efficiently optimized depending on the type. On the other
hand, if the Python objects are, say, integers, then the same processor
code can easily manipulate all possible integer values. The "glue" code is
responsible for choosing which data goes into which category, and managing
caches of the produced code. The Python-dependent part of the code either
makes computations at compile-time (for compile-time-known values) or
emits code to do it at run-time (for the rest).

The choice between the two categories of data is not done statically; I
mean, I didn't say "Python code and type fields are in category 1, the
rest in category 2", which would amount to writing a JIT compiler with
type-analysis. No; I let this separation be done based on the uses of the
data. In two words, if I have a "switch" on data, or a call to function
pointer, then knowing the data at compile time enables significant
improvements, so I mark that data as "category 1"; as long as this does
not occur data is in "category 2".

This makes the approach quite general, I think. Experiment will be needed
to know whether it performs reasonably well. Another interesting point to
note is that this idea applies to any other interpreted language; all you
need to do is rewrite the inner loop of the interpreter for the other
language in a special way. Orthogonally, you can change the target
processor by adapting parts of the "glue" code.


Hope you didn't find this too far from the subject of this mailing list,

Armin.