Tunes compiler
Francois-Rene Rideau
rideau@ens.fr
Fri, 1 Mar 1996 03:37:41 +0100 (MET)
Dear Tunespeople,
As I told you, Patrick Premont and I have undertaken
to rewrite the previous Tunes source
as the inline-assembly part of a program
compiled in a true language with true macro capabilities,
instead of trying to get things to work
with that horrible macro-preprocessor+assembler combination we had
(with GNU m4 as the "best" available preprocessor found,
and as86 or C as the "best" assemblers found).
We finally decided that the source language would have a trivial
syntax of parenthesized expressions, which eases parsing
and lets us write a meta-environment in Scheme, a most simple,
yet most powerful language of the highest available abstraction level.
We started discussing in French
what we first thought were mere technical problems,
until we realized that important choices were involved,
so the Tunes list should participate.
So this is my last message in that discussion,
with some English translation.
It doesn't quite reply to the last quoted message by Patrick,
because we mostly agree...
[Scheme typically proving less performant than C++]
>> A first explanation, that shouldn't be neglected,
>> is that there are more people working at compiling C++
>> than there are working at compiling Scheme
>> (and there are more people working at compiling Scheme
>> than there will ever be working on our own language dialect).
> We must therefore be very careful about efficiency. Adding features in
> the compiler first should force us to think of the feature as something
> which must be compiled efficiently. If TUNES is as slow as MIT-Scheme,
> we can forget about having any significant user base.
>> Now, that being said, Scheme truly is a low-level language,
>> but for an abstract machine that doesn't match available hardware,
>> while C was invented precisely to represent low-level operations
>> on available hardware, which was subsequently itself developed
>> with the C programming model in mind.
>> When I say Scheme is low-level, I mean that in Scheme
>> you can't work with types and abstract constructors
>> that would be implemented independently as optimized code;
>> you must always work with a Scheme representation of the objects considered,
>> without being able to intrinsically express constraints on those objects,
>> like the fact that some function parameter
>> should be an odd integer below 50 without a square factor,
>> while the result would be an unordered tree whose nodes are annotated
>> with integers.
>> It would always be possible to include assert()-like statements
>> that the compiler would take into account,
>> but this isn't quite intrinsic,
>> even less so when recursive properties of the objects' structure are meant.
>> Thus, the Scheme type system is trivial and poorly expressive,
>> which is definitely bad
>> (then again, the C/C++ type system is no more expressive).
>> I am convinced that efficiency lies in the ability to express
>> non-imperative properties of objects,
>> which the system can heed to optimize code,
>> not only by skipping unneeded tests,
>> but by benefiting from the known structure of the objects,
>> which carries structural information that needn't be recomputed
>> by general algorithms that would be redundant in those cases.
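[A sketch in Scheme of the assert()-like workaround just mentioned; the names square-free? and valid-parameter? are improvised here. The property can only be checked at run time, not expressed intrinsically in the type of x:]

```scheme
;; Run-time check for "no square factor": no d>1 with d*d dividing n.
(define (square-free? n)
  (let loop ((d 2))
    (cond ((> (* d d) n) #t)
          ((zero? (remainder n (* d d))) #f)
          (else (loop (+ d 1))))))

;; "Odd integer below 50 without square factor", as a predicate.
(define (valid-parameter? x)
  (and (integer? x) (odd? x) (< x 50) (square-free? x)))

(define (f x)
  (if (not (valid-parameter? x))
      (error "f: invalid parameter" x))
  ;; ... body that the compiler could have optimized,
  ;; had it known these properties statically ...
  x)
```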
>
> That's the kind of thing we should be working on as soon as possible.
> I agree that this should be a great way to provide efficiency and that's
> why we should concentrate on the compiler, which is where those wonders
> will be performed.
> [Cut : We should make sure that the cost of dynamic constructs can be
> eliminated when they are not used.]
[We particularly considered first-class contexts
as one such costly dynamic construct]
>>> There is no point in imagining constructs fancier than Scheme's
>>> if we're not even able to make Scheme efficient.
>> Scheme constructs are intrinsically inefficient
>> because of trivial static typing
>> (I refuse to use the word "dynamic typing").
>
> I'd say that there is no static typing. All types are dynamic concepts in
> Scheme. This makes efficiency a real problem for the compiler.
Beyond the terminology, we agree.
Shall we start with an efficient Scheme compiler like Bigloo?
Maybe we could just write a front-end to the Bigloo code generator?
[Bigloo can be found at ftp.inria.fr; it also compiles caml-light code]
>> Only by adding non-trivial typing can we hope to obtain
>> an efficient and/or high-level language.
>>
[Note that most of the work of today's optimizing compilers
consists in synthesizing the most precise possible type for objects,
in type hierarchies that take fine structure and side effects into account,
when the human programmer could sometimes have given
as good or partly better information for free,
which would also have helped code maintenance,
if only the source language were more expressive;
for truly low-level languages like C, early optimizing phases
even consist in reconstructing some high-level constructs
from the low-level code].
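[For instance, Bigloo, mentioned above, already accepts type annotations of roughly the following shape, letting the programmer state for free what the compiler would otherwise have to reconstruct by analysis; I quote the syntax from memory, it may differ between versions:]

```scheme
;; Bigloo-style sketch: ::double and ::vector annotate the result and
;; the arguments; *fl and +fl are Bigloo's flonum-specific operators,
;; usable because the declared types make the boxing unnecessary.
(define (dot-product::double v1::vector v2::vector)
  (let loop ((i (- (vector-length v1) 1)) (acc 0.0))
    (if (< i 0)
        acc
        (loop (- i 1)
              (+fl acc (*fl (vector-ref v1 i) (vector-ref v2 i)))))))
```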
>
> Yes. We will have to make significant changes to make Scheme's constructs
> efficient (of course they won't be called Scheme constructs after the
> changes but I think you know what I mean).
>>> I believe we're off the ground with our first-class contexts, etc.
>>> I believe we should forget the HLL-
>>> and focus on writing a Scheme compiler (of which the assembler is part).
>> I grant you that the HLL- extensions to Scheme should be
>> kept as small as possible;
>> but first-class contexts seem to me
>> an extension that is easy to get (as it only reifies objects
>> that we should implement in the compiler anyway),
>> and quite useful.
>
> I still have the same objections about first-class contexts. They are
> not well defined, their usefulness has not been demonstrated and we
> don't know how to implement them efficiently. Now if we wait and
> implement them in the compiler first, the first and third problems
> will be solved and we will be able to work on the second problem.
> Otherwise, if we start to implement them in an interpreter, we might
> not figure out how to compile them efficiently and start writing
> programs (the compiler) with them even though they may need to be scrapped
> due to inefficiency. Also, I don't think I'd find it useful to have
> to use such a new feature while writing the compiler. It would slow
> me down having to innovate in the way I write the compiler while my
> mind is focused on translating the source to efficient code.
.....
>>> Then we can design enhancements to the language that we can implement
>>> (compile) efficiently.
>> You're quite right that a minimal kernel of functionality
>> should be isolated. Albert E. said
>> "Things should be made as simple as possible, but not simpler".
>> I admit I tend to overcomplicate the HLL-,
>> but I think you tend to oversimplify it.
>
> I'd rather add the complex constructs in the compiler. This way we can
> write the compiler in a familiar language and be sure of the efficiency
> of our new constructs.
Ok.
>>> You can't design anything and then assume that a partial evaluator
>>> would eventually solve all your efficiency problems.
>> Sure, you can't design just anything.
>> But you can design the language *with partial evaluation in mind*,
>> then reasonably expect a future partial evaluator
>> to solve many (not all) of your efficiency problems.
>> The requirements for partial evaluation are nonetheless
>> made necessary by our other concerns: a well-defined "continuous" semantics
>> (one that preserves some locality of meaning with respect to parsed objects).
>
> I think that we'll have to be writing the partial evaluator before we
> start to frequently depend on it. The compiler is the special case of
> the partial evaluator that we are writing first (it generates a
> function where the source code has been "evaluated" (considered) but
> not the program input).
>>> I feel that our first challenge is to provide an OS with an integrated
>>> compiler structured according to a basic object system.
>> Agreed.
>> Now, I think that writing an interpreter before writing a compiler
>> could prove useful, as it would allow quicker prototyping.
>> Surely we can co-develop the compiler and the interpreter.
>
> Forcing the compiler's host language to be the interpreter's source
> language prevents that. I'd rather have a stable host language for the
> compiler. And once a new language feature has been proposed, I think
> we should implement it in the compiler before rewriting the compiler
> with this new feature. That is : update the compiler's source language
> before we update its host language. This way it will become obvious
> if a construct cannot be implemented efficiently, before we have
> started using it extensively.
You are right. Then, either we choose to stick
to the original meta-language for a long time,
in which case I'd like to use a most expressive extension
of Scheme (rather than a least subset) as the meta-language,
or we choose to bootstrap the language by writing extension compilers
in restricted previous versions, in which case we'd rather compile
to the host operating system before we compile to Tunes.
>>> MIT-Scheme has records and structures (particular case);
>>> it should probably be enough
>>> (perhaps with a simple generic function added)
>>> so that the first iteration of our OS/compiler be OO.
>>> We can enhance the object system and lots of other things
>>> by changing the compiler thereof.
>> Do you suggest we adopt MIT-Scheme,
>> excluding any other Scheme implementation? Why not.
>
> We'll have to choose a particular implementation anyway for macros.
Where are the docs for MIT-Scheme macros?
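[Meanwhile, a portable fallback is the high-level macro system from the R4RS appendix; MIT-Scheme's native macro facility may well differ, hence my question about the docs. A classic example:]

```scheme
;; syntax-rules macro from the R4RS appendix (hygienic: the introduced
;; tmp cannot capture a user variable named tmp).
(define-syntax swap!
  (syntax-rules ()
    ((swap! a b)
     (let ((tmp a))
       (set! a b)
       (set! b tmp)))))
```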
>>> While we are only writing the system (currently porting previous code),
>>> you are describing to me how it should be done to take advantage of
>>> constructs you just imagined, that seem not well-defined to me,
>>> of utility still to show,
>>> and difficult to efficiently implement in a compiler.
>> Ahem; I agree with "not well-defined" and "of utility still to be shown",
>> but I'm sure that these points can be successfully tackled.
>> That's why I admit you're right to restrain my enthusiasm,
>> and to tell me to focus on the essential,
>> for I can always attack those problems later.
>
> You can work on any language feature you'd like to add, even immediately,
> but I'm not comfortable using it in the code I write until it has been
> sufficiently well defined, implemented and communicated to me.
>>> I read your code for the assembler.
[I am using the same set of symmetric matching rules
between pseudo-op and byte-sequence patterns
to assemble pseudo-ops and disassemble byte sequences,
throwing away the horrible code found in most existing assemblers]
>>> I like the general structure:
>>> a database of rules defines, piece by piece, a bijection
>>> between assembly code and machine code.
[Not exactly a bijection, but that's a detail]
>>> In particular, I like your bit-field patterns.
[That construct/deconstruct bytes from/to bit fields]
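[A hypothetical sketch of what one such symmetric rule could look like as plain Scheme data; (var ...) marks a pattern variable and (bits ...) splits a byte into bit fields, both improvised notations. The same rule is read left-to-right to assemble and right-to-left to disassemble:]

```scheme
;; One rule relating a pseudo-op pattern to a byte-sequence pattern.
;; Assembling matches the first pattern and instantiates the second;
;; disassembling matches the second and instantiates the first.
(define mov-reg-imm8-rule
  '((mov (var reg) (var imm))               ; pseudo-op side
    ((bits (4 #b1011) (1 0) (3 (var reg)))  ; opcode byte, by bit fields
     (var imm))))                           ; immediate byte
```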
>>> From what I said above, I believe we should modify
>>> your code so it works in Scheme.
>> You're perfectly right.
>> I'm going to write a pattern matcher/unifier in Scheme.
>> Shall I forbid myself from using macros?
>> If not, which system should I use?
>
> I'd use MIT-Scheme records to create the type for pattern variables.
> This way it will be distinguishable from everything else meaning
> that every other Scheme object could be used in patterns without
> being confused with variables. I wouldn't distinguish patterns
> from standard Scheme objects so that any object can be used as
> a pattern by putting variables in it.
> I'm sending you a Scheme source file I've had hanging around. It implements
> logical extensions to Scheme. That's overkill, but the unification
> routines might be useful. Just in case. Also, I think we should decide
> whether we allow bijections in the patterns or use variables
> in their place and conjoin an additional constraint like y=f(x):
>
> For example (using an improvised syntax to make it clear):
> rule : (mov (var reg) 10) -> (#x45 (reg-code (var reg)) #x0a)
> That was using bijections (reg-code) in the patterns; the other way is:
> rule : (mov (var reg) 10) -> (#x45 (var regc) #x0a) and
> (var regc) = (reg-code (var reg))
>
> See what I mean? The second way may make it easier to implement
> the reverse transformation. Or not.
I'm not sure I get what you mean.
Why would either way be any easier or harder to implement than the other?
It seems to me that they expand to just the same constraints for
any unifier.
Actually, I'd like to have the pattern language be the basis for the
future type system and formal specification system.
In particular, there would be patterns to match lambda's,
patterns to match patterns, taking contextual effects into account, etc.
[This is why, again, it seems to me that first-class contexts are of
foremost importance: to allow such patterns to be constructed easily].
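[Here is a minimal sketch of the matcher I have in mind; the (var ...) representation and the names are improvised, and bindings are a plain association list, with #f as the failure value:]

```scheme
;; A pattern variable is any pair whose car is the symbol 'var, as in
;; (var reg); every other Scheme object stands for itself in a pattern.
(define (var? x) (and (pair? x) (eq? (car x) 'var)))

;; Match pattern against datum, extending bindings; return the extended
;; binding alist, or #f on failure.  #f propagates through recursion.
(define (match pattern datum bindings)
  (cond ((not bindings) #f)
        ((var? pattern)
         (let ((b (assoc pattern bindings)))
           (cond ((not b) (cons (cons pattern datum) bindings))
                 ((equal? (cdr b) datum) bindings)
                 (else #f))))
        ((and (pair? pattern) (pair? datum))
         (match (cdr pattern) (cdr datum)
                (match (car pattern) (car datum) bindings)))
        ((equal? pattern datum) bindings)
        (else #f)))

;; (match '(mov (var reg) 10) '(mov ax 10) '())
;;   => (((var reg) . ax))
```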
>> Beyond our disagreements, what counts is the code we have and write,
>> that is, our common project.
>> What if we chose Scheme as our language,
>> but with powerful extensions for macros and modules ?
>> Basic Scheme is too burdensome to use,
>> but such extensions would make it pleasant to use!
>
> I don't think we need modules yet (to write the compiler). The main
> problem they solve is that they allow many namespaces. Sure it would
> be nice to have them but they are closely related to contexts and I
> don't think we can settle the definition of contexts right away. We
> can simply use the separate-files method to allow easy updating of the
> code when we finally introduce modules. As for macros, we could use
> MIT-Scheme's macros.
I'm not convinced. But let's write code and convince
ourselves/each other in the only truly convincing way:
by writing the actual compiler...
-- , , _ v ~ ^ --
-- Fare -- rideau@clipper.ens.fr -- Francois-Rene Rideau -- +)ang-Vu Ban --
-- ' / . --
Join the TUNES project for a computing system based on computing freedom !
TUNES is a Useful, Not Expedient System
WWW page at URL: "http://www.eleves.ens.fr:8080/home/rideau/Tunes/"