Everyone else (was: Tunes compiler)

Patrick Premont premont@cs.toronto.edu
Tue, 5 Mar 1996 19:58:46 -0500


>    Patrick Premont said:
>    > How about everyone else on the list ? Are you lost ? Do you have comments ?
>    > Are you even listening ;-> ?
> 
> Yes, I for one am.  As for why I'm just listening: I don't have that
> much time right now and am mostly trying to profit from the `tunes
> experience'. :)
> 
> First, Several Minutiae
> 
> Trying to get invertible functions at your assembler level is a little
> bit ridiculous.  Mathematically it's not guaranteed, and it's also
> quite unlikely in practice.

To me it seems natural to do it this way. It is to be expected that
the inverse functions will only be partial functions in many cases,
when the original function is not onto. But generating an error is
the appropriate response to the application of a function to an
argument outside its domain. In some cases the initial function won't
be one-to-one, which means that, strictly speaking, the inverse
function does not exist. However, a more useful approach is to say
that there is non-determinism in the inverse function and to resolve
it in whatever way is appropriate. Often the best way will be to
choose any value among the different options. In other cases the
inheritance mechanism may be more appropriate (use the most
specialised rule, when the inverse function is rule-based).
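
To illustrate with a minimal Scheme sketch (double, undouble, square
and unsquare are invented for the example):

(define (double n) (* 2 n))     ; one-to-one, but not onto the integers

(define (undouble m)            ; its inverse: a partial function
  (if (even? m)
      (quotient m 2)
      (error "undouble: argument outside the domain" m)))

(define (square n) (* n n))     ; not one-to-one

(define (unsquare m) (sqrt m))  ; resolve the non-determinism by
                                ; always choosing the non-negative root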

> A note about strong typing.  This is definitely a good way to gain
> efficiency.  But a word of caution: you need to let the garbage
> collector know about all of your types. 

More precisely, it needs to know what is an address and what isn't.
There could be a mapping from each type to a "format identifier", and
we could then have the gc know only how to interpret the format
identifier. It then doesn't really matter to the gc what the type of
some datum is. I think your concern is valid, but a sensible
implementation of the general type system we suggest should be able
to avoid bloating the gc with reasonable ease.
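
A minimal sketch of what I mean, in Scheme (the type names and
formats are made up for illustration):

; Each type maps to a format identifier: here, a list of booleans,
; one per slot, where #t means "this slot holds an address to trace".
(define *formats*
  '((cons-cell     . (#t #t))     ; car and cdr are pointers
    (colored-point . (#f #f #t))  ; x and y immediate, color a pointer
    (string        . ())))        ; no pointers at all

(define (format-of type) (cdr (assq type *formats*)))

; The gc consults only the format, never the type itself.
(define (pointer-slot? type slot) (list-ref (format-of type) slot))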

> Several things less trivial
> 
> What seems to me to be a good abstraction for this kind of problem is
> the idea of a compiler _and_ an assembler, both machine-dependent.
> 
> For a useful assembler I am imagining a program that takes a subset of
> a high-level representation of _all_ assembly languages (the
> machine-implemented subset, of course) and converts it to assembly
> language.  This assembly language could be called the LLL if we choose?
> 
> example:
> 
> for a RISC-type chip
> (add R1 R2 R3)    # add R2 and R3 and put in R1
> (add R1 R2 24)    # add R2 and 24 and put in R1
> 
> for a PC
> (add R1 R1 R2)     add ax,cx  ; or something like that

The problem I see with that is that the semantics of the operation add
will change from one platform to the next: it will update different
flags, accept operands in different modes, and so on. This means that
the compiler that comes before this "uniform" assembly language must
still be considerably machine-dependent. There is then no apparent
reason to force the machine-aware compiler and the machine-aware
assembler to exchange information in a "machine-independent" way. A
specialised version of the assembly language can be used between the
corresponding compiler and assembler if that is beneficial. This does
not affect the other specialised compilers and assemblers, and it does
not hamper translation between the assemblers of different machines,
since the possibility of translation is not provided by the uniform
assembly language anyway (although it looks uniform, it is not uniform
in meaning).
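
A sketch of the kind of specialisation I mean, in Scheme (the
instruction descriptions are invented; real ones would be richer):

(define (assemble-risc instr)
  ; three-address RISC: any register triple is legal, flags untouched
  (list 'add (cadr instr) (caddr instr) (cadddr instr)))

(define (assemble-386 instr)
  ; two-address CISC: destination and first source must coincide,
  ; and the flags register is clobbered as a side effect
  (if (eq? (cadr instr) (caddr instr))
      (list 'add (cadr instr) (cadddr instr) '(clobbers flags))
      (error "386 add needs dst = src1" instr)))

(assemble-risc '(add r1 r2 r3))  ; => (add r1 r2 r3)
(assemble-386 '(add r1 r1 r2))   ; => (add r1 r2 (clobbers flags))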

> Some of the discussion indicates you are moving in this direction
> anyway.  For the mostly machine-independent compiler it would have to
> have special knowledge of how to properly construct assembly-language
> routines, and how to do other machine-dependent things that only need
> to be variables for now.

Sharing the same superficial representation across the different
assembly languages will often be a nice default choice. But when it
is not convenient, nothing is lost by using different versions for
different platforms. We will surely have a machine-independent
language, however, to which the machine-independent part of the
compiler will compile. It won't be at the assembler level (it will be
slightly higher), because the assemblers (or machine languages) of
the different platforms vary too much.

> Now I also see the inevitability that an efficient very-high-level
> language will need to be able to compile expressions at run time,
> either for lambda expressions, or to reduce run-time constants in
> inner loops and other places to constants instead of variables.

Yes.
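
For instance, in MIT-Scheme one can reduce a run-time constant by
building and evaluating a specialised expression at run time
(system-global-environment is MIT-Scheme-specific; other Schemes
spell the eval environment differently):

(define (make-scaler k)
  ; build (lambda (x) (* x <k>)) with k inlined as a literal, then
  ; hand it to the compiler/evaluator at run time
  (eval (list 'lambda '(x) (list '* 'x k))
        system-global-environment))

(define scale-by-5 (make-scaler 5))
(scale-by-5 8)  ; => 40, with 5 a constant rather than a variable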

> Additionally, a disassembler based on a machine-independent assembly
> language could be used to aid in manipulating already-compiled
> expressions.

Disassembling to a machine-dependent (simplified, no labels, ...)
assembler is easy; our invertible functions should take care of that.
Going further is highly non-deterministic, and possibly
computationally difficult if no disassembling information was added
during compilation. Adding this information is an option to consider:
in some "memory-starved" cases it may be better to be able to
regenerate the source than to keep it around. Note that comments can
be considered part of the source.
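
As a toy illustration of the trade-off (the names are invented, and a
real system would pack something much more compact than the source
itself):

(define (compile-with-recovery-info source)
  ; keep, next to the executable code, enough information to
  ; regenerate the source (comments included) on demand
  (cons (eval source system-global-environment)  ; the code we run
        source))                                 ; stand-in recovery info

(define (recover-source compiled) (cdr compiled))

(define c (compile-with-recovery-info '(lambda (x) (+ x 1))))
((car c) 41)        ; => 42
(recover-source c)  ; => (lambda (x) (+ x 1))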

> My thoughts on contexts, pattern matching, etc.
> 
> There is a lot of that in your discussion, and some of the higher
> level programming concepts I just don't quite get.
> 
> The one thing I am quite sure about is that for a language to be able
> to expand and grow without major growing pains it needs a uniform
> syntax that is the same for both language features and added features.

It may help. We are doing that. What I find more important is what
you said, but with "syntax" replaced by "semantics".

> The same for data.

Here I disagree. There cannot be a uniform syntax for specifying
data. What there is in Lisp is a common syntax for specifying
programs and lists. When you create a new data type, you have to
create its instances by writing code (as lists, like all other code).
But the syntax of the data is not the syntax of the program that
creates the instances; it is at a higher level. Take the example
(make-point x y color). This is a program which creates a colored
2-d point. The syntax of the program is specified by Lisp, but the
syntax of the data is not, at least not where it counts: it is the
make-point function or macro which defines the data's syntax. Here
are a few more examples with the same data and different syntax:
(make-point #(x y) color)
(make-point (x y) color)
(make-point color x y)
...

So what Lisp provides that is important is a representation of
programs as data in the language. This allows easy program
manipulation. Note that the fact that the data structure used is
always a list is not important. If your language supports objects,
you'd better use objects (one per language construct) instead of
lists.
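
For instance, rewriting a program is then ordinary list processing
(swap-operands is just a made-up example):

(define (swap-operands expr)
  ; (f a b) => (f b a) : the program is manipulated as a plain list
  (list (car expr) (caddr expr) (cadr expr)))

(swap-operands '(make-point x y))  ; => (make-point y x)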

>  Lisp almost provides both of these, and Forth provides
> the first.  A disassembler with uniform syntax, and some super
> environmental mumbo jumbo, might just complete the mess.
> 
> I think the explicit switching between compiling and running states
> in Forth is a better model than macros in any language.  So please
> include this.

MIT-Scheme's macros include that capability: you can "run" any Scheme
code you want there and generate the appropriate code to be compiled.
The main use for this is not to execute things at compile time but to
define operators with a different syntax. It also helps efficiency,
which is the other reason I see for wanting to run something at
compile time (a time which may not even exist, by the way; it is
defined by the environment, not the language). I don't like the idea
of switching to running mode every time you see something that will
be constant at compile time: that's the job of the
compiler/partial-evaluator, and doing it manually is a waste of time.
So the only valid reason that remains for running code at compile
time is to generate code. This is what macros allow.
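
A sketch of such a macro (the exact macro interface varies between
Scheme implementations; rsc-macro-transformer is MIT-Scheme's
spelling, and unroll is an invented example):

(define-syntax unroll
  (rsc-macro-transformer
   (lambda (form env)
     ; (unroll n body): run ordinary Scheme code *now*, at expansion
     ; time, to generate the code the compiler will see
     (let ((n (cadr form)) (body (caddr form)))
       (cons 'begin
             (let loop ((i 0) (acc '()))
               (if (= i n) acc (loop (+ i 1) (cons body acc)))))))))

(unroll 3 (display "hi"))  ; compiles as three copies of (display "hi")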

> For optimizing the garbage collector, it really should know which
> objects need finalization, and where the pointers are, so it might be
> worthwhile to do some long-range planning in this area as well.

Yes, the need for finalization is the other thing the gc should know
about, apart from where the pointers are. I have studied gcs in a
course, but we didn't touch finalization, and I'm not sure how to do
it. I should look it up to be able to do the long-range planning you
suggest.

> Just a Few comments toward an OS
> 
> The most portable frameworks of functionality have been built on
> microkernels using multiple servers: one for authentication, and
> others for filesystems (for us, possibly persistent address-space
> systems, with a Unix file == an address space, etc.), which talk
> through Mach-like message ports, to allow use as a distributed OS.
> 
> The current research on Mach has optimized these calls to the point
> that on the same machine they can be faster than normal kernel calls.
> 
> The other reason I propose this model is that it works very well with
> the current stated aims of Tunes, and it can be implemented on top of
> nearly any multiprocess OS using IPC, so we could avoid some of the
> low-level nitty-gritty for a while until we have workable code.  Then
> we could find the premium kernel to run it on.  I like Mach (can you
> tell?)

I don't know about Mach and kernels. I'm not impressed by the idea of
using kernels or micro-kernels in an OS, but I really don't know
enough about it to formulate a valuable opinion.

Please feel free to comment or argue further.

Patrick