CMUCL, threads, x86 and such

Martin Cracauer cracauer@cons.org
Tue, 29 Apr 1997 17:20:43 +0200 (MEST)


I thought I should offer some comments on a few issues regarding CMU
Common Lisp.

I am a member of the current developer team, although I probably qualify
best as their village idiot, compared to the real hackers (TM) over
here. But I own the machines :-) 

Overall Situation:
------------------

We are currently a group of individuals developing CMUCL. We based our
work on CMUCL 17f + some CMU work that wasn't in 17f. One member of
the former CMU team is with us. Development is very active; we have
the critical mass of people to actually move.

We plan to keep CMUCL running on all the platforms it supported in
the past in the upcoming 18a release, with the exception of the Mach
and IBM RT PC ports, which are still in the sources but which nobody
has tried to build for a long time.

See
   http://www.cons.org/cmucl/
for general CMUCL resources

and
   http://www.cons.org/cmucl/cmucl-cvs.html
for instructions to keep up with current sources.


Threads:
--------

This has been discussed. The outcome was that it isn't that hard to add
'fake' threads (those that run in one Unix process and cannot be
scheduled on different processors in an SMP machine). Basically, you
need to mess around with symbol bindings and GC, then explain why you
broke the debugger, start again - this time skipping the debugger
flames - and convince people you didn't overlook something.
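
To illustrate the symbol-binding part, here is a minimal sketch in
Python of one common way to do it ("shallow binding"): each fake
thread keeps its own stack of bindings, and a context switch swaps
those bindings in and out of the global value cells. This is not
CMUCL code; the names Runtime, FakeThread, bind, switch_out and
switch_in are all made up for the example, and the GC and debugger
issues are left out entirely.

```python
# Illustrative sketch only, not CMUCL code. All names are hypothetical.

class Runtime:
    """Holds the single global value cell of each special variable."""
    def __init__(self):
        self.values = {}      # symbol name -> value currently visible


class FakeThread:
    """A thread living inside one process, with its own binding stack."""
    def __init__(self, name):
        self.name = name
        self.bindings = []    # entries [symbol, saved value], innermost last


def bind(rt, thread, symbol, value):
    # Remember the value being shadowed, then install the new one.
    # None stands in for "unbound" in this sketch.
    thread.bindings.append([symbol, rt.values.get(symbol)])
    rt.values[symbol] = value


def switch_out(rt, thread):
    # Undo the thread's bindings, innermost first. After this the value
    # cells hold the outer values and the stack holds the thread's values.
    for entry in reversed(thread.bindings):
        symbol, saved = entry
        entry[1] = rt.values.get(symbol)
        rt.values[symbol] = saved


def switch_in(rt, thread):
    # Redo the thread's bindings, outermost first.
    for entry in thread.bindings:
        symbol, saved = entry
        entry[1] = rt.values.get(symbol)
        rt.values[symbol] = saved


rt = Runtime()
rt.values["*x*"] = 0
a, b = FakeThread("a"), FakeThread("b")

bind(rt, a, "*x*", 1)         # thread a sees *x* = 1
switch_out(rt, a)
switch_in(rt, b)
seen_by_b = rt.values["*x*"]  # thread b sees the global value
bind(rt, b, "*x*", 2)
switch_out(rt, b)
switch_in(rt, a)
seen_by_a = rt.values["*x*"]  # thread a sees its own binding again
```

The cost of a switch grows with the depth of the binding stack, which
is one reason a real runtime might prefer a different scheme.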

My own additional conclusion was that using OS threads is quite
possible, too, on an OS that has shared-address-space processes
(FreeBSD's rfork(2) and Linux's clone(2)) and that you have the source
for. I think you need to implement a signaling mechanism that informs
the Lisp runtime of a context switch in advance, so that the symbol
bindings get set up for the new thread, and that reschedules the CPU
once the runtime gives back control.

If any of you has the time to work through the CMUCL runtime, you
will have little trouble implementing the first solution.

I archived (and will forward to anyone interested) the above
discussion, which included instructions for most of what needs to be
done, so understanding CMUCL's runtime is the only big part of this
undertaking. I'm happy to connect you to our local folks who showed
interest in working on threads.

The semantics are a separate issue, in any case. Just for starters, I
would like to see a proper, worked-out model of how to map the
stdin/stdout and debuggers of threads to available listeners.
Something like BSD's virtual terminals is probably not overkill here.
When we do a LispOS on top of a Unix kernel, we may be able to make
ptys and Listeners the same. To be discussed...

For a list of existing APIs, see
http://www.cons.org/cracauer/lisp-threads.html


x86 Platform
------------

The x86 platform is stable. Almost all current developers use FreeBSD
or Linux/x86, so CMUCL is actually in its *best* shape on this platform.

The drawback on x86 is a weaker garbage collector, which is slow in
general and imposes constraints on register usage that can slow down
certain code on this platform. A new GC is in the works and will solve
both problems. Implementing threads will require changes to the GC
anyway, so it will be touched sooner or later.


Compiler + Runtime
------------------

First of all, CMUCL consists of parts that need to be looked at
separately.

The thing that made CMUCL popular is probably Rob MacLachlan's Python
compiler (pronounced like the snake, while the scripting language is
pronounced like the comedy troupe, I think).

The compiler could also be used to compile code for other runtimes or
virtual machines. A little glue runtime would - for example - allow
CMUCL to compile code for the Java VM.

Of course, using it that way will lose a lot of speed, but Python
will still be able to make the resulting code much faster than most
other Lisp compilers would for the same target.

I don't think the compiler has any serious problems. The only thing
I'd like to have in addition is support for an optional static/light
object system like Dylan has, but that's a semantic problem anyway.


As for the CMUCL runtime, the weakest point is probably its
CLOS/PCL. Additionally, as Peter already said, it goes down to the
machine and the OS a number of times where I think more elegant
solutions could have been possible. The `nm` trick to get code
addresses in object files is an example.
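
For readers who have not seen this kind of thing: assuming the trick
means reading code addresses out of the textual symbol listing that
`nm` prints (address, type letter, name), the parsing side looks
roughly like the Python sketch below. The sample listing and its
symbol names are invented for the example, and this is not the code
CMUCL uses.

```python
# Illustrative sketch only. SAMPLE is invented; real nm output depends
# on the platform and on the object file being inspected.

SAMPLE = """\
00001000 T _call_into_lisp
00001040 t _local_helper
         U _malloc
00002000 D _some_data
"""


def parse_nm_output(text):
    """Return {symbol name: address} for symbols in the text (code) section."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        # Undefined symbols ("U") carry no address, so they have two fields.
        if len(parts) != 3:
            continue
        address, kind, name = parts
        # "T" is a global code symbol, "t" a local one.
        if kind in ("T", "t"):
            table[name] = int(address, 16)
    return table


code_addresses = parse_nm_output(SAMPLE)
```

Depending on the output format of an external tool is exactly the sort
of fragility that makes this less elegant than it could be.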

Our runtime, BTW, uses the same compiled-file format for all platforms
sharing the same CPU. You can move fasl files from SunOS to Solaris
etc. 

We also have support for bytecodes and a bytecode-compiled-code file
format (although it is different for big- and little-endian
platforms). But I think it is a bad idea to restrict yourself to
bytecodes. OK, not to be discussed in this message...
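
As a generic illustration of why a file format ends up differing
between big- and little-endian platforms, the Python sketch below
writes the same 32-bit word in both byte orders. It says nothing
about the actual layout of CMUCL's byte-code files; encode_word and
decode_word are names made up for the example.

```python
# Illustrative sketch only, unrelated to the real CMUCL file layout.
import struct


def encode_word(value, big_endian):
    """Serialize an unsigned 32-bit word in the given byte order."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value)


def decode_word(data, big_endian):
    """Read an unsigned 32-bit word back in the given byte order."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack(fmt, data)[0]


word = 0x12345678
big = encode_word(word, True)       # bytes 12 34 56 78
little = encode_word(word, False)   # bytes 78 56 34 12

# Reading a file with the wrong byte order silently yields another number.
misread = decode_word(big, False)
```

A format that writes words in the host's native order can be loaded
without conversion, at the price of one file variant per byte order.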

Complexity
----------

CMUCL has drawn criticism for its complexity and for being hard to
build.

I am now entering advocacy mode, but I think that CMUCL is worth
every bit of its complexity. The build process needs to be improved,
and easily could be.

The only exceptions are some issues when messing with Unix, but LispOS
is trying to solve that anyway, eh :-)? These include most of the
building problems (the above `nm` trick, among others).

I think the runtime is sufficiently easy to understand, and the (in
fact complex) compiler can be treated as a black box when doing
something like LispOS.

The rest is working out the dependencies between parts of the system
and writing some proper build routines that automate the process. This
isn't that hard, but so far no one has done so. Once you understand
the build, you don't need such build tools anymore :-)
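
To show what "working out the dependencies" boils down to, here is a
minimal Python sketch that turns a dependency table into a build
order (a depth-first topological sort). The component names and their
dependencies in DEPS are invented for the example and do not describe
the real CMUCL source tree.

```python
# Illustrative sketch only. DEPS is a made-up dependency table.

DEPS = {
    "core": [],
    "compiler": ["core"],
    "object-system": ["compiler", "core"],
    "editor": ["object-system"],
}


def build_order(deps):
    """Return the parts in an order where each part follows its dependencies."""
    order = []
    state = {}    # part -> "visiting" or "done"

    def visit(part):
        if state.get(part) == "done":
            return
        if state.get(part) == "visiting":
            raise ValueError("dependency cycle at " + part)
        state[part] = "visiting"
        for needed in sorted(deps.get(part, ())):
            visit(needed)
        state[part] = "done"
        order.append(part)

    # Sorting makes the result independent of dictionary ordering.
    for part in sorted(deps):
        visit(part)
    return order


order = build_order(DEPS)
```

A real build is harder than this because a compiler that is built with
itself introduces cycles, which the sketch merely reports as an error.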

Anyway, help is welcome. The area of build scripts is probably a
nice target for a new developer, since you learn a lot about the
system and can't break anything serious.

Mach / Flux
-----------

CMUCL was originally developed on Mach. The code is still there,
although recent versions haven't been tested on top of it.

But it uses the Unix server of Mach to get system services. The Unix
signal mechanism Reginald mentioned as a weak point applies here,
too.

It uses some Mach-specific features, but it isn't layered directly on
top of Mach.

I think it's better *not* to restrict yourself to a non-Unix-server
Mach, but I'll send a separate message about it later.


Help from our side
------------------

I can only speak for myself and speculate a little about the others,
of course. Peter VanEynde already spoke, too.

I am certainly interested and would offer help within the limited time
I can spend on free software and within the incomplete understanding
of CMUCL I have (which may be better than starting from scratch).

I am the keeper of the CVS repository for CMUCL and I can provide the
same service for your project. From my position I can also arrange
that we keep both source trees as close together as we can. We will
probably have to leave the normal CMUCL repository untouched by LispOS
activities, but I can offer quite a bit of assistance to keep the
LispOS-specific things close to the main line.

When it comes to actual coding, I am very likely to be much more
interested in integrating CMUCL into FreeBSD or NetBSD, not Linux. I
won't go into holy wars here; it is sufficient to say that I use
FreeBSD in real life and will need to keep it that way. I am also a
FreeBSD developer and know the people over there, which is an
important point.


To speculate a bit: The other CMUCL developers are quite busy
improving CMUCL's correctness and speed. I'd say they are (or should
be) a little conservative about modifying their sources for something
radically different (like a different environment model). Well, that's
easy once you have an existing and running system :-)

Our people are no different from usual net wizards: They are happy
and proud to explain mechanisms to outsiders as long as they feel the
outsider is worth the effort. (I have to be careful here, too, BTW,
but I own the machines :-).

Should any of you be really interested in CMUCL's inner workings, it
is probably a good idea to offer to add the knowledge you gain to the
CMUCL "internals" document Rob MacLachlan once began.

OK, so much for now.

Let me know if you have further questions.

Martin
-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cracauer@cons.org> http://www.cons.org/cracauer
  cracauer@wavehh.hanse.de (batched, preferred for large mails)
  Tel.: (daytime) +4940 41478712 Fax.: (daytime) +4940 41478715
  Tel.: (private) +4940 5221829 Fax.: (private) +4940 5228536
  Paper: (private) Waldstrasse 200, 22846 Norderstedt, Germany