low-level approach

Billy Tanksley btanksley@hifn.com
Tue Jan 15 17:10:03 2002


From: Kyle Lahnakoski [mailto:kyle@arcavia.com]
>Just because I label a concept or thing academic, it does not mean it
>necessarily came from, or is used in, academia.  Forth was 
>designed from
>certain principles that, to me, conflict with my intuitive concept of
>expressive.  I only label these principles academic because they do not
>appear to be useful.

Okay, that's cool.  So "academic", to you, means "useless to me".  I've
heard that before.

>Forth used by engineers makes perfect sense!  Engineers write small
>programs, programs built for hardware (very likely small hardware). 
>Forth is well suited to these types of devices.  Now I can see that
>Forth's principles of design are quite helpful in this area 
>of software development; not academic at all.

Right.  Nor useless.  Let me add, though, that Forth has also been used to
solve some very large problems.  It's true that the resulting Forth program
is small, but that doesn't mean that the problem was small -- a C consulting
company bidding on the UPS software asked about four times what the Forth
company that actually did the job billed.

I was also surprised to see talk about Forth being academic because one
interesting thing about Forth and concatenative languages in general is that
there is almost zero research on them.  This is one of the reasons I'm
studying them; not because they're a silver bullet perfect language (they
aren't), but because they're unexplored territory.

>> What combinators _do_ is make dataflow clear, and computing 
>> is all about
>> data flow and modification.  Combinators allow the 

>Maybe I am missing your point.  But why is clear dataflow a 
>good thing?
>Whenever I think of managing dataflow I can't help but think that I
>am doing more work. Maybe someone can jump in here and help me
>understand.

Because computing is all about data flow and modification.  Managing
dataflow is one of the harder parts of programming, and one of the most
crucial.  A language which ignores dataflow forces you to keep the dataflow
in your head or in a separate, unchecked document; a language which forces
you to make the dataflow explicit also allows later maintainers to realize
what impact their changes will have.

Now, it might take a bit of thought to make the dataflow explicit; but I
guarantee you that you've already done the hardest work, generating the
dataflow in the first place.  So Forth doesn't really add much challenge in
this respect.
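
For instance -- just a sketch, and the word name is mine, not anything
standard -- writing the stack picture after each step makes every value's
lifetime visible right in the source:

    : sum-of-squares  ( x y -- x*x+y*y )
      dup *         ( x y^2 )    \ y is consumed here, never seen again
      swap dup *    ( y^2 x^2 )  \ x is consumed here
      + ;

    3 4 sum-of-squares .   \ prints 25

Anyone who modifies this word later can see at a glance which values are
live at each point.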

There's one other thing Forth forces the programmer to do which DOES add a
challenge: the programmer has to arrange sources and uses of the data so
that it's on the stack for the minimum amount of time, and so that it's on
top of stack when it has to be used.  This /is/ a challenge, and takes some
getting used to -- but it's also a critical part of any computational task,
and provides useful data to optimizing compilers which they could not
otherwise get.
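
A tiny example of that arranging (again just a sketch, with a made-up word
name): to compute a*b+c, you set c aside until the multiply is done, so
that a and b are on top exactly when * needs them:

    : muladd  ( a b c -- a*b+c )
      >r        \ set c aside on the return stack
      *         \ a b -- a*b
      r> + ;    \ bring c back exactly when + needs it

    2 3 4 muladd .   \ prints 10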

>I can see the functional aspect of Forth being useful (lack of side
>effects as you mention).  I can also see linear aspects of 
>Forth (again, a lack of side effect).

I'm glad you like them, but they have nothing to do with lack of side effects
-- on the contrary, both aspects of Forth operate with or without side
effects.

>> I must point out that complex dataflow is complex regardless 
>> of whether it's
>> expressed by variables or by explicit dataflow notations.  And with
>> variables, the dataflow MUST be held entirely in your mind; it's not
>> possible to write it in the program.  With dataflow, the 
>> flow only has to be
>> thought of once, and from then on you can forget it -- dataflow
>> modifications can be made locally.

>For dataflow problems, dataflow notation would certainly be a good
>thing.

All problems involve dataflow.

>I am really concerned with problems that do not have a dataflow
>aspect.  SQL is an example of a domain specific language that removed
>all dataflow specification.

It's a slow, special-purpose language.  Look at ksql -- it's SQL plus
dataflow.  /Much/ faster, and much smaller than any SQL implementation I've
seen.

>Every time you specify dataflow you imply a certain computation model. 
>When that computation model changes then the program must be reverse
>engineered to its important results and reprogrammed in the new
>computation model.  I am thinking of distributed and quantum computing;
>both highly parallel computing models.  Tell me more about Forth in a
>parallel world.

I can't speak about quantum computing; I know nothing of it.  Parallel
computing, though, I've considered.  There are three major ways to
accomplish parallel computing:

1. Allowing the compiler to parallelize your serial program.  This works
very poorly in any language, but better when the dataflow is known to the
programmer and thus available for hand-optimization.
2. Writing programs which use primitives that can be parallel, as in APL's
array operations.  This can be done as easily in Forth as it was in APL (see
the sketch below).
3. Writing your programs to execute as separate, hand-coded processes.
This, of course, can be done just as easily in Forth as any other language.

Only solution #1 could cause any problems at ALL with specified dataflow;
and in reality, the problems are there in all code, whether the dataflow is
specified or not.  With specified dataflow, the programmer can /see/ why the
compiler's having a problem.
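
To make option #2 concrete, here's a sketch of an APL-style vector add (v+
is my name for it, not a standard word).  Each iteration is independent of
the others, so an implementation would be free to run them in parallel:

    : v+  ( a1 a2 a3 n -- )      \ a3[i] = a1[i] + a2[i] for i < n
      0 ?do
        2 pick i cells + @       \ fetch a1[i]
        2 pick i cells + @       \ fetch a2[i]
        +
        over i cells + !         \ store the sum into a3[i]
      loop
      2drop drop ;

    create xs 1 , 2 , 3 ,
    create ys 10 , 20 , 30 ,
    create zs 3 cells allot
    xs ys zs 3 v+
    zs @ .                       \ prints 11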

>> >All this optimization you mention must be done by the human 
>> >programmer.
>> >This optimization should be done by the compiler.  If you remove the
>> >necessity to specify how a program will run, you are left 
>> >with much less to specify and an easier development time in general.

>> This optimization MUST be done by the programmer: he's the 
>> only one who
>> knows what his task will need next.  The compiler can make 
>> guesses after
>> undertaking extensive analysis, but will never better the 
>> programmer; the
>> compiler is better used as a domain expert on the specific 
>> optimizations
>> needed for the target machine (alignment, caching, number of 
>> registers, and
>> so on).  After all, ordering the operations is an obvious 
>> and trivial part
>> of designing the algorithm.

>We disagree on the role and abilities of the compiler.

Clearly.

>I believe the
>compiler should be responsible for as much dataflow as possible.
>If dataflow is not part of the problem then it should not be specified.

The dataflow is ALWAYS part of the problem and the solution, so it must be
specified.  The question isn't whether or not it's going to be specified;
the only question is how hard it's going to be to figure out.  Variables are
like magical teleporters: they make it look like data gets used, then
teleports to the next place it's needed.  Well, in reality that doesn't
happen.  The data has to sit somewhere.  Explicit dataflow simply makes that
sitting process explicit, and the way Forth does it also lets the compiler
(or anyone who cares to know) see how urgent the data's going to be, in case
the compiler has to choose between faster and slower memory (and it
almost always has to).
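
Here's what I mean, as a sketch (both definitions are made up for the
example).  Compute (x+y)*(x-y) with variables and the lifetimes of x and y
vanish into memory; do it on the stack and you can see exactly where each
value sits and where it dies:

    variable xvar   variable yvar

    : f-var    ( x y -- z )
      yvar !  xvar !               \ x and y teleport into memory...
      xvar @ yvar @ +              \ ...and reappear here
      xvar @ yvar @ -  * ;

    : f-stack  ( x y -- z )
      2dup -                       ( x y x-y )
      >r  +  r>                    ( x+y x-y )
      * ;

    7 3 f-var .     \ prints 40
    7 3 f-stack .   \ prints 40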

The compiler isn't, and CAN'T be, responsible for the dataflow.  The
algorithm specifies the dataflow, and the concrete implementation pins it
down exactly.  We KNOW, thanks to the algorithm, that data gets generated
_here_,
then used _here_, then used again _here_, and then never looked at again.
That's the dataflow.

>In general good compilers outperform humans. There may be a problem
>where a human can produce more efficient code, but then that can be
>taught to the compiler.

Not true.  In EVERY case, the programmer is completely responsible for the
speed of his code.  He must choose the algorithm, thus generally specifying
the dataflow; then he has to implement the algorithm, thus EXACTLY
specifying the dataflow.  The compiler should be responsible for
machine-dependent details and micro-optimizations at this point; it's very
good at those bookkeeping details.  It CAN'T choose a better algorithm, nor
can it reimplement the algorithm the programmer chose to make a better
implementation.
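
For example (another sketch; lfind is my name for it): the compiler can
tighten this loop's bookkeeping all it likes, but it can never turn the
linear scan into a binary search.  The programmer made that choice when he
picked the algorithm:

    : lfind  ( x addr n -- flag )  \ linear scan; O(n) no matter how
      0 ?do                        \ clever the compiler is
        2dup i cells + @ =
        if  2drop true unloop exit  then
      loop
      2drop false ;

    create nums 5 , 9 , 2 ,
    9 nums 3 lfind .   \ prints -1 (true)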

>The compiler is an expert in many things, one
>of those things should be dataflow.

I don't have a real problem with that.  The compiler is obviously
responsible for allocating resources for the specified dataflows, no matter
what language you're using.  But when you're using a dataflow language, the
programmer can tell the compiler how important a given dataflow edge is to
the program, and the compiler can allocate it immediately, and spend its
time optimizing something else.  It's fine when the compiler acts as an
expert, but it's not fine when it's forced to act as a programmer -- then we
have to implement an expert system and accept mediocre results.

>> You're also missing what I said above: a parse tree doesn't 
>> give you any
>> more textual flexibility than a tokenized concatenative 
>> language naturally
>> has (sometimes a lot less), and it's harder to get and 
>> apply.  This is why
>> it's taken so long to come out with decent refactoring tools.

>If parse trees were so difficult to manage then Smalltalk would not have
>so many refactoring tools.

It's not exactly "so difficult"; it's merely *harder*.  It took Smalltalk a
lot of time to get those browsers, and Smalltalk is a simple language.
Compare Java, a much more complicated language with a LOT more effort behind
it -- its refactoring browsers are only beginning to get off the ground,
some three years after the publication of the Refactoring book made the
practice very popular.

-Billy