Mutation

Henry G. Baker hbaker@netcom.com
Wed, 14 May 1997 07:11:29 -0700 (PDT)


> I'm not sure I understand the utility of functional arrays.  Aside
> from the difficulty with initialization semantics, it doesn't seem
> like a large functional array is of much use.  Here's one example:

(I'll try to deal with your example in another posting.)

One of the most important kinds of functional arrays are functional
character strings.  In particular, character strings used as 'pnames'
('print-names').  One of the biggest performance problems with CL is
that some 'interrogation functions' -- i.e., 'symbol-name' (I think
that that is the name of the function!) -- are forced to _always_ copy
the string, because they can't guarantee that the requestor won't
mutate it.  With an immutable character string, however, symbol-name
can simply and efficiently return the actual character string, safe in
the knowledge that it can't be hacked.

Actually, I care about the issues involved in building a fast 'reader'
a lot.  If we expect people to utilize parenthetical expressions more
often, we have to make sure that the reader is _very fast_.  Every
reader I've looked at is _slow_ -- mostly due to the slowness of
'interning'.  The old Maclisp's had ad hoc pnames and specialized
pname buffers to speed up reading, but modern CL's cons _and then
immediately throw away_ a new string for each symbol read.  And if you
follow the silly CL semantics, you also have to cons an uninterned
symbol itself, which is also immediately thrown away.  This is because
_most_ symbols read are _found_ in the symbol table, so all the work
to build the new symbol is a waste.

I've played with a symbol reader that is based on hash-consing a list
of characters, and the reader is organized to _assume that the symbol
is already in the symbol table_ rather than assuming that it isn't.
This reader bears certain similarities to compression utilities such
as 'compress' and 'gzip'.  The side-effect of such an organization is
to make 'symbol completion' a la Tenex almost trivial.

Getting back to character strings.  The efficient reading of a large
character string isn't trivial.  It would be nice if the standard
printed version of a character string indicated its length in advance,
but it doesn't.  (I have suggested including such a length as an
_option_, for efficient machine-to-machine protocols using the
standard reader.)  If one cannot allocate the entire string in
advance, then one has to _guess_ at the length, and be prepared to
copy it to a final place once its length is known.  This suggests that
one will need a set of buffers whose sizes are in a geometric
progression (kind of like the file indices in a Unix file system), and
the earlier members of this progression are kept in a cache of buffers
for the reader, and the later (larger) members are allocated when
needed for reading really big strings.  Thus, to summarize, I see no
way to avoid copying when reading really large strings.

-- 
Henry Baker
www/ftp directory URL:
ftp://ftp.netcom.com/pub/hb/hbaker/home.html