mapping files to objects

cwg@DeepEddy.Com
Tue, 06 May 1997 15:22:48 -0500

>   If pathname were a CLOS class, then one could subclass it instead of
> having fixed slots. For instance, the URL-pathname class could add the
> other 4 (I lost count) fields that it needs to keep track of all of
> the parts of a URL. The 6 normal components would still contain their
> normal contents, so any program that just deals with pathnames would
> have a chance of working.


>   Embedding or inferring type info from the name is a really, really
> bad idea. It's so error prone as to be useless. If you want to know
> the type of data in a file, look in the file!
> >3) It now becomes nearly trivial to write the function that I previously 
> >proposed to map from a pathname to an object representing that file.
>   Not even close. What's a .tgz file? Is it a compressed file or a tar
> file or a compressed tar file? Kind of depends on what you want to do
> with it, doesn't it? What's a .l file? (It has multiple meanings on
> most Unix systems, partly due to us Lispers.)

In the .tgz case, it is kinda all three.  Fortunately, with multiple
inheritance it shouldn't be hard to let it behave appropriately in all
contexts.  In the .l case, you've got a point.  Since I use Perl much more
than I use Prolog, I've had to reconfigure which mode my Emacs picks when it
sees a .pl file.  That's exactly the same situation.
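
To make the multiple-inheritance point concrete, here is a minimal sketch.  It
is in Python rather than CLOS only to keep it short, and every class and
method name in it is hypothetical, not part of any proposed interface.

```python
class FileObject:
    """Hypothetical base class for an object that represents a file."""

    def __init__(self, path):
        self.path = path


class CompressedFile(FileObject):
    """Behaviour for code that only cares that the file is compressed."""

    def compression(self):
        return "gzip"


class TarArchive(FileObject):
    """Behaviour for code that only cares that the file is an archive."""

    def is_archive(self):
        return True


class CompressedTarArchive(CompressedFile, TarArchive):
    """A .tgz file: a compressed file, a tar file, and both at once."""


tgz = CompressedTarArchive("backup.tgz")
```

Code written against either parent class accepts `tgz` unchanged, which is the
"all three" behaviour described above.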

>   A much better approach is the way the 'file' command does it. The
> Irix version, for instance is table driven to do some pattern
> matching. Typically, it looks at the first N characters to see if it
> is a particular string, like "%!" means it's a PostScript file. 

I suppose I'd sound like I've been spending too much time on Unix if I were to 
say I didn't want to look in the filesystem because of performance issues, and 
if I were a good Lisper, I'd be thinking about semantics that guarantee that 
things work rather than run fast, right?

I merely didn't want to either (a) read a 10 MB mail file just to determine 
that it is a mail file or (b) open/read/close the first block of a mail 
file twice.  If whoever writes this hunk of code can either keep those cases 
from happening or show that it's not important, that's fine with me.
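
One way both cases might be avoided is to read a single bounded block once and
cache it, so every recogniser looks at the same bytes.  This is only a sketch
under that assumption; the class name, the 512-byte block size and the `reads`
counter are made up for illustration.

```python
class SniffedFile:
    """Reads at most block_size bytes from the file, once, and caches them."""

    def __init__(self, path, block_size=512):
        self.path = path
        self.block_size = block_size
        self._head = None
        self.reads = 0  # number of times the file was actually opened

    def head(self):
        # Only the first call touches the filesystem.
        if self._head is None:
            with open(self.path, "rb") as f:
                self._head = f.read(self.block_size)
            self.reads += 1
        return self._head

    def starts_with(self, magic):
        """Test the cached block against a magic string such as b'%!'."""
        return self.head().startswith(magic)
```

Any number of `starts_with` tests then cost one open/read/close, and a 10 MB
file is never read past its first block.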

Okay, let's have it your way.  The particular situation that I'm interested in 
for a mail program is to be able to recognise various mailfile formats.

An mbox file starts with "From " and has more of them later down in the file.
Much as we hate that, that's what it does and I can recognise it.
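
A sketch of that test, working on the first block of the file as bytes.  The
function names are hypothetical, and an empty mailbox file would not be
matched by this check.

```python
def looks_like_mbox(head):
    """True if the block begins with the mbox 'From ' separator line."""
    return head.startswith(b"From ")


def count_separators(head):
    """Count the lines in the block that start with 'From '."""
    return sum(1 for line in head.split(b"\n") if line.startswith(b"From "))
```

Note the trailing space in the pattern: an ordinary "From:" header does not
match it.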

A qmail Maildir contains subdirectories called "tmp", "cur", "new".   It may 
have hidden files as well.  I think I can recognise that.  It might be named 
"Maildir" in a user's home directory.

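The Maildir test could be as small as the following sketch, which only checks
for the three subdirectories and ignores the directory's own name and any
hidden files.  The function name is hypothetical.

```python
import os


def looks_like_maildir(path):
    """True if path contains subdirectories named tmp, cur and new."""
    return all(
        os.path.isdir(os.path.join(path, sub)) for sub in ("tmp", "cur", "new")
    )
```
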
An mh mail directory may have subfolders that are just directories; it may 
have files with names that are all digits (maybe preceded by a comma if 
deleted); it may have configuration files with names like forwcomps, 
replcomps, components, etc.; it may have hidden files with names like
.mh_sequences or .xmhcache; it's probably inside another mail directory; 
it might be named "Mail" in a user's home directory; it may be completely
empty.  I don't know how to recognise that.
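
The best I can sketch is something that collects evidence rather than giving
a yes or no answer.  It is a heuristic under my own assumptions, the names in
it are hypothetical, and it still reports nothing for a completely empty
directory, so it does not solve the problem.

```python
import os

MH_HIDDEN = {".mh_sequences", ".xmhcache"}
MH_TEMPLATES = {"components", "replcomps", "forwcomps"}


def is_message_name(name):
    """True for '12' or ',12' (a leading comma marks a deleted message)."""
    digits = name[1:] if name.startswith(",") else name
    return digits.isascii() and digits.isdigit()


def mh_evidence(path):
    """Return the reasons to suspect path is an mh folder (maybe none)."""
    names = set(os.listdir(path))
    evidence = []
    if names & MH_HIDDEN:
        evidence.append("hidden mh file")
    if names & MH_TEMPLATES:
        evidence.append("template file")
    if any(is_message_name(n) for n in names):
        evidence.append("numbered message")
    return evidence
```

An empty list means "no evidence", not "not an mh folder"; the caller still
has to decide what to do in that case.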

A usenet newsgroup fits most of the same criteria as an mh mail directory, 
except that I probably don't have write permission to the directory.  If I can 
recognise it, I should be able to treat it mostly the same way.  It's known by
being rooted somewhere that's defined in a configuration file in the news
software.  If I implement it, I'd like to handle it with a class that is
different from, but related to, the mh directory class.  If I don't implement
it, I'd like it to 
*not* be confused with the similar looking mh directories, but it's not clear 
how to tell them apart.
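
If the news root can be obtained from that configuration file, one possible
test is plain path containment.  In this sketch the root is passed in as a
parameter, since how to read it from the news software's configuration is not
something I can state; the function name is hypothetical.

```python
import os


def is_under(path, root):
    """True if path is root itself or lies somewhere below it."""
    path = os.path.realpath(path)
    root = os.path.realpath(root)
    return os.path.commonpath([path, root]) == root
```

A directory of all-digit files for which `is_under(dir, news_root)` holds
would then be treated as a newsgroup rather than as an mh folder.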

Maybe I'm just overly fixated on a few screw cases that happen to be some of 
the first cases that I'll want to deal with.  These are the cases that lead me 
to want to find ways to declare the type of a particular file/directory.

All I really want is a better interface to Unix file systems so I can start 
implementing routines that read such files.  If we can agree on an interface, 
I'll probably throw together something minimal for my own use making lots of 
assumptions and get started.  I can then replace my minimal stub with a real 
implementation when there is one.  What I don't want is to just write code 
like I would do in C (or did do in perl) that only sees files as streams of 


Chris Garrigues                    O-              cwg@DeepEddy.Com
  Deep Eddy Internet Consulting                     +1 512 432 4046
  609 Deep Eddy Avenue
  Austin, TX  78703-4513              http://www.DeepEddy.Com/~cwg/
