On data types

Matthew Tuck matty@box.net.au
Tue, 01 Dec 1998 20:19:42 +1030


Ursula Dreier wrote:

> Basic data types - I as a user would not like to be bothered with
> implementation details unless really necessary. Have a look at
> Microsoft's 1001 string types - what a mess! So I would like to
> declare a string as a string (exactly like any other basic data type),
> without having to specify a code set, a length, a storage format or
> things like that. Same thing holds true with numbers - normally I
> would like to declare just a number which can take any value. If I
> really wanted to specify it to be an integral, I would like to be able
> to do so, of course. But the thing should behave exactly like
> any other number, apart from having a built-in truncation feature.

This is the gist of my multiple implementations proposal.  You can
comfortably have more than one implementation of a type, and either
specify one or leave the choice open.  Some implementations could have a
"guard", which means they only work in a subset of situations and can
therefore be more efficient.  The implementations are tried from left to
right until one is found whose guard matches.
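
A minimal sketch of the guard idea in Python, not Ultra syntax.  The
names IMPLEMENTATIONS and make_number, and the three representations,
are assumptions made up for illustration:

```python
from fractions import Fraction

# Hypothetical implementations of one abstract "number" type, listed
# from most to least specialised.  Each entry is (guard, constructor);
# the guard says whether that implementation can hold the value.
IMPLEMENTATIONS = [
    (lambda v: isinstance(v, int) and -2**31 <= v < 2**31,
     lambda v: ("int32", v)),
    (lambda v: isinstance(v, int),
     lambda v: ("bigint", v)),
    (lambda v: True,                      # unguarded fallback
     lambda v: ("rational", Fraction(v))),
]

def make_number(value):
    """Try the implementations left to right; use the first match."""
    for guard, construct in IMPLEMENTATIONS:
        if guard(value):
            return construct(value)
    raise ValueError("no implementation accepts this value")
```

The user just asks for a number; which representation is picked stays
an implementation detail unless one is requested explicitly.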

> Pointers - Programmers are so used to pointers that they can hardly
> think of how to do without. But consider Java - they don't have
> pointers, just arrays.

That really depends how you use pointers.  Pointers at the language
level are not needed for out parameters, they're not needed for
efficient parameter passing and any conversion to an "address" that can
be mathematically transformed should be carefully monitored.

Pointers are really only needed for sharing.  In most OO languages
today, pointers are implicit, that is, every variable contains a pointer
to an object.  Except, in some languages, the basic types, which often
have a different set of rules.
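
Python is one such language, so it can illustrate what implicit
pointers mean in practice.  The Account class is a made-up example:

```python
class Account:
    def __init__(self, balance):
        self.balance = balance

a = Account(100)
b = a             # copies the implicit pointer, not the object
b.balance -= 30   # the change is visible through both names

shared = a is b   # True: one object, two variables
```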

I'm not convinced sharing semantics is the best choice, and if it isn't,
I'm not convinced that the explicit pointers that would then be needed
are a bad thing.  The main problems are pointer overuse, pointer
arithmetic and dangling pointers.  None of these need to be a problem.

> And the language I used to write business apps in for a long time
> (SqlWindows, now Centura) doesn't have pointers either (but they have
> "receive parameters", which are in-out parameters equivalent to
> thing& in C++, window handles and arrays).

Is this in-out on the object's value or on the object itself (that is,
on which object the variable refers to)?  In most OO languages a
parameter is in-only with respect to which object it refers to, but you
can alter that object at will.  If you allow out mode, then for type
safety the variable passed in must be declared as a supertype of the
parameter type rather than a subtype.  These rules are difficult to
deal with.  I think Sather does this.
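
A rough sketch of why out mode pushes you towards supertypes, written
in Python with a run-time check standing in for the compiler.  Var,
adopt and the Animal classes are all hypothetical names:

```python
class Animal: pass
class Dog(Animal): pass
class Cat(Animal): pass

class Var:
    """A variable with a declared type, checked on every assignment."""
    def __init__(self, declared_type):
        self.declared_type = declared_type
        self.value = None

    def assign(self, obj):
        if not isinstance(obj, self.declared_type):
            raise TypeError("cannot store %s in a %s variable"
                            % (type(obj).__name__,
                               self.declared_type.__name__))
        self.value = obj

def adopt(out_pet):
    # The routine only promises "some Animal" in its out parameter,
    # and here it happens to produce a Cat.
    out_pet.assign(Cat())

animal_var = Var(Animal)
adopt(animal_var)         # fine: an Animal variable can hold a Cat

dog_var = Var(Dog)
try:
    adopt(dog_var)        # a subtype variable cannot accept the result
    rejected = False
except TypeError:
    rejected = True
```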

> Pointers are dangerous. They tend to be uninitialized, leading to
> unwanted program behaviour.

Only if they aren't checked.

> Pointers are more difficult to declare and use than ordinary
> variables. Every time when it comes to declaring function pointers
> in C, I still have to think how to put it right. (There is no need to
> use the same awkward syntax in this project, of course).

Function parameters should generally be objects.  This allows
comfortably generating dynamic functions by attaching state to the
object at creation time.  The syntactic overhead can be dealt with sugar
and the run-time overhead with optimisation.  There was a discussion on
this a while back.
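
As a small illustration in Python (the Scale class and apply_to_all are
invented for the example), a function parameter that is an object with
state attached at creation time might look like this:

```python
class Scale:
    """A function object: the factor is state fixed at creation."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor

def apply_to_all(fn, values):
    # fn is an ordinary object; no function-pointer syntax is needed.
    return [fn(v) for v in values]

double = Scale(2)   # two "dynamic functions" generated from one class
triple = Scale(3)
```

Here apply_to_all(double, [1, 2, 3]) gives [2, 4, 6].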

> USER to do without pointers wherever possible, or at least enforces
> using them in a safe way.

I don't think this is really a problem.  C and its cousins are the only
languages I know of that do this sort of thing.  A language designed
for writing operating systems and device drivers should never have been
used for application development.

> If you declare an object without assigning it a value at that point, a
> new object will automatically be created and bound to that name. If

I don't really like this proposal.

Firstly, I like the fact that sharing semantics gives you an obvious
initial value.  Null says "no object", and if you forget to initialise
it it will bomb out, which is a good thing, not a bad thing.  It will be
detected early and fixed.

If you use initialisation to 0, and you don't think about
initialisation, who knows what might happen undetected.  Zero might
happen to be the most common initialisation value, but I see no reason
to assume it covers more than 50% of cases.  And for some types there
might be no reasonable initial value.
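
To make the contrast concrete, here is a small Python sketch, with None
playing the role of null.  Both functions are made-up examples of the
same forgotten initialisation (the correct starting value would be 1):

```python
def product_null_init(factors):
    result = None          # "no object": never properly initialised
    for f in factors:
        result = result * f    # raises TypeError on the first use
    return result

def product_zero_init(factors):
    result = 0             # automatic or reflexive zero initialisation
    for f in factors:
        result = result * f
    return result          # silently 0 for every input
```

The null version fails the first time it is run; the zero version
returns a plausible-looking wrong answer.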

Perhaps forcing initialisation to something, including null, might
work, but that's a lot of overhead on every declaration.  It might be
worth it though, since initialisation is a problem.  In that case,
though, programmers could tend to develop a reflex where they just
write := 0.

> "flat" copy means: Only the top-level object is copied, not the
> objects being referenced. So if Person contains members, they are
> shared. If you change Smith's mail address, you also change Jones mail
> address because they actually are the same.

The shallow copy/deep copy problem is one which is at the heart of the
nature of imperative languages.  What worries me is a lack of support
for levels in between.  I don't recall ever running into too many of
these problems when programming though, so I probably need to be more
practical.

I think there should be separate operators for object assignment and
value assignment.  Whether the value assignment should be shallow or
deep, I'm not sure.  If you've read any books on object-oriented design
you might be familiar with the "has" and "using" relations.  Basically,
pointers in a class can either represent something the object has
(owns), or merely something it uses (another distinct object).  I tend
to think it might be good policy to deep copy "has" objects and not
copy "using" objects.
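
A sketch of that policy in Python, using the Smith/Jones example from
above.  The classes and the value_copy method are assumptions for
illustration, and which members count as owned is decided by hand here:

```python
import copy

class Address:
    def __init__(self, street):
        self.street = street

class Employer:
    def __init__(self, name):
        self.name = name

class Person:
    def __init__(self, name, address, employer):
        self.name = name
        self.address = address     # "has": owned by this person
        self.employer = employer   # "using": a distinct, shared object

    def value_copy(self):
        """Deep copy what is owned, share what is merely used."""
        return Person(self.name,
                      copy.deepcopy(self.address),
                      self.employer)

acme = Employer("Acme")
smith = Person("Smith", Address("1 Main St"), acme)
jones = smith.value_copy()
jones.name = "Jones"
jones.address.street = "2 High St"   # does not touch Smith's address
```

After this, smith.address.street is still "1 Main St", while both
people refer to the same Employer object.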

>   Function: new  ! Method specification
>    Description: The copy constructor
>    Parameters
>     aPerson: Template

I don't much like defining "new" procedures.  It's a bit of a picky
thing, but most languages require you to generate objects via new
procedures - basically they create an object which you might then
initialise.

I prefer the Sather method of having a "new" expression that can only be
used within a class method.  The class method generates the object
instead.  Usually it will be a class-wide (C++ static) member.
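
A loose Python analogue, not Sather itself: the class-wide methods
below are the intended way to obtain objects, and the direct
constructor call stands in for the "new" expression.  Python cannot
restrict that call to the class body, so this part is convention only.
Point, create and origin are hypothetical names:

```python
class Point:
    def __init__(self, x, y):
        # Meant to be reached only from the class's own methods.
        self.x = x
        self.y = y

    @classmethod
    def create(cls, x, y):
        # Class-wide method; cls(...) plays the role of "new".
        return cls(x, y)

    @classmethod
    def origin(cls):
        # A second creation method built on the first.
        return cls.create(0, 0)
```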

> > Why can't it be a null (no object)?
> 
> I think it can be a NULL - (and we might be tempted to use it at least
> internally), but there are some serious drawbacks to this:Because NULL
> actually isn't an instance of the class in question, you can't apply
> methods to it (what should they do?) and you can't access instance
> items (there are none).

I think you can in Smalltalk, I've seen isnull methods in Object.  I see
the lack of being able to call methods on null as an advantage - it's a
rather clean type system compared to the alternatives.

> So there has to be some DYNAMIC checking for this case if it were
> allowed, and it would cause overhead and be prone to errors if not
> handled properly.

Yes, but optimisation can remove most of this.  Placing language
preconditions on the parameters helps too.  In fact we should encourage
preconditions as much as possible.  They are also useful for things
like stopping sharing problems at the source.
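
In a language without built-in preconditions the same effect can be
approximated by hand.  A minimal Python sketch, where require and
send_mail are invented names and None stands in for null:

```python
def require(condition, message):
    """Poor man's precondition: fail at the call boundary."""
    if not condition:
        raise ValueError(message)

def send_mail(person, text):
    require(person is not None, "person must not be null")
    require(text, "text must not be empty")
    # Past this point the body needs no further null checks.
    return "To %s: %s" % (person, text)
```

With the check stated once at the top, the null case is rejected before
any work is done rather than somewhere deep inside the routine.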

> (Not all the checking can be done automatically - the user would still
> have to specify some action if a NULL occurred, say, as an argument).
> I would like to avoid runtime checking - it makes debugging so much
> harder than having the checks being performed at compile time.

If the compiler does flow analysis (as it should already be doing for
optimisation), emitting warnings is not hard.

This being said, I'm not against the idea of having variables which must
contain an object, even by default, as long as NULL is possible.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
                              ***
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/