Parameterised Types

Matthew Tuck matty@box.net.au
Tue, 16 Mar 1999 21:44:21 +1030
Previous message: Language Syntax Suggestion
Next message: Parameterised Types
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
This message is about what I think about parameterised types (generics),
specifically about their type parameters.

Background
----------

Parameterised types are types that have members that can vary from
object to object, but are fixed for the lifetime of the object.  The
canonical example of a parameterised type is a collection, which works
something like this (using a list collection):


type List <Element: class>
   proc AddTail( X: ro Element )
   ...

...

a: List <Integer>


indicating that we could have a list of integers, a list of objects or
anything else.  Parameterised types are strictly unnecessary in any
language which has variables which can hold all types, for example the
'Object' type in most OOPLs, eg


type List
   proc AddTail( X: ro Object )
   proc RemoveHead( X: wo Object )
   func PeekHead -> Object
   ...

...

a: List


but this defines a list of Objects rather than a list of Integers or
some other type.  The essential different is one of type safety.  The
former list, if specified to be an integer list, does not allow you to
add strings, whereas the latter does.  Note that the parameterised type
can still be made to accept any objects if desired by being
parameterised by the Object type.

Parameterised types also prevent unnecessary downwards conversions
(run-time type conversions) from cluttering the program.  For example,
both RemoveHead and PeekHead in the latter List always return Objects,
even if they only contain Integers.  Hence to use the Integers, we must
downward convert the Object to an Integer.  In general this requires a
check to ensure it really is an integer, and many compilers might not
optimise this check out (since it is redundant in this hypothetical
situation).

So parameterised types are safer, reduce program clutter and possibly
make the program run faster.

This background has covered parameterising types with other types,
however you can also parameterise by normal data.  For example, a
dynamic array (an array whose bounds are not fixed at compile-time but
are at allocation time), would be parameterised by either 1 (eg C) or 2
(eg Ada) index bounds, plus indexed by an element type.

An array can therefore be considered to be a parameterised type, albeit
usually hardcoded into the language whther or not there are
parameterised
types.

Therefore, it makes sense to allow parameterisation for arrays if and
only if we allow parameterisation for user-defined types.  Most
programmers would not want to declare an array without being able to
restrict what is put in it.  Therefore the same should hold true of any
other data structure.

The array's priveleged place in languages was originally implementation
simplicity and is now fairly historic.  Hence it should be relegated
back to being just one tool in a programmer's library, along with
alternative types such as queues, stacks, relations, etc.

Since arrays are/almost are describable via parameterised types, we
should ensure our parameterised type system is strong enough to not have
to hardcode them.

Anyway, I have only described parameterising by type, since that is what
this message is mainly concerned with.

Client Type Parameters
=---------------------

A "client type parameter" is your standard type parameter - it is
specified by the client of the type.  For example:

a: list <Integer>

to declare a list of integers.  These declarations are fairly
uninteresting - the actual type is more interesting.

---

type List <Element>
   ...


this introduces a type parameter called "Element" that is supplied by
the client.  Hence the client can create a list of integers only,
objects only, etc.  This is sometimes known as "unconstrained
genericism".

---

type List <Element> : Element < Integer
   ...

this has a constraint that specifies that the element must be a subtype
of Integer.  This doesn't just say the list must contain integers - it
says that the restriction placed on the elements must be at least that
they may only be integers, but may still be restricting further to a
subtype.

The benefit of this is that since the list knows it must contain
integers, it may use the methods of Integer rather than just those of
Object (without a conversion).  Of course to do this we must first
restrict the members to actually be Integers.  This is sometimes known
as "constrained genericism".

The "<" represents a subtyping relation, ie the element restriction type
must be a subtype of integer.  Since subtyping sets up a partial order,
less than is a reasonable symbol to use.  You can include multiple
constraints (say on two different type parameters), separated by colons.

---

Taking the subtyping relationship further reveals some possibilities not
usually present in parameterised type systems.

type Assign <Element1, Element2> : Element1 < Element2
   ...

Here we have two type parameters from the client, and the constraint
consists of both types.  This allows us to assign an Element1 object
to an Element2 object variable for example, which we could not otherwise
do.

---

type Assign <A, B> : A = B

We could add a "=" shorthand to also allow assigning the opposite way. 
This would be useful if we wanted to swap elements, say.  The expansion
would be "A < B : B < A".

I'm deliberately not adding ">" to try and keep it more readable, eg
"B > A : A < C" is less understandable than " A < B : A < C".

---

type Assign <A, B, C, D, E, F> : A, B < C, D, E < F

An illustration of a complicated constraint unlikely to be seen in
practice.  I've done two things here - allow multiple "<" to be chained
with the meaning the same as in maths, and allowed multiple parameters
in between the "<"s.  The meaning of this is that every element left of
"<" is a subtype of every element right of "<".

Whether these would be useful in practice is a question to be asked -
while unlikely to be in this form, simple examples like "A < B < C" and
"A, B < Integer" might occur.

These are both shorthands of course.  The multiple less-thans shorthand
is easily representable as separate declarations - programmers do this
all the time with normal expressions.

The comma shorthand is more difficult.  It would be difficult to handle
as a shorthand since its expansion is potentially exponential - hence
collapsing especially would be difficult, but not necessarily
impossible.

Subtype Type Parameters
-----------------------

A "subtype type parameter" is specified by the subtype of the type
rather than the client.

Since the type can still be instantiated, there is a current value for
this type for use by clients.

Subtype type parameters are a solution to the problems of covariance and
contravariance.  The classical example of this is as follows:

type Animal
   proc Eat(X: ro Food)

type Cow
   proc Eat(X: ro Grass)

We expect a cow to be a subtype of Animal, yet we expect it to only eat
grass.  But this is not type safe since if we had an animal variable and
it happened to contain a cow, we might try and feed it an apple since
the
signature of Animal.Eat expects any food.

Languages take different positions on this.  Most don't allow the type
to change (invariance), Eiffel allows it to be a subtype (covariance)
and takes various steps to prevent the problems, Sather allows it  to be
a supertype (contravariance) since it is type safe, even if it doesn't
really make sense.

Note that data is only going into the procedures here, since the
parameters are read-only.  If they were out-only (e.g. write-only
parameters, return values), then covariance would be safe and
contravariance unsafe.

I believe the trick to best solving this is in using invariance, and
considering the Animal.Eat signature to be saying all Animals will eat
all Food.  Then we try to find another way to achieve covariance.

That way is through subtype type parameters.  We need a way for the type
to include the type of food an animal eats.  We could theoretically do
this with a client type parameter, however that information should not
be for the client to specify - it is for the subtype to specify.

As an example:

type Animal <FoodType=> : FoodType < Food
  proc Eat(X: ro FoodType) 

type Cow <FoodType=Grass> : FoodType < Grass
  ...

here the = represents the fact that the parameter is a subtype parameter
- we could mix it with client parameters by separating them with
commas.  We do not have to provide a default value for foodtype since
Animal is an abstract type (for this example).

Here we have defined FoodType to be covariant since it can be a FoodType
parameters is restricted to Food or any subtype.  By swapping the two
(Food < FoodType) we could define contravariance.

Notice that Cow further restricts the FoodType to Grass or a subtype and
furthermore, it specifies that cow objects have a food type of Grass. 
These two need not be the same, although they usually would be.  If we
used "FoodType = Grass" as the constraint, subbtypes of Cow would be
forced to have a FoodType of grass and be invariant.  In this way you
can choose any policy you like.

This policy essentially dictates a type designer must choose
specifically what is to become covariant, and dictates a certain element
of control.

It is important to recognise generic parameters and generic constraints
are inherited.  Hence they must be consistent - e.g. "FoodType < Grass"
is consistent with "FoodType < Food".  This is true for both times of
type parameter.

Summary
-------

So you can see how generic constraints can be used in languages to
achieve useful type safety.  An inadequate parameterised type system can
force a return to using only Object variables, which is to be avoided.

Other than (probably) some small bits, this scheme is not mine, but the
subtype type parameters have not been widely implemented yet.  I'm not
aware of a parameterised type system allowing a "A < B" constraint where
A and B are both parameters.

Hopefully it seemed that these two types of type parameter are similar
in feature - the essential difference is in how and where the type
parameter is specified - constraint mechanisms are shared and hence
should only need to be implemented once.

Perhaps my syntax could be improved - for example, merging the client's
parameter specification mechanism with that of the subtypes' to reflect
their similarity better.

And furthermore, it should be noted an implementation of
parameterisation need not result in new code being generated for each
parameterisation, like would occur with C templates.

-- 
     Matthew Tuck - Software Developer & All-Round Nice Guy
             mailto:matty@box.net.au (ICQ #8125618)
       Check out the Ultra programming language project!
              http://www.box.net.au/~matty/ultra/
Previous message: Language Syntax Suggestion
Next message: Parameterised Types
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]