Why NOT C(++) ...

Michael David WINIKOFF winikoff@mulga.cs.mu.OZ.AU
Tue, 20 Apr 93 11:23:30 EST


Found this floating around.

Enjoy.

I'll hopefully have something to say about Sather Real Soon Now.

(BTW -- the tarred and compressed distribution was about 4 Megs. 
Installed, it takes up 23 Megs)

Michael 


--------------- CUT HERE --------------

[Note: This is a text-only copy of a technical report which was
originally prepared using Interleaf and distributed in Postscript
form.  In the process of converting this document to textual form
the title pages and some header/footer material, as well as
formatting information, have been lost.  The main content is,
however, preserved unchanged.]


The case against C


P.J. Moylan
Department of Electrical and
Computer Engineering
The University of Newcastle
N.S.W. 2308,
Australia

eepjm@wombat.newcastle.edu.au
Fax: +61 49 60 1712

Abstract

The programming language C has been in widespread use since the
early 1970s, and it is probably the language most widely used by
computer science professionals. The goal of this paper is to argue
that it is time to retire C in favour of a more modern language.
The choice of a programming language is often an emotional issue
which is not subject to rational discussion. Nevertheless it is
hoped to show here that there are good objective reasons why C is
not a good choice for large programming projects. These reasons are
related primarily to the issues of software readability and
programmer productivity.

Keywords: Programming languages, C, C++

Introduction

This note was written after I had found myself saying and writing
the same things over and over again to different people. Rather
than keep repeating myself, I thought I should summarize my
thoughts in a single document.

Although the title may sound frivolous, this is a serious document.
I am deeply concerned about the widespread use of C for serious
computer programming. The language has spread far beyond its
intended application area. Furthermore, the C enthusiasts seem to
be largely in ignorance of the advances which have been made in
language design in the last 20 years. The misplaced loyalty to C
is, in my opinion, just as serious a problem among professionals as
the BASIC problem is among amateurs.

It is not my intention in this note to debate the relative merits
of procedural (e.g. Pascal or C), functional (e.g. Lisp), and
declarative languages (e.g. Prolog). That is a separate issue. My
intention, rather, is to urge people using a procedural language to
give preference to high-level languages.

In what follows, I shall be using Modula-2 as an example of a
modern programming language. This is simply because it is a
language about which I can talk intelligently. I am not suggesting
that Modula-2 is perfect; but it at least serves to illustrate that
there are languages which do not have the failings of C.

I do not consider C++ to be one such language, by the way. The
question of C++ will be considered in a later section. For now, it
is worth pointing out that almost all of the criticisms of C which will
be listed in this note apply equally well to C++. The C++ language
is of course an improvement on C, but it does not solve many of the
serious problems which C has.

Some background

The first C compiler, on a PDP-11, appeared in about 1972. At the
time the PDP-11 was a relatively new machine, and few programming
languages were available for it; the choice was essentially limited
to assembly language, BASIC, and Fortran IV. (Compilers and
interpreters for some other languages had been written, but were
not widely distributed.) Given the obvious limitations of these
languages for systems-level programming, there was a clear need for
a new language.

This was also the era in which software designers were coming to
accept that operating systems need not be written in assembly
language. The first version of Unix (1969-70) was written in
assembly language, but subsequently almost all of it was rewritten
in C. To make this feasible, however, it was necessary to have a
language which could bypass some of the safety checks which are
built in to most high-level languages, and allow one to do things
which could otherwise be done only in assembly or machine language.
This led to the concept of intermediate-level machine-oriented
languages.

C was not the only such language, and certainly not the earliest.
In fact, a whole rash of machine-oriented languages appeared at
about that time. (I was the author of one such language, SGL, which
was used for a number of projects within our department in the
1970s. It was retired, as being somewhat old-fashioned, in the
early 1980s.) These languages had a strong family resemblance to
one another; not because the authors were copying from one another
(in my own case, SGL had reached a fairly advanced stage before I
became aware of the existence of C), but because they were all
influenced by the same pool of ideas which were common property at
the time.

Why C became popular

The history of C is inextricably linked with the history of Unix.
The Unix operating system is itself written in C, as are the
majority of utility programs which come with Unix; and to the best
of my knowledge a C compiler comes with every distribution of Unix,
whereas it is harder to get compilers for other languages under
Unix. Thus, we need to look at the reasons for the rapid spread of
Unix.

The obvious reasons are cost and availability. Unix was distributed
at virtually no cost, and sources were available to make it easy to
port it to other systems. A number of useful utilities were
available within Unix - written in C, of course - and it was
usually simpler to leave them in C than to translate them to
another language. For a Unix user who wanted to do any programming,
a competence in C was almost essential.

Since then, C has remained widespread for the same reasons that
Fortran has remained widespread: once a language has built up a
large user base, it develops an unstoppable momentum. When people
are asked "why do you use C?", the most common answers are (a) easy
availability of inexpensive compilers; (b) extensive subroutine
libraries and tools; (c) everyone else uses it. The ready
availability of compilers, libraries, and support tools is, of
course, a direct consequence of the large number of users. And, of
course, each generation of programming educators teaches students
its favourite language.

Portability is also given as a reason for the popularity of C, but
in my opinion this is a red herring. A subset of C is portable, but
it is almost impossible to convince programmers to stick to that
subset. The C compiler which I use can generate warning messages
concerning portability, but it is no effort at all to write a
non-portable program which generates no compiler warnings.

Why C remains popular

With advances in compiler technology, the original motivation for
designing medium-level languages - namely, object code efficiency -
has largely disappeared. Most other machine-oriented languages
which appeared at about the same time as C are now considered to be
obsolete. Why, then, has C survived?

There is of course a belief that C is more appealing to the "macho"
side of programmers, who enjoy the challenge of struggling with
obscure bugs and of finding obscure and tricky ways of doing things.

The conciseness of C code is also a popular feature. C programmers
seem to feel that being able to write a statement like

**p++^=q++=*r---s

is a major argument in favour of using C, since it saves
keystrokes. A cynic might suggest that the saving will be offset by
the need for additional comments, but a glance at some typical C
programs will show that comments are also considered to be a waste
of keystrokes, even among so-called professional programmers.

Another important factor is that initial program development is
perceived to be faster in C than in a more structured language. (I
don't agree with this belief, and will return later to this point.)
The general perception is that a lot of forward planning is
necessary in a language like Modula-2, whereas with C one can sit
down and start coding immediately, giving more immediate
gratification.

Do these reasons look familiar? Yes, they are almost identical to
the arguments which were being trotted out a few years ago in
favour of BASIC. Could it be that the current crop of C programmers
are the same people who were playing with toy computers as
adolescents? We said at the time that using BASIC as a first
language would create bad habits which would be very difficult to
eradicate. Now we're seeing the evidence of that.

Advances in language design

It would be a gargantuan task to track down and document the origin
of what we know today about programming language design, and I'm
not going to do that. Many of the good ideas first appeared in
obscure languages, but did not become well-known until they were
adopted into more popular languages. What I want to do in this
section is simply note a few important landmarks, as they appeared
in the better-known languages.

Undoubtedly the most important step forward was the concept of a
high-level language, as exemplified in Fortran and Cobol. What
these languages gave us were at least three important new
principles: portability of programs across a range of machines; the
ability to tackle new problems which were just too big or too
difficult in assembly language; and the expansion of the pool of
potential programmers beyond that small group willing and able to
probe the obscure mysteries of how each individual processor worked.

Needless to say, there were those who felt that "real" programmers
would continue to work in machine language. Those "real"
programmers are still among us, and are still arguing that their
special skills and superior virtue somehow compensate for their
poor productivity.

The main faults of Fortran were a certain lack of regularity, some
awkward restrictions which in hindsight were seen to be
unnecessary, and some features which were imposed more because of
machine dependencies than for programmer convenience. (The only
justification for the three-way IF of Fortran II, for example, was
that it mapped well into the machine language of a machine which is
now obsolete.) Some of these faults were corrected in Algol 60,
which in turn inspired a large number of Algol-like languages. The
main conceptual advance in Algol was probably its introduction of
nesting in control structures, which in turn led to cleaner control
structures.

The structured programming revolution is sometimes considered to
date from Dijkstra's famous "GOTO considered harmful" letter.
Although this is an oversimplification, it is true that the
realisation that the GOTO construct was unnecessary - and even
undesirable - was an important part of the discovery that
programming productivity was very much linked to having
well-structured and readable programs. The effect this had on
language design was a new emphasis on "economy of concept"; that
is, on having languages which were regular in design and which
avoided special cases and baroque, hard-to-read constructs.

The important contribution of Pascal was to extend these ideas from
control structures to data structures. Although the various data
structuring mechanisms had existed in earlier languages - even C
has a way of declaring record structures - Pascal pulled them all
together in an integrated way.

Pascal can still be considered to be a viable language, with a
large number of users, but it has at least two conspicuous faults.
First, it was standardized too early, which meant that some
niggling shortcomings - the crude input/output arrangements, for
example - were never fixed in the language standard. They are fixed
in many implementations of Pascal, but the repairs go outside the
standard and are therefore nonportable. The second major fault is
that a Pascal program must (if one wants to conform to the
standard) exist as a single file, which makes the language
unsuitable for really large programs.

More recently, there has been a lot of emphasis on issues like
reusable software and efficient management of large programs. The
key idea here is modularity, and this will be discussed in the
following section.

Now, where does C fit into this picture? The answer is that C is
built around lessons which were learnt from Algol 60 and its early
successors, and that it does not incorporate much that has been
learnt since then. We have learnt some new things about language
design in the last 20 years, and we do know that some of the things
that seemed like a good idea at the time are in fact not such good
ideas.  Is it not time to move on to D, or even E?

Modularity

In its very crudest sense, modularity means being able to break a
large program into smaller, separately compiled sections. C allows
this. Even Fortran II allowed it. This, however, is not enough.

What modularity is really about is data encapsulation and
information hiding. The essential idea is that each module should
take care of a particular sort of data, and that there should be no
way of getting at that data except via the procedures provided by
that module. The implementation details of the data structures
should be hidden. There should be no way to call a procedure unless
the module explicitly exports that procedure. Most importantly,
callers of a module should not need to know anything about the
module except for the declarations and comments in its "visible"
section. It should be possible to develop a module without having
any knowledge of the internal structure of any other module.

The advantages should be obvious. At any given time a programmer
need only be concerned with a short section of program - typically
a few pages long - without having to worry about side-effects
elsewhere in the program. It is possible to work with complex data
structures without having to worry about their internal detail. It
is possible to replace a module with a newer version - and this
even includes the possibility of a complete overhaul of the way a
data structure is implemented - without having to alter or re-check
the other modules. In a team programming situation, the
coordination problems become a lot simpler.

If the hardware supports memory segmentation, then the data in each
module are protected from accidental damage by other modules
(except to the extent to which pointers are passed as procedure
parameters). This makes errors easier to detect and to fix. Even
without hardware protection the incidence of programming errors is
reduced, because error rates depend on program complexity, and a
module a few pages long is far less complex than a monolithic
hundred-page program.

Now, modular programming is possible in C, but only if the
programmer sticks to some fairly rigid rules:

-	Exactly one header file per module. The header should contain the
function prototypes and typedef declarations to be exported, and
nothing else (except comments).

-	The comments in a header file should be all that an external
caller needs to know about the module. There should never be any
need for writers to know anything about the module except what is
contained in the header file.

-	Every module must import its own header file, as a consistency
check.

-	Each module should contain #include lines for anything being
imported from another module, together with comments showing what
is being imported. The comments should be kept up-to-date. There
should be no reliance on hidden imports which occur as a
consequence of the nested #include lines which typically occur when
a header file needs to import a type definition or a constant from
elsewhere.

-	Function prototypes should not be used except in header files.
(This rule is needed because C has no mechanism for checking that a
function is implemented in the same module as its prototype, so
the use of a prototype can mask a "missing function" error.)

-	Every global variable in a module, and every function other than
the functions exported via the header file, should be declared
static.

-	The compiler warning "function call without prototype" should be
enabled, and any warning should be treated as an error.

-	For each prototype given in a header file, the programmer should
check that a non-private (i.e. non-static, in the usual C
terminology) function with precisely the same name has its
implementation in the same module. (Unfortunately, the nature of
the C language makes an automatic check impossible.)

-	Any use of grep should be viewed with suspicion. If a prototype
is not in the obvious place, that's probably an error.

-	Ideally, programmers working in a team should not have access to
one another's source files. They should share only object modules
and header files.

Now, the obvious difficulty with these rules is that few people
will stick to them, because the compiler does not enforce them. A
great many people think of #include as a mechanism for hiding
information, rather than as a mechanism for exporting information.
(This is shown by the distressingly common practice of writing
header files without comments.) The counter-intuitive meaning of
static is a disincentive for using it properly. Function prototypes
tend to be thrown into a program in a haphazard way, rather than
being confined to header files. Programmers who think of comments as
things to be added after writing the code will hardly accept the
discipline of keeping the comments on their #include lines
up-to-date. A good many programmers prefer not to enable warning
messages in their compilations, because it produces too many
distracting and "unimportant" messages. Finally, the notion of
having precisely one header file per module runs counter to
traditions which have been built up among the community of C users.

And, what is worse, it takes only one programmer in a team to break
the modularity of a project, and to force the rest of the team to
waste time with grep and with mysterious errors caused by
unexpected side-effects. I believe it is well known that almost
every programming team will include at least one bad programmer. A
modular programming language shields the good programmers from at
least part of the chaos caused by the bad programmers. C doesn't.

To complicate matters, it is easy even for good programmers to
violate, by accident, the rules for proper modularity. There is no
mechanism in C for enforcing the rule that every prototype
mentioned in a header file is matched by an implementation in the
same module, or even for checking that the function names in the
implementation module match those in the header file. It is easy to
forget to make internal functions private, since the default
behaviour is back to front: the default is to make all functions
exportable, whether or not a prototype is used. It is easy, too, to
lose track of what is being imported from where, because the
crucial information is locked away in comments which the compiler
doesn't check. The only way I know of for checking what is being
imported is to comment out the #include lines temporarily, to see
what error messages are produced.

In most modular programs, some or all modules will need an
initialization section. (You can't initialize data structures from
outside the module, since they aren't supposed to be visible from
outside the module.) This means that the main program of a C
program must arrange to call the initialization procedures in the
correct order. The correct order is bottom-up: if module MA depends
on module MB, then module MB must be initialized before module MA.
Any language supporting modular programming will work this out for
you, and perform the initialization in the correct order. (It will
also report circular dependencies, which in most cases reflect an
error in overall program design.) In C, you have to work this out
by hand, which can be a tedious job when there are more than about
a dozen modules. In practice, I have found that it is almost
impossible to avoid circular dependencies in a large C program,
whereas I have rarely struck such dependencies in Modula-2
programs. The reason is that the Modula-2 compiler/linker
combination catches circularities at an early stage, before it has
become too difficult to re-design the program. In C, such errors do
not show up until mysterious errors appear in the final testing.
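For illustration, here is a sketch of what the C programmer must
write by hand; the three modules and their initialization procedures
are hypothetical.

/* main.c -- initialization calls ordered by hand, bottom-up */

void InitStorage (void);   /* lowest-level module */
void InitQueue (void);     /* depends on storage */
void InitSched (void);     /* depends on queue */

int main (void)
{
    /* Nothing in this ordering is checked by the compiler; if
       storage is later changed to depend on queues, the order
       below silently becomes wrong. */
    InitStorage ();
    InitQueue ();
    InitSched ();
    return 0;
}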

The hazards of #include

I have often heard it said that the #include directive in C has
essentially the same functionality as the IMPORT of Modula-2 and
similar languages. In fact there is a profound difference, as I
shall now attempt to show.

Consider a header file m2.h which contains the lines

#include <m1.h>
	/* FROM m1 IMPORT stInfo */

void AddToQueue (stInfo* p);

and suppose that several other modules contain a #include <m2.h>.
Consider the following sequence of events, which could easily
happen in any programming project:

(a)	some of the modules which import from m2 are compiled;

(b)	as the result of a design change, the typedef defining  stInfo
in  m1.h  is altered;

(c)	the remaining modules which import from m2 are compiled.

At this point, the overall program is in an inconsistent state,
since some of the modules were compiled with an obsolete
definition; but the error will probably not be caught by the
compiler or linker. If you ever wondered why you keep having to do
a "Compile All" in order to eliminate a mysterious bug, this is
part of the reason.

The reason why this problem occurs in C, whereas it does not occur
in languages designed for modular programming, is that a C header
file is a pure text file, with no provision for containing a "last
compiled" time or other mechanism for consistency checking. (This
is also why C compilers appear to be painfully slow when compared
with, for example, a typical Modula-2 compiler. It usually takes
longer to read a header file than it does to read a symbol file.)

Another nasty consequence of reading the header file literally is
that information in the header file is treated as if it were in the
file which contains the #include. There is no "fire wall" around
the header file. Everything declared in one header file is
automatically exported, in effect, to the header files mentioned in
every following #include. This can lead to obscure errors which
depend on the order of the #include lines. It also means that the
effect of a header file is not under the full control of the person
who wrote it, since its behaviour depends on what comes before it
in the importing module.

Similar problems exist with other preprocessor directives, such as
#define. This point is not always fully understood: the effects of
a #define persist through an entire compilation, including any
included files. There is no way in C to declare a local literal
constant.
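A small, hypothetical illustration of how one header can silently
sabotage another, depending only on the order of the #include lines:

/* colour.h -- a hypothetical header */
#define RED 0

/* traffic.h -- another hypothetical header */
enum light { RED, AMBER, GREEN };

/* client.c */
#include "colour.h"
#include "traffic.h"   /* fails: the preprocessor has already
                          replaced RED by 0, making the enum
                          declaration syntactically illegal */

Swap the two #include lines and the file compiles without complaint.
The author of traffic.h can do nothing to prevent this.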

Have you ever had the experience of having the compiler report an
error in a library function you're not even calling, where the real
error turns out to be a misplaced semicolon in some completely
unrelated file? Such non-local effects make a mockery of modularity.

Another problem with #include is that it is an all-or-nothing
proposition. (This can be resolved by having multiple header files
per module; but that means putting the header files under the
control of the importer, not the exporter, which creates the risk
of undetected discrepancies between a module and its header
file(s). In any case, such a practice creates major headaches in
terms of book-keeping and naming conventions.) How many programmers
read the whole of a header file before deciding to include it?
Very few, I suspect. The more likely situation is that the #include
imports some names which the importer doesn't know about. This can
be a disaster if, as sometimes happens, two functions happen to
have the same name and the same parameter types. (If you think this
is unlikely, just think of the obsolete versions of software which
are left around when you copy files from place to place.) The
compiler won't complain; it will simply assume you were in an
expansive mood and decided to write a function prototype twice. The
linker might complain, but you can't guarantee it.
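To illustrate, with hypothetical file and function names: suppose an
obsolete copy of a module's header is still lying on the include path.

/* io_old.h -- an obsolete copy, forgotten on the search path */
void WriteLine (const char *s);

/* io_new.h -- the current version, with different behaviour */
void WriteLine (const char *s);

/* client.c */
#include "io_old.h"
#include "io_new.h"   /* identical prototypes: no complaint */

The compiler sees two identical prototypes and accepts them; which
WriteLine the linker actually binds to depends on the order in which
the object libraries are searched.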

As a result, it is possible to import a function which is different
from the function you thought you were importing, and there is not
necessarily any warning message. Part of the problem here is that
the mechanism by which the linker chooses which functions to link
in has no connection with the mechanism by which the compiler
checks function prototypes. There is no way of specifying that a
particular header file belongs to a particular module.

Note, too, that an unused prototype is never picked up as an error.
(This is another reason for insisting that prototypes be used only
in header files, and nowhere else. This does not solve the problem,
but it reduces the amount of manual checking which has to be done.)
While this will not cause a program to run incorrectly, it adds to
the confusion to be faced by future maintainers of the program.

The speed of program development

A claim that is often heard is that initial program development is
fast in C because it is easy to get to the point of the "first
clean compile". (This is also an argument which is popular with the
BASIC enthusiasts.) This property is contrasted with what happens
with languages like Modula-2, where - it is said - a lot of forward
planning is necessary before any progress is made on the coding.
The conclusion is that C programmers get more immediate feedback.

This argument is silly in at least three ways. First, the claim
relies at least partly on the fact that C compilers are more
generous in accepting doubtful code than are compilers for
higher-level languages. Where is the virtue in that? If the
compilation of code containing errors is seen as a significant step
forward, you can get that in any language. All you have to do is
ignore the error messages.

Second, the "first clean compile" is a fairly meaningless measure
of how far you have progressed. It might be a significant milestone
if you follow an approach to programming where the coding is not
started until most of the design work has been completed, but not
otherwise. Under the "code, then debug" philosophy of programming,
you still have most of your work ahead of you after the first
compilation.

Finally, my own experience is that even the original statement is
incorrect. I find that I reach the "first clean compile" stage
within the first few minutes of starting work. This is because I
prefer developing programs through stepwise refinement (also known
as top-down design combined with top-down coding). The very first
thing I compile consists of perhaps half a dozen lines of code,
plus a couple of dozen lines of comments. It is so short that it
will compile without errors either immediately, or after
discovering errors which are obvious and easy to repair.

What about the subdivision of a large program into modules? This
takes a lot less forward planning than is commonly supposed. With
stepwise refinement, and with the philosophy that the function of a
module is to look after a data type, one tends to discover what
modules are needed as the program development proceeds.
Furthermore, true modularity makes it very easy to construct and
test the program in stages, because of the property that changes in
the internal details of a module can be made independently of what
is happening outside that module.

In cases where I have kept a log of the time I have spent on a
project, I have found that I spend about twice the time to get a C
program working as to solve a problem of equivalent complexity
using Modula-2. The difference has nothing to do with typing speed
- since the source files tend to be of about the same length - but
in the time spent in debugging. In Modula-2, the job is essentially
complete once I have typed in the last module, and debuggers are
rarely needed. In C, a good debugger is indispensable.

To a project manager, this is a very important factor. In a big
project, the cost of paying the programmers is typically the
second-biggest budget item (after administrative overheads), and
sometimes even the biggest. A productivity difference of 50% can
make the difference between a large profit and a large loss
on the project.

Pointers: the GOTO of data structures

Despite all the advances which have been made in the theory and
practice of data structures, pointers remain a thorn in everyone's
side. Some languages (e.g. Fortran, Lisp) manage to get by without
explicit pointers, but at the cost of complicating the
representation of some data structures. (In Fortran, for example,
you have to simulate everything using arrays.) For anyone working
with almost any reasonably advanced application, it is hard to
avoid the use of pointers.

This does not mean that we have to like them. Pointers are
responsible for a significant amount of the time spent on program
debugging, and a large proportion of the complexity which makes
program development difficult. A major challenge for software
designers in languages like Modula-2 is to restrict the pointer
operations to the low-level modules, so that people working with
the software don't have to deal with them. A major, and largely
unsolved, problem for language designers is to find mechanisms
which save programmers the trouble of having to use pointers.

Having said that, one can also say that a distinction can be drawn
between essential and inessential pointers. An essential pointer,
in the present context, is a pointer which is required in order to
create and maintain a data structure. For example, a pointer is
needed to link a queue element to its successor. (The language
might or might not explicitly call it a pointer, but that is a
separate issue. Whatever the language, there must be some way of
implementing the "find successor" operation.) An inessential
pointer is one which is not needed as part of implementing a data
structure.

In a typical C program, the inessential pointers outnumber the
essential pointers by a significant amount. There are two reasons
for this. The first is that C traditions encourage programmers to
create pointers even where equally good access methods already
exist; for example, for stepping through the elements of an array.
(Should we blame the language for the persistence of this bad
habit? I don't know; I simply note that it is more prevalent among
C programmers than among those who prefer other languages.)
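The habit is easy to illustrate. Both of the following functions
clear an array; C tradition favours the first, even though the second
says the same thing more plainly and leaves the choice of access
method to the compiler. (A sketch only, with the array size fixed for
simplicity.)

void ClearPtr (int a[100])
{
    int *p;
    for (p = a; p < a + 100; p++)   /* the pointer idiom */
        *p = 0;
}

void ClearSub (int a[100])
{
    int i;
    for (i = 0; i < 100; i++)       /* the same loop with subscripts */
        a[i] = 0;
}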

The second reason is the C rule that all function parameters must
be passed by value. When you need the equivalent of a Pascal VAR
parameter or an Ada inout parameter, the only solution is to pass a
pointer. This is a major contributor to the unreadability of C
programs. (To be fair, it should be admitted that C++ does at least
provide a solution for this problem.)

The situation worsens when it becomes necessary to pass an
essential pointer as an inout parameter. In this case, a pointer to
a pointer must be passed to the function, which is confusing for
even the most experienced programmers.
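Here is a sketch of the situation just described: deleting the first
element of a linked list. The list pointer is an essential pointer,
but because the function must modify it, the caller has to pass its
address.

#include <stdlib.h>

typedef struct Node {
    struct Node *next;
    int value;
} Node;

void RemoveHead (Node **head)       /* a pointer to a pointer */
{
    if (*head != NULL) {
        Node *old = *head;
        *head = old->next;          /* update the caller's variable */
        free (old);
    }
}

/* at the call site:  RemoveHead (&list); */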

Execution-time efficiency

There appears to be a widespread belief among C programmers that -
because the language is close to machine language - a C program
will produce more efficient object code than an equivalent program
written in a high-level language.

I'm not aware of any detailed study of this question, but I have
seen the results of a few informal studies comparing Modula-2 and C
compilers. The results were that the code produced by the Modula-2
compilers was faster and more compact than that produced by the C
compilers. This should not be taken as a definitive answer, since
the studies were not extensive enough, but it does indicate that C
programs might not be as efficient as is generally thought.

I believe I've also seen claims - although I can't recall the
details at this distance in time - that C compilers produced better
code than an assembly language programmer did. I observed a similar
phenomenon when testing my SGL compiler many years ago. The reason
in that case seemed to be that the compiler did a reasonably good
job on things like register allocation, whereas a human programmer
can suffer lapses of concentration when forced to attend to so much
fine detail.

The general rule seems to be that a high-level language compiler
will out-perform a lower-level language compiler, mainly because
the high-level language compiler has more scope for making
decisions about how to generate the code. If you do things in C
like setting up a pointer to an array rather than using subscripts,
you are taking that decision away from the compiler. Your approach
might produce more efficient code, but to be sure of that you have
to know quite a lot about the instruction timings on the machine
you are using, and about the code generation strategies of your
compiler. In addition, the decision is a non-portable one,
potentially leading to major inefficiencies if you switch to
another machine or another version of the compiler.

More importantly, the speed of a program tends to depend more on
the global strategies adopted - what sort of data structures to
use, what sorting algorithms to use, and so on - than the
micro-efficiency issues related to precisely how each line of code
is written. When working in a low-level language like C, it becomes
harder to keep track of the global issues.

It is true that C compilers produced better code, in many cases,
than the Fortran compilers of the early 1970s. This was because of
the very close relationship between the C language and PDP-11
assembly language. (Constructs like *p++ in C have the same
justification as the three-way IF of Fortran II: they exploit a
special feature of the instruction set architecture of one
particular processor.) If your processor is not a PDP-11, this
advantage is lost.

What about C++?

The language C++ is supposed to overcome some of the faults of C,
and to a certain extent it does this. It does, however, have two
major drawbacks. It is much more complex than it needs to be, which
can lead programmers either to ignore the extended features or to
use them in an inappropriate way. The second problem is that the
language tries to maintain compatibility with C, and in so doing
retains most of the unsafe features.

This second problem means that most of the faults discussed in
earlier sections are also faults of C++. Type checking is still
minimal, and programmers are still permitted to produce weird and
baroque constructs which are hard to read. Strangest of all, there
is still no support for modularity beyond the crude #include
mechanism. This is a little surprising: modular programming and
object-oriented programming complement each other very nicely, and
given all the effort that the designers of C++ must have had to put
into the object-oriented extensions it is rather disappointing that
they did not put in that slight extra effort which could have
resulted in a major improvement to the language.

Some of the features related to object-oriented programming are
complicated, and open to misuse by programmers who do not fully
understand them. For safety, I would prefer to see programmers
learn object-oriented programming using a cleaner implementation
(e.g. Smalltalk, Modula-3) before being let loose on C++.

The ability to pass function parameters by reference is a definite
bonus, but the mechanism chosen for doing this is unnecessarily
messy. The only motivation I can see for implementing it this way
is to satisfy the Fortran programmers who miss the EQUIVALENCE
construct.

Operator and function overloading is a mixed blessing. In the hands
of a competent programmer it can be a major virtue; but when used
by a sloppy programmer it could cause chaos. I would feel happier
about this feature if C++ compilers had some way to detect sloppy
programmers.

If there is a discrepancy between a function and its prototype, is
this an error, or is it a deliberate overloading? Most commonly it
will be an error, but in some such cases the C++ compiler will make
the optimistic assumption. One has to be a little suspicious of a
language improvement which increases the probability of undetected
errors.

I'm not yet sure how to feel about multiple inheritance. It is
powerful, in the same way that goto is powerful, but is it the sort
of power we want? I have a nagging suspicion that at some time in
the future our guidelines for "clean programming" will include a
rule that object inheritance should always be restricted to single
inheritance. However, I'm prepared to admit that the evidence is
not yet in on this question.

In brief, C++ introduces some new problems without really solving
the original problems. The designers have opted to continue with
the C tradition that "almost everything should be legal". In my
view, this was a mistake.

Libraries vs. language features

One of the popular features of C++ is the large set of library
functions which is usually distributed with it. This is, indeed, a
desirable feature, but it should not be confused with the inherent
properties of the language. Good libraries can be written for any
language; and in any case most reasonable compilers allow one to
call "foreign" procedures written in other languages.

More generally, people sometimes say they like C because they like
things like argc and argv, printf, and so on. (I don't - I've had
so much trouble with printf, sscanf, and the like that I've been
forced into writing alternative I/O formatting functions - but
that's a separate issue.) In many cases, the functions they like,
and point to as examples of "portable C", are peculiar to one
particular compiler, and not even mentioned in whichever C standard
they consider to be the standard standard. The desirability or
otherwise of various library routines is a legitimate subject for
debate, but it is an issue separate from that of language
properties.

There is just one way in which these functions differ as a result
of genuine language differences from the procedures available with
other languages, and that is the C rule which permits functions
with variable numbers and types of parameters. While this feature
does have certain advantages, it necessarily involves a relaxation
of type checking by the compiler. I personally have wasted hours of
valuable debugging time over things like printing out a long int
with a format appropriate to an int, and then not being able to
discover why my computations were producing the wrong value. It
would have been much faster, even if slightly more verbose, to call
type-safe procedures.
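The kind of bug I have in mind is easily reproduced; the sketch below
assumes an implementation where int and long have different sizes.

#include <stdio.h>

int main (void)
{
    long total = 100000L;
    printf ("total = %d\n", total);   /* wrong: should be %ld */
    return 0;
}

The value printed need bear no relation to total, and the compiler is
entitled to accept the call silently, because the variable argument
list of printf defeats type checking.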

Concluding remarks

Nothing in this document should be interpreted as a criticism of
the original designers of C. I happen to believe that the language
was an excellent invention for its time. I am simply suggesting
that there have been some advances in the art and science of
software design since that time, and that we ought to be taking
advantage of them.

I am not so naive as to expect that diatribes such as this will
cause the language to die out. Loyalty to a language is very
largely an emotional issue which is not subject to rational debate.
I would hope, however, that I can convince at least some people to
re-think their positions.

I recognise, too, that factors other than the inherent quality of a
language can be important. Compiler availability is one such
factor. Re-use of existing software is another; it can dictate the
continued use of a language even when it is clearly not the best
choice on other grounds. (Indeed, I continue to use the language
myself for some projects, mainly for this reason.) What we need to
guard against, however, is making inappropriate choices through
simple inertia.