VM builds still buggy

Wed Jul 28 13:55:43 PDT 2004

Hi Brian

Please see my comments below.

Jaco

----- Original Message ----- 
From: "Brian T. Rice" <water at tunes.org>
To: "Jaco van der Merwe" <jaco.lists at telkomsa.net>
Cc: "The Slate Environment Mailing List" <slate at tunes.org>
Sent: Tuesday, July 27, 2004 4:57 AM
Subject: Re: VM builds still buggy

> Jaco van der Merwe wrote:
>
> > Hi Lee
> >
> > I would like to offer a few suggestions that may help to catch these
elusive
> > bugs.
> >
> > One of the most useful techniques that we use is to write self-checking
> > code. Such code does all kinds of sanity checks. My rule has always been
to
> > fail as quickly as possible if something goes wrong, and this technique
is
> > very good at doing that. The easiest way to accomplish this is to make
> > liberal use of ASSERT-style macros to check pre-conditions, invariants,
> > post-conditions and other sanity checks wherever possible. Yes, it
carries a
> > performance penalty, but it can be compiled out in a release build. The
> > standard ASSERT macro that is supplied with Visual C has a very nice
> > feature. In a debug build a failed assertion launches directly into the
> > debugger at the point where the assertion failed. From that point
onwards it
> > is usually very easy to find the cause. Typically we define our own
> > assertion macros that use the Visual C macros when targeted for the
Windows
> > platform.
>
> One thing you can do is to add assert: statements into Pidgin code which
> will generate "assert();" macro calls in C. It'd be as such:
>
> [| :a :b |
>    assert: a + b > 0.
>    ...
> ].
>
> which would turn into (roughly):
> {
>    int a, b;
>    assert(a + b > 0);
>    ...
> };
>
> I did plan this, but it looks as though we didn't code it up. I'll add
> it to bootstrap/mobius/c/generator.slate and the same file in src/. To
> show how simple this is, and so you can try it immediately, just
> evaluate this method:
>
> g@(C SimpleGenerator traits) generateCFor: _@#assert: on: args
> [
>    C Syntax FunctionCall applying: #ASSERT
>      to: {g generateCFor: args second}
> ].
>
> I think this could be a good way to learn about and document the VM
> code, and can assist us by sending us .diff files to incorporate for
> these useful tests.
>

I'm not that familiar with the VM generation or Slate (yet), but the above
looks really cool. I didn't realise it would be that easy.

> > After all that arm waving, here are some specific suggestions:
> >
> > - I notice that many data structures, for example, the global
CurrentMemory
> > variable which is an instance of the ObjectHeap struct, are not
explicitly
> > initialised. At first glance it seems that a lazy style of
initialization is
> > used, especially for the tables/arrays contained in the structs. I would
> > suggest initialising all structs, and especially tables/array before
usage.
> > Don't just initialise everything to zero, but rather use initial values
that
> > would cause an immediate failure if used incorrectly, e.g. an array
element
> > one beyond the current end of the array. If an immediate failure cannot
be
> > induced, then at least use a value that would cause a failure as soon as
> > possible afterwards.
>
> That's an interesting idea, but I want to hear what Lee thinks first.
>

I have to add that this mechanism is a fallback or last resort to use if you
do not have the luxury of self-checking code. The preferred approach is
always to have code that is well instrumented with checks and that will
catch problems as quickly as possible, i.e., the FAIL_FAST approach. If you
don't have self-checking code this mechanism can be a godsend. Even when you
do have self-checking code this can be useful to catch those cases that were
not checked.

> > - The current style of the code is not very amenable to self-checking.
Why?
> > Because all the structs and array elements are accessed directly. I
would
> > strongly suggest to hide these element accessors behind functions, that
is,
> > to use encapsulation as far as possible. This will result in a very
> > object-oriented style of C coding where all operations on a struct are
> > performed via functions that take the struct pointer as its first
argument.
> > The same should apply to tables/arrays. These functions can all be
declared
> > as inline in order to avoid any performance penalties. However, the
major
> > benefit of this approach is that all the "methods" that are performing
> > operations on their associated structs can do as many sanity checks as
> > possible. For example, all array indexing can check for array bounds, or
> > struct accessors can check for usage of "uninitialised" values, etc.
>
> This is possible, since we already declare the structs and so forth
> abstractly as pidgin prototypes (so generating "wrapper" code would be
> transparent or "mode-driven"). Whether it's more worthwhile than just
> the assertions is debatable. I'd like to start with the assertions and
> then see how much more we need.
>

In my opinion there is a big difference between taking the approach of
instrumenting the current VM code with assertions only versus the
encapsulation/wrapper code approach instrumented with assertions. Here is a
simple example to illustrate my point. Let's say we have a global C int
array called "arr". If we take the encapsulation approach we will have
functions to read and write the elements of this array, for example,

    void SetElementAt( int index, int value)

and

    int GetElementAt( int index )

Also assume that all code that accesses/mutates this array must use these
functions, i.e., the array is treated like an object where its internal
representation is hidden and the only way to access this array is via
functions like above. Adding array bounds checking is very easy in this
case. The person repsonsible for writing the array functions can add the
bounds checking to all the element accessors and mutators. This needs to be
done only once and all users of this array object will receive the benefit
of the array bounds checking automagically because they are using the
functions and don't access the internals of the array directly.

Now, if we did not use the encapsulation approach, i.e., we are accessing
the array elements directly, and we want to instrument the code with array
bounds checking then it becomes much more laborious and error-prone.
Suddenly it is the responsibility of every client of the array to check for
array bounds before accessing the array elements directly. If the array is
accessed in many places in the code then the bounds checking needs to be
duplicated in many places. It is really easy to forget to add  the bounds
checking, or to get it wrong. You need only one case like that to cause a
hidden bug that may be very difficult to find. It also becomes a maintenance
nightmare. Changing the way bounds checking is done implies changing the
same code in hundreds of places. This also violates the DRY (Don't Repeat
Yourself) principle (from Pragmatic Programmers)

Since I'm not familiar with the Slate VM generation code, it might be the
case that the encapsulation/wrapping exists at the level of Slate before the
C code is generated. If that is the case then the assertions can be added
without violating the DRY principle. Then the VM generator also plays the
role of an inlining compiler. However, even if this is the case, I would
prefer the encapsulation approach at the C level. All the C wrapper
functions can be declared as inline and then the inlining will be done by
the C compiler. It makes understanding and debugging the C code of the VM
much easier. Also, there shouldn't be a performance penalty if the compiler
does its job well, which is the case with most modern compilers nowadays.

------------- Side note ------------

The above example of using functions for encapsulation should work well with
globals, for example like with the CurrentMemory struct variable of the VM.
However, for other structs that have many instances the encapsulation
approach should be used with a "self" pointer to the struct as the first
argument to the "methods".  Extending the array example above it should look
like the following:

    typedef struct { int [], int size; } Array;

    void ArraySetElementAt( Array* self, int index, int value );
    int ArrayGetElementAt( Array* self, int index );

This Array "object" can be safely used with all its pre-conditions,
post-conditions and invariants being checked/asserted in its own methods
without its clients having to worry about it, i.e., it is localised within
the Array "object".

 ------------End Side Note------------

> > - Another useful technique that can be used for more complicated data
> > structures is to run an integrity check whenever required. For example,
in a
> > heap memory structure the integrity of the heap can be checked whenever
new
> > memory is allocated or freed. This code can also be compiled out in a
> > release build. Whenever the heap becomes corrupted it will become
visible
> > very quickly. These system integrity checks can even be triggered
externally
> > if required.
>
> Can you give an example that doesn't yield itself to assert() calls? I
> suppose this all can be done by making a pidgin language addition that
> compiles a block into "#ifdef debug" bodies or somesuch.
>

A typical example is where the integrity check cannot be done with just one
or a few asserts. In this case a multi-line piece of code needs to be
written that does a fairly complicated integrity check. You need to be able
to compile these checks into or out of the code depending on whether it is a
debug build or not. The "#ifdef debug" block of code is the correct way to
handle this.

> > I believe that this style of coding will catch many of the errors that
may
> > still be lurking in the code. It's not a silver bullet and it won't
catch
> > all errors, but it usually catches a large percentage of them. This
approach
> > requires some effort, but the return on investment is big and I would
> > strongly suggest using it if the goal is production-quality and reliable
> > code.
>
> Yeah, these are good goals. The balance to attempt is to make sure that
> the source pidgin code stays lightweight and modular.
>

Agreed, and that is also why I favour the encapsulation approach which will
keep everything lighweight and modular.

> > I hope this helps.
> >
> > Regards
> > Jaco van der Merwe
>
>