VM builds still buggy

Mon Jul 26 19:57:53 PDT 2004

Jaco van der Merwe wrote:

> Hi Lee
> 
> I would like to offer a few suggestions that may help to catch these elusive
> bugs.
> 
> One of the most useful techniques that we use is to write self-checking
> code. Such code does all kinds of sanity checks. My rule has always been to
> fail as quickly as possible if something goes wrong, and this technique is
> very good at doing that. The easiest way to accomplish this is to make
> liberal use of ASSERT-style macros to check pre-conditions, invariants,
> post-conditions and other sanity checks wherever possible. Yes, it carries a
> performance penalty, but it can be compiled out in a release build. The
> standard ASSERT macro that is supplied with Visual C has a very nice
> feature. In a debug build a failed assertion launches directly into the
> debugger at the point where the assertion failed. From that point onwards it
> is usually very easy to find the cause. Typically we define our own
> assertion macros that use the Visual C macros when targeted for the Windows
> platform.

One thing you can do is add assert: statements to the Pidgin code, which 
will generate "assert();" macro calls in C. It would look like this:

[| :a :b |
   assert: a + b > 0.
   ...
].

which would turn into (roughly):
{
   int a, b;
   assert(a + b > 0);
   ...
}

I did plan this, but it looks as though we didn't code it up. I'll add 
it to bootstrap/mobius/c/generator.slate and the same file in src/. To 
show how simple this is, and so you can try it immediately, just 
evaluate this method:

g@(C SimpleGenerator traits) generateCFor: _@#assert: on: args
[
   C Syntax FunctionCall applying: #ASSERT
     to: {g generateCFor: args second}
].

I think this could be a good way to learn about and document the VM 
code, and you can assist us by sending .diff files that add these 
useful tests for us to incorporate.
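
The definition of the ASSERT macro that the generated call expands to is 
not shown in this message. As a minimal sketch only, here is the kind of 
macro Jaco describes, one that checks in a debug build and compiles away 
in a release build. All names in it (assert_failed, assert_failures, 
assert_abort_on_failure, add_positive) are hypothetical and are not taken 
from the Slate VM sources:

```c
#include <assert.h>   /* plain assert(), for callers and tests */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical bookkeeping: how many checks have failed so far, and
 * whether a failure should stop the program immediately. */
static int assert_failures = 0;
static int assert_abort_on_failure = 1;

/* Report where the check failed, then fail fast unless told otherwise. */
static void assert_failed(const char *expr, const char *file, int line)
{
    assert_failures++;
    fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    if (assert_abort_on_failure)
        abort();
}

#ifdef NDEBUG
/* Release build: the check disappears entirely. */
#define ASSERT(cond) ((void)0)
#else
/* Debug build: evaluate the condition and report a failure. */
#define ASSERT(cond) \
    ((cond) ? (void)0 : assert_failed(#cond, __FILE__, __LINE__))
#endif

/* Roughly what the generated C for the block above could look like. */
static int add_positive(int a, int b)
{
    ASSERT(a + b > 0);
    return a + b;
}
```

With the default settings a failed check prints the file, line and 
expression and then aborts, which is the "fail as quickly as possible" 
behaviour; clearing assert_abort_on_failure only counts failures.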

> After all that arm waving, here are some specific suggestions:
> 
> - I notice that many data structures, for example, the global CurrentMemory
> variable which is an instance of the ObjectHeap struct, are not explicitly
> initialised. At first glance it seems that a lazy style of initialization is
> used, especially for the tables/arrays contained in the structs. I would
> suggest initialising all structs, and especially tables/array before usage.
> Don't just initialise everything to zero, but rather use initial values that
> would cause an immediate failure if used incorrectly, e.g. an array element
> one beyond the current end of the array. If an immediate failure cannot be
> induced, then at least use a value that would cause a failure as soon as
> possible afterwards.

That's an interesting idea, but I want to hear what Lee thinks first.
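
To make the suggestion concrete, here is a small sketch of "poison" 
initialisation in C. The struct and every name in it (slot_table, 
POISON_INDEX, TABLE_SIZE and the functions) are made up for illustration 
and do not correspond to ObjectHeap or any other VM structure:

```c
#include <assert.h>
#include <stddef.h>

#define TABLE_SIZE   8
/* Largest size_t value: never a valid index into a real table, so any
 * use of an unset slot trips a bounds check instead of silently
 * reading element 0. */
#define POISON_INDEX ((size_t)-1)

struct slot_table {
    size_t slots[TABLE_SIZE];  /* each slot holds an index into some other array */
    size_t used;
};

/* Explicit initialisation: every slot starts out poisoned, not zero. */
static void slot_table_init(struct slot_table *t)
{
    for (size_t i = 0; i < TABLE_SIZE; i++)
        t->slots[i] = POISON_INDEX;
    t->used = 0;
}

static int slot_is_poisoned(const struct slot_table *t, size_t i)
{
    return t->slots[i] == POISON_INDEX;
}

/* Reading a slot that was never set fails at once in a debug build. */
static size_t slot_table_get(const struct slot_table *t, size_t i)
{
    assert(i < TABLE_SIZE);
    assert(!slot_is_poisoned(t, i));
    return t->slots[i];
}
```

Had the slots been zeroed instead, a forgotten assignment would look like 
a valid reference to element 0 and the error would surface much later.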

> - The current style of the code is not very amenable to self-checking. Why?
> Because all the structs and array elements are accessed directly. I would
> strongly suggest to hide these element accessors behind functions, that is,
> to use encapsulation as far as possible. This will result in a very
> object-oriented style of C coding where all operations on a struct are
> performed via functions that take the struct pointer as its first argument.
> The same should apply to tables/arrays. These functions can all be declared
> as inline in order to avoid any performance penalties. However, the major
> benefit of this approach is that all the "methods" that are performing
> operations on their associated structs can do as many sanity checks as
> possible. For example, all array indexing can check for array bounds, or
> struct accessors can check for usage of "uninitialised" values, etc.

This is possible, since we already declare the structs and so forth 
abstractly as pidgin prototypes (so generating "wrapper" code would be 
transparent or "mode-driven"). Whether it's more worthwhile than just 
the assertions is debatable. I'd like to start with the assertions and 
then see how much more we need.
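
For comparison, this is roughly what hand-written checked accessors look 
like in C. It is only a sketch of the style Jaco describes, not output of 
the pidgin generator; int_array and its functions are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* A struct that callers are not supposed to index directly. */
struct int_array {
    int   *items;
    size_t length;
};

/* Checked read: the struct pointer comes first, like a receiver. */
static inline int int_array_at(const struct int_array *a, size_t i)
{
    assert(a != NULL && a->items != NULL);
    assert(i < a->length);            /* bounds check, gone under NDEBUG */
    return a->items[i];
}

/* Checked write with the same pre-conditions. */
static inline void int_array_at_put(struct int_array *a, size_t i, int value)
{
    assert(a != NULL && a->items != NULL);
    assert(i < a->length);
    a->items[i] = value;
}
```

Because the functions are inline and the checks vanish when NDEBUG is 
defined, a release build should cost about the same as direct indexing.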

> - Another useful technique that can be used for more complicated data
> structures is to run an integrity check whenever required. For example, in a
> heap memory structure the integrity of the heap can be checked whenever new
> memory is allocated or freed. This code can also be compiled out in a
> release build. Whenever the heap becomes corrupted it will become visible
> very quickly. These system integrity checks can even be triggered externally
> if required.

Can you give an example that doesn't lend itself to assert() calls? I 
suppose this can all be done by adding a pidgin language construct that 
compiles a block into an "#ifdef debug" body or some such.
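
One possible illustration, offered as a sketch rather than as anything 
from the VM: an integrity check that has to walk a whole structure needs a 
loop, so it reads more naturally as a function than as a single assert() 
expression, and the call can still be wrapped so it compiles out. The toy 
heap layout, the DEBUG symbol and all names below are assumptions:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BLOCKS 4

/* Toy model of a heap: a sequence of blocks covering total_size bytes. */
struct block {
    size_t size;
    int    is_free;
};

struct toy_heap {
    struct block blocks[MAX_BLOCKS];
    size_t count;
    size_t total_size;
};

/* Walk every block; return 1 if the heap is consistent, 0 otherwise. */
static int toy_heap_is_consistent(const struct toy_heap *h)
{
    size_t sum = 0;

    if (h->count > MAX_BLOCKS)
        return 0;
    for (size_t i = 0; i < h->count; i++) {
        if (h->blocks[i].size == 0)
            return 0;                 /* an empty block is never valid */
        if (i > 0 && h->blocks[i].is_free && h->blocks[i - 1].is_free)
            return 0;                 /* adjacent free blocks should be merged */
        sum += h->blocks[i].size;
    }
    return sum == h->total_size;      /* blocks must cover the heap exactly */
}

#ifdef DEBUG
/* Debug build: verify the heap, e.g. after each allocate or free. */
#define CHECK_HEAP(h) assert(toy_heap_is_consistent(h))
#else
/* Release build: the check costs nothing. */
#define CHECK_HEAP(h) ((void)0)
#endif
```

The checking function can also be called directly, for instance from a 
debugger or a test, which covers the "triggered externally" case.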

> I believe that this style of coding will catch many of the errors that may
> still be lurking in the code. It's not a silver bullet and it won't catch
> all errors, but it usually catches a large percentage of them. This approach
> requires some effort, but the return on investment is big and I would
> strongly suggest using it if the goal is production-quality and reliable
> code.

Yeah, these are good goals. The balance to strike is keeping the source 
pidgin code lightweight and modular.

> I hope this helps.
> 
> Regards
> Jaco van der Merwe
