[gclist] write barrier

Skubiszewski Marcin Marcin.Skubiszewski@inria.fr
Fri, 8 Mar 1996 11:02:03 +0100 (MET)


> what's a "write barrier" and what role does it play in a GC?

Here comes a piece of text from an article I am finishing. It is not
objective, and essentially stresses the problems with barriers,
because my article proposes a technique to replace barriers in
database GCs.

I know that this text is not completely adequate to your question, but
alas I don't have the time to write better stuff.


Unsophisticated GCs block all other activities for the time of a
complete GC cycle, because they are built upon the simplistic idea
that in order to view the system consistently, we must prevent it from
being modified while we examine it. The modern, more sophisticated GCs
can execute concurrently with the mutator. This implies that the
system may be modified at any time while the GC is analyzing it. The
classical principle for solving this concurrency problem is known as
"write barriers".

[Footnote: The first algorithm based on write barriers was introduced
by Dijkstra et al. \citeyear{gc:rep:1268}. A comprehensive survey of
concurrent garbage collection techniques by Wilson
\citeyear{gc:svy:1085} includes a detailed discussion of write
barriers. Amsaleg et al. \citeyear{amsaleg:95} developped a database
garbage collector which uses write barriers.]

According to this principle, the GC makes no effort to obtain a
consistent view of the system: each object is seen in the state in
which it happens to be when the GC looks at it. Instead, mutator's
code is instrumented so as to inform the GC of every pointer
modification performed while the GC is running. This information
allows the GC to build a list of objects which were reachable at some
point during the garbage detection process, yet which might be
improperly seen as unreachable. The GC considers all objects in the
list as reachable; this is sufficient to ensure correctness.

Unfortunately, this form of mutator-collector communication is
undesirable in many systems. Let us mention here two problems. First,
the mutator-collector communication seriously increases the cost of
every pointer modification performed by the mutator. In fact, the
communication is more resource-consuming than the actual update of a
pointer, which amounts to an ordinary memory write. Some overhead
remains even while the GC is not running (and thus no actual
communication occurs): whenever a pointer is modified, the mutator
needs at least to check whether a GC is running. Second, the need to
have mutator code instrumented makes the mutator-collector
communication incompatible with existing executable code. In many
systems, even user code needs to be compiled in a special,
GC-compatible way.


[...]


[Two desirable properties of a concurrent GC] are that the existence
of the GC has no effect upon the performance of the system while the
GC is not running, and that the GC does not require user code to be
instrumented. These properties may seem obvious, but in fact they are
not satisfied by most concurrent GCs. Especially, they are never
satisfied by concurrent GCs in systems with no logging (that is, in
systems other than databases). In such systems, concurrent GCs use
"write barriers", a mechanism by which the mutator informs the
collector of all pointer modifications performed while the collector
is running.  The write barrier is implemented by extra code inserted
either before or after every instruction in the mutator that modifies
a pointer.  For example, with the simplest variant of the write
barrier, known as {\em snapshot at the beginning}, while the GC is in
operation, the user instruction

  obj->p = q;

may only be executed once the old value of  obj->p  has been sent
to the GC. Thus, the instruction is replaced with the following code.

  if (markingInProgress)
      notifyGC (obj->p);
  obj->p = q;

where the boolean markingInProgress is true while the GC is marking
and false otherwise, and the function  notifyGC  sends its
argument to the GC.

This example shows that with write barriers, user code must be
instrumented in a GC-specific way. This mandates the use of
GC-specific compilers or system libraries, and makes the GC hard to
integrate with existing systems. Moreover, the instrumented code is
slower than ordinary code, even while the GC is not in operation,
ie. while  markingInProgress  is 0.

In a database system, a write barrier may be implemented without
instrumenting user code: the information about pointer modifications
is in the log, and can be read from there by the GC. This approach is
used by Amsaleg et al. \citeyear{amsaleg:95}. But even with this
approach, the existence of the GC has some negative impact on the
performance of the system while the GC does not run. This is due to
the fact that the log needs to be examined all the time, not just
while the GC is running. Measurements performed in various setups by
Amsaleg et al. show an overhead of 0.6% to 5.8%.


@Article{gc:rep:1268,
  author = 	 "Dijkstra, E. W. and Lamport, L. and Martin, A. J.
		  and Scholten, C. S. and Steffens, E. F. M.",
  title = 	 "On-the-Fly Garbage Collection: an Exercise in Cooperation",
  journal =	 cacm,
  year =	 1978,
  volume =	 21,
  number =	 11,
  pages =	 "966--975",
  month =	 nov
}


@InProceedings{gc:svy:1085,
  author = 	 "Wilson, Paul R.",
  title = 	 "Uniprocessor Garbage Collection Techniques",
  number = 	 637,
  series = 	 "Lecture Notes in Computer Science",
  booktitle = iwmm,
  year = 	 1992,
  publisher = "Springer-Verlag",
  address = 	 "Saint-Malo (France)",
  month = 	 sep
}


@INPROCEEDINGS{amsaleg:95,
  AUTHOR = {Laurent Amsaleg and Michael Franklin and
		  Olivier Gruber},
  TITLE = {Efficient Incremental Garbage Collection for
		  Client-server Object Database Systems},
  BOOKTITLE = "Proceedings of the 21th VLDB International Conference",
  year = 1995,
  month = sep,
  address =      "Zurich, Switzerland"
}