[gclist] Finalization and object orientation.

Zhong Shao shao-zhong@CS.YALE.EDU
Tue, 1 Apr 1997 18:48:09 -0500 (EST)


|>> > My impression from Andrew Appel's group is that maintaining
|>> > asymptotic space complexity (with the CPS definition of reachability) 
|>> > is neither free nor intolerably expensive.  It does inhibit some 
|>> > optimizations, and make it much harder to get the compiler right.
|>> 
|>> That makes sense to me.  Any references?
|>
|> I think his book says a little about that.  My information comes mainly from
|> Zhong Shao.
|>
|> Hans

This so-called "safe-for-space-complexity" definition of reachability
has been implemented in the SML/NJ compiler for almost three years 
(see [1] for details). Many people view it as an *optimization* that
eliminates certain kinds of space leaks; we think of it as a crucial
property that any industrial-strength compiler (for functional
languages) must satisfy.

Consider the following ML function (simulating N phases of a typical 
compiler):

      fun compile (src) =
        let val codeInIL1 = phase1(src)
            val codeInIL2 = phase2(codeInIL1)
            ......
            val codeInILN = phaseN(codeInILNm1)

         in codeInILN
        end

Someone writing the above code in C would explicitly "free" each
intermediate data structure after each phase is done. In most
garbage-collected languages, the programmer does not have this option,
and so expects the compiler to take care of it.

Unfortunately, many compilers treat "compile" just like a simple C-like
subroutine. All the local variables (assuming they are all pointers to
heap-allocated objects): src, codeInIL1, codeInIL2, ..., are
allocated in the same stack frame, and they are all considered to be
*live* until the end of phaseN (when we pop off the stack frame for
"compile"). This is unacceptable, and the older versions of the SML/NJ
compiler ran into many problems like this.

What we concluded is the following: for languages that make pervasive use
of "functions" and/or "closures", a new scoping rule must be used. Each 
local variable should be considered "dead" after its *last* use in the 
current function body. By "dead", I really mean "not contributing to the 
liveness of the data structure it points to".
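
As a minimal sketch of this rule, here is a three-phase version of
"compile" with comments marking which variables each rule treats as
live. The phase functions are hypothetical stand-ins chosen only so
that the code runs; they are not taken from SML/NJ.

```sml
(* Hypothetical phases: each builds a fresh structure from its input. *)
fun phase1 (src : string) : char list = String.explode src
fun phase2 (il1 : char list) : int list = List.map Char.ord il1
fun phase3 (il2 : int list) : int = List.foldl (op +) 0 il2

fun compile (src : string) : int =
  let
    val codeInIL1 = phase1 src          (* last use of src *)
    (* here: frame-based rule keeps src, codeInIL1;
             last-use rule keeps only codeInIL1 *)
    val codeInIL2 = phase2 codeInIL1    (* last use of codeInIL1 *)
    (* here: frame-based rule keeps src, codeInIL1, codeInIL2;
             last-use rule keeps only codeInIL2 *)
    val result = phase3 codeInIL2       (* last use of codeInIL2 *)
  in
    result
  end

val n = compile "ab"                    (* 97 + 98 = 195 *)
```

Under the last-use rule, "compile" itself never holds on to more than
the input and output of the phase currently running, whereas the
frame-based rule retains every intermediate structure until "compile"
returns. Whether src can actually be reclaimed also depends on whether
the caller still refers to it.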

Supporting this new definition of "liveness" is very important,
because functions like "compile" are quite common, especially in
large software. 

This new scoping rule is less crucial for C-like languages because
large data structures are often explicitly allocated and de-allocated,
so the programmer takes over the responsibility of "free"-ing these 
things.

-Zhong Shao
(shao-zhong@cs.yale.edu)

PS. References:

  [1] Z. Shao and A. Appel. "Space-Efficient Closure Representations"
      in 1994 ACM Conference on Lisp and Functional Programming, also 
      available at the following website:

          http://www.cs.yale.edu/users/shao-zhong/papers.html

      The corresponding chapter in my Ph.D. thesis gives a more detailed
      explanation.