[gclist] memory protections and system calls

David Chase chase@centerline.com
Mon, 01 Jul 96 10:44:28 -0400


> From: wilson@cs.utexas.edu (Paul R. Wilson)

> OOPS, I was running two problems together.  The thing that crashes
> SunOS is actually register window flushes (not explicit system calls)
> into protected pages of the stack segment.  The register window overflow
> code can't cope with protected pages.

> Apparently this can be fixed, but Sun hasn't done it.

That's not entirely true.  To handle the case you describe, you
must install an alternate-stack signal handler for the appropriate
signals (SIGSEGV) and check whether the faulting address lies within
a stack.  If the handler then maps in the faulting page and returns
from the interrupt, the OS is supposed to retry the instruction after
spilling register windows into the newly mapped stack.  This is what
is *supposed* to happen; I have seen the code with my own eyes, and
you can see artifacts of it in the mumble_context structures passed
to signal handlers.  It is even supposed to work if you map some of
the pages needed to spill register windows, but not all: it refaults,
you map another page, and it tries again when you return again,
repeating until all windows can spill.
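The handler pattern described above (alternate signal stack, check the faulting address, make that page accessible, return so the faulting access is retried) can be sketched in user space roughly as follows. This is a minimal sketch for a modern POSIX system such as Linux, not SunOS, and it does not exercise the kernel's register-window spill path at all. It assumes a hypothetical reserved region (`guarded_base`/`guarded_len`) standing in for "a stack", and it uses `mprotect` to unprotect pages rather than mapping new ones. All function and variable names are made up for illustration. Note also that `mprotect` is not on POSIX's async-signal-safe list, even though collectors commonly call it from fault handlers in practice.

```c
#define _DEFAULT_SOURCE /* glibc: expose sigaltstack, SA_ONSTACK, MAP_ANONYMOUS */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for "a stack": a region kept protected until touched. */
static char  *guarded_base;
static size_t guarded_len;
static size_t page_size;

/* Memory for the alternate signal stack, so the handler never needs
   the (possibly protected) stack that caused the fault. */
static char altstack_mem[1 << 16];

static void on_fault(int sig, siginfo_t *info, void *ucontext)
{
    (void)ucontext;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guarded_base;

    if (addr >= base && addr - base < guarded_len) {
        /* Unprotect only the faulting page.  Returning from the handler
           makes the kernel retry the access; if it touches another
           protected page, we fault again and repeat. */
        uintptr_t page = addr & ~(uintptr_t)(page_size - 1);
        if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) == 0)
            return;
    }
    /* Not our region (or mprotect failed): restore the default action.
       The signal stays blocked until we return, then the retried access
       faults again and the process dies normally. */
    signal(sig, SIG_DFL);
    raise(sig);
}

static int install_stack_fault_handler(void)
{
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_sp = altstack_mem;
    ss.ss_size = sizeof altstack_mem;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK; /* get si_addr, run on alt stack */
    sigemptyset(&sa.sa_mask);
    /* Some systems report protection faults as SIGBUS rather than SIGSEGV. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Demo: protect two pages, write into both, and read the values back. */
int demo_retry_after_fault(void)
{
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guarded_len = 2 * page_size;
    void *p = mmap(NULL, guarded_len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    guarded_base = p;

    int rc = install_stack_fault_handler();
    assert(rc == 0);

    volatile int *a = (volatile int *)guarded_base;
    volatile int *b = (volatile int *)(guarded_base + page_size);
    *a = 40; /* faults; handler unprotects page 0; store is retried */
    *b = 2;  /* faults again; handler unprotects page 1 */
    int sum = *a;
    sum += *b;

    munmap(p, guarded_len);
    return sum;
}
```

Unprotecting one page per fault mirrors the "refault, map another page, return again" loop described above. With both writes retried successfully, `demo_retry_after_fault()` should return 42.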

In fact, this code did not work under Solaris 2.1, 2.2, or 2.3beta; 
I do not know about later versions.  It also superficially appears
to work under most recent versions of SunOS (up through 4.1.3_U1beta),
but in fact something goes subtly wrong, and it eventually fails.
The one exception is a SuperSparc-based machine running an up-to-date
(properly patched) version of 4.1.3 or 4.1.3_U1.  Other, later
versions of the OS may work on more platforms, but I have not tried
them there.

As if you could not tell by now, there is a bug filed against this 
failure to conform to intended behavior, and it has my name on it. 
The bug includes a reproducible test case, too.  I filed it sometime
in 1993 (I left Sun in late 1993), so I'm sure they've fixed it by
now :-).  The Solaris bug was just a flat-out typo: a line was left
out or transposed when some of the Sparc-specific code migrated from
SunOS to Solaris (or else Solaris inherited a bug that was later
fixed in SunOS).

So, now you know.  If it isn't fixed in Solaris 2.5 (it's a real 
pain to write the test case, so I'm not doing it again just for fun), 
give them a hard time about letting bugs sit unfixed for over two 
years, especially since this one was well-described and included 
a test case from the very beginning.

speaking for myself,

David Chase