[gclist] memory protections and system calls
David Chase
chase@centerline.com
Mon, 01 Jul 96 10:44:28 -0400
> From: wilson@cs.utexas.edu (Paul R. Wilson)
> OOPS, I was running two problems together. The thing that crashes
> SunOS is actually register window flushes (not explicit system calls)
> into protected pages of the stack segment. The register window overflow
> code can't cope with protected pages.
> Apparently this can be fixed, but Sun hasn't done it.
That's not entirely true. In order to deal with the case that you
describe, you must install an alternate-stack signal handler for
the appropriate signals (SIGSEGV) and check the faulting address
to see if it lies within a stack. If the handler then maps in the
faulting page, and returns from interrupt, then the OS is supposed
to retry the instruction after spilling register windows into the
newly mapped stack. This is what is *supposed* to happen; I have
seen the code with my own eyes, and you can see artifacts of it in
the mumble_context structures passed in to signal handlers. It is
even supposed to work if you map some of the pages needed to spill
register windows, but not all -- it refaults, you map another page,
and it tries again when you return again, repeat until all windows
can spill.
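
The handler side of that protocol can be sketched roughly as follows.
This is a minimal sketch in present-day POSIX C, not the SunOS 4 or
Solaris 2.x code discussed here. It assumes sigaltstack(), sigaction()
with SA_SIGINFO, and mmap()/mprotect() are available. The names
on_fault, touch_protected_pages, region_base and region_len are made
up for illustration. An ordinary mmap'ed region stands in for the
stack, since the register window spill itself is SPARC-specific and
cannot be shown portably.

```c
#define _DEFAULT_SOURCE            /* MAP_ANONYMOUS on strict glibc builds */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical names for this sketch; none come from the original post. */
static char *region_base;          /* start of the stand-in "stack" region */
static size_t region_len;          /* its length in bytes */
static size_t page_size;
static volatile sig_atomic_t faults_handled;

/* Fault handler; SA_ONSTACK makes it run on the alternate stack. */
static void on_fault(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;                 /* machine context; unused here */
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t lo = (uintptr_t)region_base;

    if (addr < lo || addr >= lo + region_len)
        _exit(2);                  /* not in our region: a genuine crash */

    /* Make only the faulting page usable, then return so that the
       faulting instruction is retried. */
    uintptr_t page = addr & ~((uintptr_t)page_size - 1);
    if (mprotect((void *)page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(3);
    faults_handled++;
}

/* Reserve npages of inaccessible memory, touch each page once, and
   return how many faults the handler repaired (-1 on setup failure). */
int touch_protected_pages(size_t npages)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    page_size = (size_t)ps;

    /* The handler needs its own stack; 64 KiB is an arbitrary choice.
       It is deliberately never freed in this sketch. */
    stack_t ss;
    memset(&ss, 0, sizeof ss);
    ss.ss_size = 64 * 1024;
    ss.ss_sp = malloc(ss.ss_size);
    if (ss.ss_sp == NULL || sigaltstack(&ss, NULL) != 0)
        return -1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    /* Some systems report protection faults as SIGBUS, not SIGSEGV. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    region_len = npages * page_size;
    region_base = mmap(NULL, region_len, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region_base == MAP_FAILED)
        return -1;

    faults_handled = 0;
    /* Walk downward, as a growing stack would. Each first write
       faults, the handler unprotects that page, and the store is
       retried after the handler returns. */
    for (size_t i = npages; i-- > 0; ) {
        volatile char *p = region_base + i * page_size;
        *p = (char)i;
    }

    int n = (int)faults_handled;
    munmap(region_base, region_len);
    return n;
}
```

Calling touch_protected_pages(4) from main() should report four
repaired faults, one per page, which mirrors the "refault, map
another page, try again" loop described above. Note that mprotect()
is not on the POSIX list of async-signal-safe functions, so a
production handler would need more care than this sketch takes.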
In fact, this code did not work under Solaris 2.1, 2.2, or 2.3beta;
I do not know about later versions. It also superficially appears
to work under most recent versions of SunOS (up through 4.1.3_U1beta),
but in fact something goes subtly wrong, and eventually it will
fail, except in the case of a SuperSparc-based machine running an
up-to-date (properly patched) version of 4.1.3 or 4.1.3_U1 (other
later versions of the OS may work on more platforms -- I have not
tried it there).
As if you could not tell by now, there is a bug filed against this
failure to conform to intended behavior, and it has my name on it.
The bug includes a reproducible test case, too. I filed it sometime
in 1993 (I left Sun in late 1993) so I'm sure they've fixed it by
now :-). The Solaris bug was just a flat-out typo -- there was a
line left out or transposed when some of the Sparc-specific code
migrated from SunOS to Solaris (or else it inherited a bug that was
later fixed in SunOS).
So, now you know. If it isn't fixed in Solaris 2.5 (it's a real
pain to write the test case, so I'm not doing it again just for fun),
give them a hard time about letting bugs sit unfixed for over two
years, especially since this one was well-described and included
a test case from the very beginning.
speaking for myself,
David Chase