The rest of this document justifies the above design choices.
There are two primary problems in the implementation: how to deal with the write barrier tests and how to deal with the static area. These are not at all independent.
The current write barrier test on the Sparc (indeed, on any system where we can control relative addresses of separately allocated memory) has two parts. The first part is often compiled in-line and performs a simple test to see if the LHS is at a higher address than the heap limit. If this test succeeds, a call to millicode is made to complete the write barrier. The second part, in millicode, loads the generation bits of the two addresses and compares them, triggering a store to the SSB if the generation of the LHS is greater than the generation of the RHS.
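The two-part test can be sketched in C as follows. This is an illustration only, not the actual Sparc code: the names gc_state, generation_of, barrier_millicode and write_barrier are hypothetical, the generation lookup by address range stands in for "loading the generation bits", and the SSB is modelled as a small fixed-size array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SSB_CAPACITY = 16, MAX_REGIONS = 4 };

/* Hypothetical address range tagged with a generation number. */
typedef struct { uintptr_t lo, hi; unsigned gen; } region;

typedef struct {
    uintptr_t heap_limit;          /* in-line test compares the LHS against this */
    region    regions[MAX_REGIONS];
    size_t    nregions;
    uintptr_t ssb[SSB_CAPACITY];   /* sequential store buffer (assumed fixed size) */
    size_t    ssb_top;
} gc_state;

static void gc_init(gc_state *gc, uintptr_t heap_limit) {
    gc->heap_limit = heap_limit;
    gc->nregions = 0;
    gc->ssb_top = 0;
}

static void gc_add_region(gc_state *gc, uintptr_t lo, uintptr_t hi, unsigned gen) {
    assert(gc->nregions < MAX_REGIONS);
    gc->regions[gc->nregions].lo = lo;
    gc->regions[gc->nregions].hi = hi;
    gc->regions[gc->nregions].gen = gen;
    gc->nregions++;
}

/* Stand-in for loading the generation bits of an address. */
static unsigned generation_of(const gc_state *gc, uintptr_t addr) {
    for (size_t i = 0; i < gc->nregions; i++)
        if (addr >= gc->regions[i].lo && addr < gc->regions[i].hi)
            return gc->regions[i].gen;
    return 0;  /* unknown addresses are treated as youngest (assumption) */
}

/* Part 2 (millicode): compare generations, record the LHS in the SSB
   only if the LHS is in an older generation than the RHS. */
static void barrier_millicode(gc_state *gc, uintptr_t lhs, uintptr_t rhs) {
    if (generation_of(gc, lhs) > generation_of(gc, rhs)) {
        assert(gc->ssb_top < SSB_CAPACITY);  /* a real SSB would be flushed here */
        gc->ssb[gc->ssb_top++] = lhs;
    }
}

/* Part 1 (in-line): cheap address comparison against the heap limit. */
static void write_barrier(gc_state *gc, uintptr_t lhs, uintptr_t rhs) {
    if (lhs > gc->heap_limit)
        barrier_millicode(gc, lhs, rhs);
}
```

Note that the in-line part only works as a filter if everything older than the youngest area really does lie above heap_limit, which is exactly the layout constraint discussed in the following paragraphs.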
It would be really nice if we could put the write barrier to use for dealing with assignments into the static area (assignment to pre-existing globals, mostly, but not exclusively). That way, the entire static area would not have to be scanned on every GC. However, for this to work, the current write barrier and GC infrastructure require that the static area be allocated at higher addresses than any part of the dynamic area. Since the dynamic area is growable, such an allocation is not possible unless we can map the static area to a (very) high address. We can probably do this with mmap(), but at this time I want to stay away from that if I can, to reduce OS dependencies. (This choice may be revisited.)
The upshot of this is that it is not practical to use the write barrier for dealing with the static area at this time, when code is compiled for the generational systems. Therefore, we disable the barrier during mutator runs and scan the static area on each GC. This is a nice solution because it allows code to be run that has been compiled without any write barrier checking enabled; we can then compare run times of code with and without the barrier, if desired.
The write barrier is "disabled" by making the static area have the same generation number as the dynamic area during mutator runs; even if the in-line write barrier makes a call to millicode, it will never trigger a store to the SSB. A more sophisticated solution would be to patch the millicode jump table so that a call to millicode simply returns.
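A minimal sketch of the "same generation number" trick, in C. All names here (area, heap, enter_mutator, enter_gc, millicode_barrier) are hypothetical, and the sketch assumes that the static area's real generation number is restored when the collector runs; the text above only states what happens during mutator runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical descriptor: each area carries the generation number
   that the millicode barrier compares. */
typedef struct { unsigned gen; } area;

typedef struct {
    area     dynamic_area;
    area     static_area;
    unsigned static_real_gen;  /* generation the static area has during GC */
    unsigned ssb_stores;       /* counts stores to the SSB, for illustration */
} heap;

static void heap_init(heap *h) {
    h->dynamic_area.gen = 0;
    h->static_real_gen  = 1;   /* static area is an older generation */
    h->static_area.gen  = h->static_real_gen;
    h->ssb_stores       = 0;
}

/* "Disable" the barrier: the static area takes the dynamic area's number. */
static void enter_mutator(heap *h) {
    h->static_area.gen = h->dynamic_area.gen;
}

/* Assumed counterpart: restore the real number before collecting. */
static void enter_gc(heap *h) {
    h->static_area.gen = h->static_real_gen;
}

/* Second part of the barrier: stores to the SSB only when the LHS is
   in a strictly older generation than the RHS. Returns true if it did. */
static bool millicode_barrier(heap *h, const area *lhs, const area *rhs) {
    if (lhs->gen > rhs->gen) {
        h->ssb_stores++;
        return true;
    }
    return false;
}
```

With equal generation numbers the comparison can never succeed, so code compiled with the in-line test pays for at most a millicode call and code compiled without it behaves identically.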
It would also be possible to use the write barrier for the static area in the stop-and-copy system either by using mmap() as suggested above, or by generating different (slower) write barrier code when instructed to do so. No-one really cares about this because no-one is going to use the stop-and-copy collector for anything important.
The impact of the planned, faster barrier on this design has not been considered.
The stop-and-copy area is not of fixed size, so there is generally no way to guarantee that all of its memory is at lower addresses than the static area, and so the fast barrier test cannot be used. We could allocate the static area at lower addresses, but then we'd have to associate the remembered set with the s+c area, and that would really throw the collector for a loop (wouldn't work well).
Also, the static area constitutes an entirely different generation, so code that runs with the stop-and-copy system and a static area must in fact go through the write barrier! This makes assignments more expensive than they ought to be. A different static area would only have code and other non-pointer data in it.
The best way to deal with these issues at the present time seems to be the following:
The GC infrastructure must manage all of this.
The s+c area is allocated in chunks, so the stack can't just "start at the high end". In order to use the heap-top-is-stack-limit trick, we must have the stack in the same chunk that's currently being used for allocation, and when the chunk overflows, rather than collecting, we may have to move to a different chunk.
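One way to picture the chunk arrangement is the following C sketch. It is only an illustration under assumptions: the names chunk, chunk_init, chunk_alloc, stack_push and switch_chunk are hypothetical, alignment is ignored, and "moving to a different chunk" is modelled by copying the live stack to the high end of a fresh chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chunk: the heap grows up from base, the stack grows down
   from end, and the heap top doubles as the stack limit. */
typedef struct {
    char *base;
    char *heap_top;   /* next free heap byte == stack limit */
    char *stack_ptr;  /* lowest byte currently used by the stack */
    char *end;
} chunk;

static int chunk_init(chunk *c, size_t size) {
    c->base = malloc(size);
    if (c->base == NULL) return 0;
    c->heap_top  = c->base;
    c->end       = c->base + size;
    c->stack_ptr = c->end;
    return 1;
}

/* Heap allocation: fails (returns NULL) when it would run into the stack. */
static void *chunk_alloc(chunk *c, size_t n) {
    if ((size_t)(c->stack_ptr - c->heap_top) < n) return NULL;
    void *p = c->heap_top;
    c->heap_top += n;
    return p;
}

/* Stack growth: fails (returns 0) when it would cross the heap top. */
static int stack_push(chunk *c, size_t n) {
    if ((size_t)(c->stack_ptr - c->heap_top) < n) return 0;
    c->stack_ptr -= n;
    return 1;
}

/* On overflow, move to a fresh chunk instead of collecting: the live
   stack is copied to the high end of the new chunk. The old chunk is
   not freed because it still holds heap objects. */
static int switch_chunk(const chunk *cur, chunk *next, size_t size) {
    size_t stack_bytes = (size_t)(cur->end - cur->stack_ptr);
    if (stack_bytes > size) return 0;
    if (!chunk_init(next, size)) return 0;
    next->stack_ptr = next->end - stack_bytes;
    memcpy(next->stack_ptr, cur->stack_ptr, stack_bytes);
    return 1;
}
```

In this sketch both the allocator and the stack use the same comparison against the gap between heap_top and stack_ptr, which is the point of keeping the stack in the chunk currently used for allocation.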