|Paste number 137127:||GENCGC ideas|
|When:||3 years, 9 months ago|
|Share:||Tweet this! | http://paste.lisp.org/+2XT3|
** SBCL GENCGC options During an initial attempt at implementing markbits calculation for GENCGC (as a preparatory step for a hybrid mark/sweep or mark/compact collector), a number of "opportunities for excellence" were observed. *** Less controversial options **** Page table region_start_offset documentation is bogus This field contains no justification. Many references to it mention that it is important, but none indicate why. It turns out that the main thing being tracked here is an object boundary (not necessarily the nearest such) prior to the end of the page, but not after the start of the page... And this is required because we need to know where to start when we're walking forward through the heap on a per-object basis. The offset is also from the object boundary to the start of the page, not from the start of the page to the object boundary. **** Minimize region_start_offset during scavenging of a page If we keep track of the immediately previous object boundary while scavenging, and we cross a page boundary, we can reset the region_start_offset to its minimum possible value. Reducing region_start_offset to zero means reducing the number of pages pinned when there is a conservative root within the page (or within the region of consecutive pages with a non-zero region_start_offset after the page). Reducing region_start_offset means reducing the number of reads required when scavenging or when searching through the heap for a pointer within the page (such as when dealing with a page that possibly has references to a younger generation, when checking for conservative roots, or when using MAKE-LISP-OBJ). Reducing region_start_offset by so much as a single cache line width reduces cache load when performing the aforementioned operations. **** gencgc_partial_pickup is a poor name This variable is really trying to say something about the presence or absence of a page table in the core file being loaded. Either finding a better name for the variable or arranging for GENESIS to output a page table is in order. **** Separate allocation into boxed and unboxed regions from the start Currently, the compiler allocates all objects into a single, boxed allocation region, and the GC attempts to separate the objects out into boxed and unboxed regions. This separation can be thwarted by object pinning. Unboxed objects are any simple-vector not specialized to T, SAPs, and anything numeric other than RATIOs and unspecialized COMPLEXes. For the most part, allocating these in a separate region is a matter of either arranging for a flag in the DEFINE-PRIMITIVE-OBJECT form or altering the relevant MOVE VOPs. The only other noise would be to arrange for there to be an unboxed region known to the compiler (and on a per-thread basis). *** More involved, but still relatively uncontroversial options **** Page tables in core files should include bytes_used This leads to more accurate memory usage statistics and tighter bounds on scavenging memory and validating object pointers on saved core pages. **** GENESIS should output a page table. At the very least, this would allow eliminating gencgc_partial_pickup, and such a simple implementation should be fairly straightforward. A somewhat more clever implementation would produce minimal region_start_offsets for the core spaces. **** GENESIS should separate out boxed, unboxed, and code pages More involved than simply outputting a page table, this optimization alone would give GENCGC a massive leg up on dealing with the contents of a cold core before having done a full collection as part of saving a warm core. **** Boxed pinned pages should be write protected The only reason that has come up in discussion as plausible for not write-protecting pinned pages is that they might be involved in a system call that will simply error out if passed an address to a write-protected page. This, in turn, is only really plausible for unboxed data. Once we split boxed data from unboxed data from initial allocation there is no reason not to write-protect boxed pages, even if they are pinned. Being able to write-protect pinned, boxed pages that are part of an older generation means that we can rely on the write barrier instead of always having to scavenge the pages. **** Allocation failures due to insufficient available pages should cause GC When trying a large allocation, and it fails due to insufficient available pages, it makes sense to try a minor, and then a major, GC before giving up entirely, rather than simply panicing because there aren't enough pages. This should reduce the number of heap exhaustion errors due to borderline-full heaps or poorly set GC triggers. *** More controversial options **** Track WP violations at a per-machine-page level, not per-gc-page Similar logic to minimizing region_start_offset, as far as scavenging goes, but reducing the scope of checking for "young" pointers yet further. We would have an equivalent to region_start_offset and a WP violation bit per machine page as part of the GC page structure, and this could be lazily initialized. This also should reduce the pressure towards smaller GC page sizes. This could actually be handled on any granularity between machine pages and full GC pages, which gives us what amounts to a tuning parameter. **** Only large-object regions should span multiple GC pages This should outright eliminate much of the code for scanning forward and backward through the page table, and would have region_start_offset at zero for all non-large-object pages. It would also reduce the average number of pages pinned due to a conservative root (especially since large objects don't move anyway). *** Options that require computing mark-bits **** Replace dead objects in pinned pages with "free space" objects These objects would be treated as unboxed data by the scavenger, and thus would not hold further garbage objects as "live". Note that this means that a conservative root pins /one object/ and no longer keeps the remaining objects on the page "live". **** Simpler weak-pointer handling If we already know that the object a weak pointer points to is live, we can scavenge it when we scavenge the weak pointer itself, rather than having to maintain a list of all of the weak pointers and fix them up after having finished scavenging newspace. The converse is also the case, with a known dead object being pointed to. Note that this does not help with weak hash tables or other weak structures, only weak pointers. **** Freelists within GC pages A "free space" object can be ignored for the purposes of scavenging pointers, and can easily be turned into an allocation region. Or we could maintain a ratio of allocated to free space within a page and opt to "pin" it if there is sufficient allocated space or "evacuate" it if there is insufficient allocated space, possibly reducing peak GC page usage during collection (minor or major). Either the conversion to allocation regions or the pin-if-too-full options should reduce peak load, just via different mechanisms. It might be interesting to make them build-time (or run-time!) options, or even dynamically swich between the options depending on the presence of conservative roots. **** Opportunistic release of evacuated page memory If a page is being evacuated, and only contains a small number of objects, we could use a "side vector" mapping the objects to their forwarded location and release the page memory before the last inbound reference has been updated. If we want to get /really/ clever, it may be possible to re-use the GC page for newspace, even before all of the old references have been updated, further reducing peak address space usage during GC. This is a tricky elaboration, and probably fairly fragile in terms of implementation.
This paste has no annotations.