Paste number 137127: GENCGC ideas

Paste number 137127: GENCGC ideas
Pasted by: nyef
When:11 months, 4 days ago
Share:Tweet this! |
Paste contents:
Raw Source | XML | Display As
** SBCL GENCGC options

During an initial attempt at implementing markbits calculation for
GENCGC (as a preparatory step for a hybrid mark/sweep or mark/compact
collector), a number of "opportunities for excellence" were observed.

*** Less controversial options

**** Page table region_start_offset documentation is bogus

This field contains no justification.  Many references to it mention
that it is important, but none indicate why.  It turns out that the
main thing being tracked here is an object boundary (not necessarily
the nearest such) prior to the end of the page, but not after the
start of the page...  And this is required because we need to know
where to start when we're walking forward through the heap on a
per-object basis.

The offset is also from the object boundary to the start of the page,
not from the start of the page to the object boundary.

**** Minimize region_start_offset during scavenging of a page

If we keep track of the immediately previous object boundary while
scavenging, and we cross a page boundary, we can reset the
region_start_offset to its minimum possible value.

Reducing region_start_offset to zero means reducing the number of
pages pinned when there is a conservative root within the page (or
within the region of consecutive pages with a non-zero
region_start_offset after the page).  Reducing region_start_offset
means reducing the number of reads required when scavenging or when
searching through the heap for a pointer within the page (such as when
dealing with a page that possibly has references to a younger
generation, when checking for conservative roots, or when using
MAKE-LISP-OBJ).  Reducing region_start_offset by so much as a single
cache line width reduces cache load when performing the aforementioned

**** gencgc_partial_pickup is a poor name

This variable is really trying to say something about the presence or
absence of a page table in the core file being loaded.  Either finding
a better name for the variable or arranging for GENESIS to output a
page table is in order.

**** Separate allocation into boxed and unboxed regions from the start

Currently, the compiler allocates all objects into a single, boxed
allocation region, and the GC attempts to separate the objects out
into boxed and unboxed regions.  This separation can be thwarted by
object pinning.

Unboxed objects are any simple-vector not specialized to T, SAPs, and
anything numeric other than RATIOs and unspecialized COMPLEXes.  For
the most part, allocating these in a separate region is a matter of
either arranging for a flag in the DEFINE-PRIMITIVE-OBJECT form or
altering the relevant MOVE VOPs.  The only other noise would be to
arrange for there to be an unboxed region known to the compiler (and
on a per-thread basis).

*** More involved, but still relatively uncontroversial options

**** Page tables in core files should include bytes_used

This leads to more accurate memory usage statistics and tighter bounds
on scavenging memory and validating object pointers on saved core

**** GENESIS should output a page table.

At the very least, this would allow eliminating gencgc_partial_pickup,
and such a simple implementation should be fairly straightforward.

A somewhat more clever implementation would produce minimal
region_start_offsets for the core spaces.

**** GENESIS should separate out boxed, unboxed, and code pages

More involved than simply outputting a page table, this optimization
alone would give GENCGC a massive leg up on dealing with the contents
of a cold core before having done a full collection as part of saving
a warm core.

**** Boxed pinned pages should be write protected

The only reason that has come up in discussion as plausible for not
write-protecting pinned pages is that they might be involved in a
system call that will simply error out if passed an address to a
write-protected page.  This, in turn, is only really plausible for
unboxed data.  Once we split boxed data from unboxed data from initial
allocation there is no reason not to write-protect boxed pages, even
if they are pinned.

Being able to write-protect pinned, boxed pages that are part of an
older generation means that we can rely on the write barrier instead
of always having to scavenge the pages.

**** Allocation failures due to insufficient available pages should cause GC

When trying a large allocation, and it fails due to insufficient
available pages, it makes sense to try a minor, and then a major, GC
before giving up entirely, rather than simply panicing because there
aren't enough pages.

This should reduce the number of heap exhaustion errors due to
borderline-full heaps or poorly set GC triggers.

*** More controversial options

**** Track WP violations at a per-machine-page level, not per-gc-page

Similar logic to minimizing region_start_offset, as far as scavenging
goes, but reducing the scope of checking for "young" pointers yet
further.  We would have an equivalent to region_start_offset and a WP
violation bit per machine page as part of the GC page structure, and
this could be lazily initialized.  This also should reduce the
pressure towards smaller GC page sizes.

This could actually be handled on any granularity between machine
pages and full GC pages, which gives us what amounts to a tuning

**** Only large-object regions should span multiple GC pages

This should outright eliminate much of the code for scanning forward
and backward through the page table, and would have
region_start_offset at zero for all non-large-object pages.  It would
also reduce the average number of pages pinned due to a conservative
root (especially since large objects don't move anyway).

*** Options that require computing mark-bits

**** Replace dead objects in pinned pages with "free space" objects

These objects would be treated as unboxed data by the scavenger, and
thus would not hold further garbage objects as "live".  Note that this
means that a conservative root pins /one object/ and no longer keeps
the remaining objects on the page "live".

**** Simpler weak-pointer handling

If we already know that the object a weak pointer points to is live,
we can scavenge it when we scavenge the weak pointer itself, rather
than having to maintain a list of all of the weak pointers and fix
them up after having finished scavenging newspace.  The converse is
also the case, with a known dead object being pointed to.

Note that this does not help with weak hash tables or other weak
structures, only weak pointers.

**** Freelists within GC pages

A "free space" object can be ignored for the purposes of scavenging
pointers, and can easily be turned into an allocation region.  Or we
could maintain a ratio of allocated to free space within a page and
opt to "pin" it if there is sufficient allocated space or "evacuate"
it if there is insufficient allocated space, possibly reducing peak
GC page usage during collection (minor or major).

Either the conversion to allocation regions or the pin-if-too-full
options should reduce peak load, just via different mechanisms.  It
might be interesting to make them build-time (or run-time!) options,
or even dynamically swich between the options depending on the
presence of conservative roots.

**** Opportunistic release of evacuated page memory

If a page is being evacuated, and only contains a small number of
objects, we could use a "side vector" mapping the objects to their
forwarded location and release the page memory before the last inbound
reference has been updated.

If we want to get /really/ clever, it may be possible to re-use the GC
page for newspace, even before all of the old references have been
updated, further reducing peak address space usage during GC.  This is
a tricky elaboration, and probably fairly fragile in terms of

This paste has no annotations.

Colorize as:
Show Line Numbers

Lisppaste pastes can be made by anyone at any time. Imagine a fearsomely comprehensive disclaimer of liability. Now fear, comprehensively.