Scale initial heap slots #3

Open
eightbitraptor wants to merge 11 commits into master from mvh-scale-heap-initial-slots

@eightbitraptor
Owner

I ran the lobste.rs benchmark from Ruby bench and snapshotted the heap. Predictably, it shows a bimodal distribution around the 40 and 160 byte size pools, so I've altered the heap initialisation code to scale initial pages by the same distribution pattern whilst keeping within the RSS usage of the original 10k slots per heap.

This eliminates 3 GCs on interpreter startup on my machine (and exposed a bug in objectspace that relied on internal objects being GC'd before the test runs).

Integer weights table encoding the bimodal object population shape
observed in the lobsters benchmark. Two Gaussian modes: IMEMO peak
at pool 0 and class/hash peak at pool 2.

Converts total page budget and floor pages into per-pool slot counts
using the bimodal weights table. No behavioral change yet.

Clamp floor_total so misconfigured env vars (floor*HEAP_COUNT >
total_pages) degrade to floor_pages per pool rather than wrapping
to near SIZE_MAX. Add comment on intentional slot-count overcount.
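Taken together, the weights table, the budget conversion, and the floor clamp described above could look roughly like the following. This is a minimal sketch, not the PR's code: the weight values, the slot sizes, the 64 KiB page constant, and every `sketch_`/`SKETCH_` name are assumptions chosen for illustration, and the real `gc_heap_compute_init_slots` may be structured differently.

```c
#include <stddef.h>

#define SKETCH_HEAP_COUNT 5
#define SKETCH_PAGE_BYTES (64 * 1024)

/* Hypothetical weights with peaks at pool 0 and pool 2 (not the PR's values). */
static const size_t SKETCH_WEIGHTS[SKETCH_HEAP_COUNT] = { 8, 2, 6, 2, 1 };

/* Assumed slot size of each pool, in bytes. */
static const size_t SKETCH_SLOT_BYTES[SKETCH_HEAP_COUNT] = { 40, 80, 160, 320, 640 };

/* Split a page budget across pools: every pool gets floor_pages, and the
 * remaining pages are shared out in proportion to the weights. */
static void
sketch_compute_init_slots(size_t total_pages, size_t floor_pages,
                          size_t out_slots[SKETCH_HEAP_COUNT])
{
    size_t weight_sum = 0;
    for (int i = 0; i < SKETCH_HEAP_COUNT; i++) {
        weight_sum += SKETCH_WEIGHTS[i];
    }

    /* Clamp so the subtraction below cannot wrap around to a huge value. */
    size_t floor_total = floor_pages * SKETCH_HEAP_COUNT;
    if (floor_total > total_pages) {
        floor_total = total_pages;
    }
    size_t spare_pages = total_pages - floor_total;

    for (int i = 0; i < SKETCH_HEAP_COUNT; i++) {
        size_t pages = floor_pages + spare_pages * SKETCH_WEIGHTS[i] / weight_sum;
        /* Ignores per-page header overhead, a simplification of this sketch. */
        out_slots[i] = pages * (SKETCH_PAGE_BYTES / SKETCH_SLOT_BYTES[i]);
    }
}
```

With these made-up weights, `sketch_compute_init_slots(195, 1, slots)` gives pools 0 and 2 far more slots than pool 4, and a misconfigured call such as `sketch_compute_init_slots(3, 10, slots)` falls back to 10 pages per pool instead of wrapping.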

Replace uniform GC_HEAP_INIT_SLOTS=10000 per pool with proportional
allocation from a bimodal page budget. Total RSS budget unchanged
at ~12 MiB (195 pages at 64 KiB). Pools 0 and 2 get the majority
of pages, matching observed IMEMO and class/hash populations.
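The budget figures above can be sanity-checked with simple arithmetic: 195 pages of 64 KiB is 12,779,520 bytes, or about 12.19 MiB. The sketch below also estimates how many pages the old uniform default would need. The slot sizes (40 bytes doubling up to 640 across five pools) and the decision to ignore page headers are assumptions of this sketch, so the result only approximates the real figure.

```c
#include <stddef.h>

#define SKETCH_PAGE_BYTES (64 * 1024)  /* page size quoted in the commit message */

/* Total bytes covered by a page budget. */
static size_t
sketch_budget_bytes(size_t pages)
{
    return pages * (size_t)SKETCH_PAGE_BYTES;
}

/* Pages needed for `slots` slots of `slot_bytes` each, rounded up.
 * Ignores per-page header overhead (a simplification of this sketch). */
static size_t
sketch_pages_for_slots(size_t slots, size_t slot_bytes)
{
    size_t per_page = SKETCH_PAGE_BYTES / slot_bytes;
    return (slots + per_page - 1) / per_page;
}

/* Pages used by a uniform slot count across five assumed pool sizes. */
static size_t
sketch_uniform_pages(size_t slots_per_pool)
{
    static const size_t slot_bytes[] = { 40, 80, 160, 320, 640 }; /* assumed */
    size_t pages = 0;
    for (size_t i = 0; i < sizeof(slot_bytes) / sizeof(slot_bytes[0]); i++) {
        pages += sketch_pages_for_slots(slots_per_pool, slot_bytes[i]);
    }
    return pages;
}
```

Under these assumptions `sketch_uniform_pages(10000)` comes to 194 pages, in line with the 195-page budget.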

The uniform default is replaced by gc_heap_compute_init_slots.
The static initializer now uses { 0 } since objspace_init
overwrites all entries via the bimodal distribution.

Users can scale the bimodal distribution up or down without changing
the shape. Per-pool RUBY_GC_HEAP_N_INIT_SLOTS still overrides
individual pools.
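The precedence between the bimodal default and a per-pool override can be sketched as a small helper. This is illustrative only: `sketch_resolve_init_slots` is a hypothetical name, and the parsing rules (plain decimal, positive values only) are assumptions rather than the interpreter's actual env var handling.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Return the per-pool override if it parses as a positive decimal number,
 * otherwise keep the slot count computed from the bimodal distribution. */
static size_t
sketch_resolve_init_slots(size_t bimodal_default, const char *override_value)
{
    /* Require a leading digit so signs, whitespace, and "" are rejected. */
    if (override_value != NULL && isdigit((unsigned char)override_value[0])) {
        char *end = NULL;
        unsigned long long parsed = strtoull(override_value, &end, 10);
        if (*end == '\0' && parsed > 0) {
            return (size_t)parsed;
        }
    }
    return bimodal_default;
}
```

A caller would pass the computed default together with the value of the matching variable, for example `sketch_resolve_init_slots(slots[0], getenv("RUBY_GC_HEAP_0_INIT_SLOTS"))`, where the variable name is `RUBY_GC_HEAP_N_INIT_SLOTS` with N replaced by the pool index.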

Three tests:
- Verify bimodal shape (pool 0 > pool 4, pool 2 > pool 4)
- Verify RUBY_GC_HEAP_INIT_TOTAL_PAGES scaling
- Verify per-pool env vars override bimodal defaults

The page count is a budget for lazy allocation, not a fixed RSS cost.
heap_prepare previously force-allocated pages outside the GC budget
when total_slots < init_slots. With bimodal init_slots giving pool 0
~139k slots, this meant up to 85 pages allocated invisibly to the GC,
breaking the invariant that free_slots + allocatable_slots predicts
when GC triggers.

Instead, seed objspace->heap_pages.allocatable_slots with the sum of
all init_slots at startup. Pages are now allocated through normal
budget accounting. The init_slots floor is still enforced by
gc_sweep_finish_heap and gc_marks_finish for shrinkage prevention.
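The accounting change can be illustrated with a toy model. The struct and function names below are hypothetical and the model is heavily simplified (a single counter, no per-heap state); it only shows the idea of seeding the budget with the sum of `init_slots` and drawing every page from it.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_HEAP_COUNT 5

struct sketch_objspace {
    size_t allocatable_slots;  /* slots that may still be added via new pages */
};

/* Seed the budget with the sum of every pool's init_slots at startup. */
static void
sketch_seed_budget(struct sketch_objspace *os,
                   const size_t init_slots[SKETCH_HEAP_COUNT])
{
    size_t sum = 0;
    for (int i = 0; i < SKETCH_HEAP_COUNT; i++) {
        sum += init_slots[i];
    }
    os->allocatable_slots = sum;
}

/* Allocate a page only while budget remains, charging its slots to the
 * budget. Returns false once the budget is spent, at which point a real
 * collector would run GC rather than grow the heap silently. */
static bool
sketch_try_allocate_page(struct sketch_objspace *os, size_t slots_per_page)
{
    if (os->allocatable_slots == 0) {
        return false;
    }
    if (os->allocatable_slots > slots_per_page) {
        os->allocatable_slots -= slots_per_page;
    }
    else {
        os->allocatable_slots = 0;
    }
    return true;
}
```

Because every page is charged against `allocatable_slots`, the counter keeps reflecting how much growth is left before a GC is due, which is the invariant the commit message says the forced allocation used to break.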

ObjectSpace.each_object already skips hidden objects directly, but it could still
yield visible container objects that hold hidden internals. In that case,
calling methods like Hash#inspect can raise NotImplementedError for a hidden
T_ARRAY value.

Add a lightweight direct-reference check for Array and Hash entries and skip
containers that contain hidden/internal objects. This keeps hidden internals
from leaking through enumeration and fixes iteration patterns that call
inspect while traversing object space.

This was exposed because the heap init changes alter the number of GCs
that run on Ruby startup, leaving internal objects created during
interpreter boot in the heap until the first GC runs.
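The direct-reference check can be modelled without any interpreter internals. In this sketch the `sketch_obj` struct, its `hidden` flag, and `sketch_should_yield` are all hypothetical stand-ins; the actual patch works on Ruby's Array and Hash representations, which are not shown in this conversation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy object: `entries` lists the objects it references directly
 * (array elements, or hash keys and values). Non-containers have len 0. */
struct sketch_obj {
    bool hidden;
    size_t len;
    const struct sketch_obj *const *entries;
};

/* Decide whether enumeration should yield `obj`. Hidden objects are
 * skipped, and so are visible containers that directly reference a hidden
 * object. The check is one level deep only, which keeps it cheap. */
static bool
sketch_should_yield(const struct sketch_obj *obj)
{
    if (obj->hidden) {
        return false;
    }
    for (size_t i = 0; i < obj->len; i++) {
        if (obj->entries[i] != NULL && obj->entries[i]->hidden) {
            return false;
        }
    }
    return true;
}
```

A visible container holding only visible entries is still yielded, so ordinary arrays and hashes are unaffected in this model.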
eightbitraptor pushed a commit that referenced this pull request Apr 1, 2026
Move compilation steps from the heaviest jobs to the lightest to reduce
the critical path of the Compilations workflow.

Before: jobs ranged from 13-41 min (compile#12 had 4 steps, compile#3
had 10 clang versions).

After: jobs range from 7-9 steps each (excluding compile#1 which has
the LTO build), bringing the estimated critical path from ~41 min to
~30 min.

Moves:
- clang 23, 22, 21 from #3 to #12 and #10
- GCC 8, 7 from #2 to #12
- `OPT_THREADED_CODE=1`, `OPT_THREADED_CODE=2` from #7 to #10
eightbitraptor pushed a commit that referenced this pull request Apr 2, 2026
pm_parse_process initializes the index_lookup_table but nothing seems to
use it after it has been allocated. However, pm_compile_scope_node will
overwrite the index_lookup_table and cause it to leak memory. This can
be seen during bootup with the following memory leaks reported by ASAN:

    #0 0x60dba31b7af3 in malloc
    #1 0x60dba32e0718 in rb_gc_impl_malloc gc/default/default.c:8287:5
    #2 0x60dba32c7aa7 in ruby_xmalloc_body gc.c:5373:12
    #3 0x60dba32c4a54 in ruby_xmalloc gc.c:5355:34
    #4 0x60dba3260314 in pm_index_lookup_table_init_heap prism_compile.h:89:29
    #5 0x60dba3209388 in pm_parse_process prism_compile.c:11366:5