37 changes: 37 additions & 0 deletions notes/2026-03-23.md
@@ -0,0 +1,37 @@
# How allocation changes affect GC policies

Note author: shruti2522
date: 2026-03-23

This note answers the question from issue #58 about how the allocation changes reconcile with different GC policies.

## The problem we fixed

Previously, every object carried two headers: the allocator added one to track whether the object had been dropped, and the garbage collector added its own. This wasted memory and coupled the allocator to the GC.

By making the allocator wrapper `#[repr(transparent)]`, the allocator no longer adds any header; the only header left is the GC header. The allocator now deals purely in raw memory, and the GC owns the policy decisions.
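
The shape of the change, condensed (the `OldHeapItem` struct here is a simplified stand-in for the previous header-carrying layout; `ArenaHeapItem` matches the `#[repr(transparent)]` definition in the diff below):

```rust
use core::mem::size_of;

// Old shape (simplified): the allocator prepended its own drop-tracking header.
#[repr(C)]
struct OldHeapItem<T> {
    next: *mut u8, // tagged pointer the allocator used to mark drops
    value: T,
}

// New shape, as in this PR: the wrapper is layout-transparent, so an
// allocation is exactly the value. Any GC header lives outside the allocator.
#[repr(transparent)]
struct ArenaHeapItem<T>(T);

fn main() {
    // One pointer-sized header per object disappears (sizes are for a 64-bit target).
    println!(
        "old: {} bytes, new: {} bytes",
        size_of::<OldHeapItem<u64>>(),
        size_of::<ArenaHeapItem<u64>>()
    );
    assert!(size_of::<OldHeapItem<u64>>() > size_of::<ArenaHeapItem<u64>>());
}
```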

## How this helps different GC policies

This separation matters because the GC policy can change without breaking the allocator.

### Arena allocator and mark sweep

For the bump allocator (arena2) under mark-sweep, drop tracking changed from a per-object linked list to two simple counters per arena. Checking whether an arena is empty becomes a single comparison, but liveness is now tracked per arena rather than per object.
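
A condensed sketch of the counter scheme, assuming the field names and the emptiness check follow the `Arena` changes in the diff below (`ArenaCounters`, `record_alloc`, and `all_dropped` are illustrative stand-ins):

```rust
use core::cell::Cell;

struct ArenaCounters {
    alloc_count: Cell<usize>,
    drop_count: Cell<usize>,
}

impl ArenaCounters {
    fn new() -> Self {
        Self { alloc_count: Cell::new(0), drop_count: Cell::new(0) }
    }

    /// Called once per allocation.
    fn record_alloc(&self) {
        self.alloc_count.set(self.alloc_count.get() + 1);
    }

    /// Called once when an object is marked dropped.
    fn mark_dropped(&self) {
        self.drop_count.set(self.drop_count.get() + 1);
    }

    /// The whole-arena emptiness check is a single comparison,
    /// instead of walking a per-object linked list.
    fn all_dropped(&self) -> bool {
        self.alloc_count.get() == self.drop_count.get()
    }
}

fn main() {
    let arena = ArenaCounters::new();
    arena.record_alloc();
    arena.record_alloc();
    arena.mark_dropped();
    assert!(!arena.all_dropped());
    arena.mark_dropped();
    assert!(arena.all_dropped()); // the arena can now be recycled
}
```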

### Mempool allocator and mark sweep

The size-class pool allocator (mempool3) already added no per-object headers; it tracks occupied slots with a bitmap. That allows freeing individual objects, which suits incremental garbage collection.
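
mempool3 itself is not part of this diff, so the following is only a hypothetical sketch of bitmap slot tracking; the real field and method names may differ:

```rust
/// Hypothetical sketch: one bit per slot, 1 = occupied.
struct SlotBitmap {
    bits: Vec<u64>,
}

impl SlotBitmap {
    fn new(slots: usize) -> Self {
        Self { bits: vec![0; (slots + 63) / 64] }
    }

    /// Mark a single slot occupied at allocation time.
    fn set(&mut self, slot: usize) {
        self.bits[slot / 64] |= 1 << (slot % 64);
    }

    /// Free a single slot; per-object freeing is what makes
    /// incremental collection practical here.
    fn clear(&mut self, slot: usize) {
        self.bits[slot / 64] &= !(1 << (slot % 64));
    }

    fn is_set(&self, slot: usize) -> bool {
        self.bits[slot / 64] & (1 << (slot % 64)) != 0
    }
}
```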

### Future GC policies

Here is how this setup supports possible future GC policies:

1. **Generational GC:** Whole arenas used for young objects can be recycled when that generation dies. With no allocator headers, surviving objects can be promoted to older generations without rewriting per-object allocator metadata.
2. **Compacting or Copying GC:** Before this change, every object stored a raw pointer to the next object in the arena inside its `TaggedPtr` header; moving objects during compaction would have left all of those embedded pointers dangling. Now allocations contain no such pointers, so objects can be freely copied or moved in memory.
3. **Concurrent Mark:** The per-arena counters or bitmaps only need to switch to thread-safe atomic types (a sketch follows this list); the transparent wrappers themselves need no changes.
4. **Reference Counting:** The reference count can just live in the garbage collector header. The allocator does not need to know about it.
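
A sketch of the concurrent-mark direction from item 3, assuming the `Cell` counters are swapped for atomics (this is an illustration, not part of this PR):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

struct AtomicArenaCounters {
    alloc_count: AtomicUsize,
    drop_count: AtomicUsize,
}

impl AtomicArenaCounters {
    fn record_alloc(&self) {
        self.alloc_count.fetch_add(1, Ordering::Relaxed);
    }

    fn mark_dropped(&self) {
        // Release pairs with the Acquire load below so a collector thread
        // sees every drop made before it decides the arena is empty.
        self.drop_count.fetch_add(1, Ordering::Release);
    }

    fn all_dropped(&self) -> bool {
        self.drop_count.load(Ordering::Acquire) == self.alloc_count.load(Ordering::Relaxed)
    }
}
```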

## Conclusion

With allocator headers removed, the garbage collector is fully decoupled from the allocator. Either side can now be swapped independently, and new policies such as compacting or generational collection can be built without touching the allocation path.
6 changes: 3 additions & 3 deletions oscars/benches/arena2_vs_mempool3.rs
@@ -382,10 +382,10 @@ fn bench_dealloc_speed(c: &mut Criterion) {
},
|(mut allocator, ptrs): (_, _)| {
for ptr in ptrs {
let mut heap_item_ptr = ptr.as_ptr();
let heap_item_ptr = ptr.as_ptr();
unsafe {
core::ptr::drop_in_place(heap_item_ptr.as_mut().as_ptr());
heap_item_ptr.as_mut().mark_dropped();
core::ptr::drop_in_place(heap_item_ptr.cast::<usize>().as_ptr());
allocator.mark_dropped(heap_item_ptr.as_ptr() as *const u8);
}
}
allocator.drop_dead_arenas();
166 changes: 44 additions & 122 deletions oscars/src/alloc/arena2/alloc.rs
@@ -1,124 +1,59 @@
use core::{
cell::Cell,
marker::PhantomData,
ptr::{NonNull, drop_in_place},
};
use core::{cell::Cell, marker::PhantomData, ptr::NonNull};

use rust_alloc::alloc::{Layout, alloc, dealloc, handle_alloc_error};

use crate::alloc::arena2::ArenaAllocError;

/// Transparent wrapper for a GC value.
/// Drop state is tracked by the GC header and arena counters.
#[derive(Debug)]
#[repr(C)]
pub struct ArenaHeapItem<T: ?Sized> {
next: TaggedPtr<ErasedHeapItem>,
value: T,
}
#[repr(transparent)]
pub struct ArenaHeapItem<T: ?Sized>(pub T);

impl<T: ?Sized> ArenaHeapItem<T> {
fn new(next: *mut ErasedHeapItem, value: T) -> Self
fn new(value: T) -> Self
where
T: Sized,
{
Self {
next: TaggedPtr(next),
value,
}
}

pub fn mark_dropped(&mut self) {
if !self.next.is_tagged() {
self.next.tag()
}
}

pub fn is_dropped(&self) -> bool {
self.next.is_tagged()
Self(value)
}

pub fn value(&self) -> &T {
&self.value
&self.0
}

pub fn as_ptr(&mut self) -> *mut T {
&mut self.value as *mut T
&mut self.0 as *mut T
}

/// Returns a raw mutable pointer to the value
///
/// This avoids creating a `&mut self` reference, which can lead to stacked borrows
/// if shared references to the heap item exist
pub(crate) fn as_value_ptr(ptr: NonNull<Self>) -> *mut T {
// SAFETY: `&raw mut` computes the field address without creating a reference
unsafe { &raw mut (*ptr.as_ptr()).value }
}

fn value_mut(&mut self) -> &mut T {
&mut self.value
}
}

impl<T: ?Sized> Drop for ArenaHeapItem<T> {
fn drop(&mut self) {
unsafe {
if !self.is_dropped() {
self.mark_dropped();
drop_in_place(self.value_mut())
}
}
// With repr(transparent), the outer struct has the same address as the inner value
Review comment (Member): We should just remove this method. No point preserving two APIs that are returning *mut T

ptr.as_ptr() as *mut T
}
}

/// Type erased pointer for arena allocations.
#[derive(Debug, Clone, Copy)]
#[repr(C)]
pub struct ErasedHeapItem {
next: TaggedPtr<usize>,
buf: NonNull<u8>, // Start of a byte buffer
}
#[repr(transparent)]
pub struct ErasedHeapItem(NonNull<u8>);
Review comment (Member): nit: preserve comment


impl ErasedHeapItem {
pub fn get<T>(&self) -> NonNull<T> {
self.buf.cast::<T>()
}

pub fn mark_dropped(&mut self) {
if !self.next.is_tagged() {
self.next.tag()
}
}

pub fn is_dropped(&self) -> bool {
self.next.is_tagged()
self.0.cast::<T>()
}
}

impl<T> core::convert::AsRef<T> for ErasedHeapItem {
fn as_ref(&self) -> &T {
// SAFETY: TODO
// SAFETY: caller ensures this pointer was allocated as T
Review comment (Member): "caller must ensure"

unsafe { self.get().as_ref() }
}
}

const MASK: usize = 1usize << (usize::BITS as usize - 1usize);

#[derive(Debug, Clone, Copy)]
#[repr(transparent)]
pub struct TaggedPtr<T>(*mut T);

impl<T> TaggedPtr<T> {
fn tag(&mut self) {
self.0 = self.0.map_addr(|addr| addr | MASK);
}

fn is_tagged(&self) -> bool {
self.0 as usize & MASK == MASK
}

fn as_ptr(&self) -> *mut T {
self.0.map_addr(|addr| addr & !MASK)
}
}

// An arena pointer
//
// NOTE: This will actually need to be an offset at some point if we were to add
@@ -127,18 +62,19 @@ impl<T> TaggedPtr<T> {

#[derive(Debug, Clone, Copy)]
#[repr(transparent)]
pub struct ErasedArenaPointer<'arena>(NonNull<ErasedHeapItem>, PhantomData<&'arena ()>);
pub struct ErasedArenaPointer<'arena>(NonNull<u8>, PhantomData<&'arena ()>);
Review comment (Member): question: why not use ErasedHeapItem here? I think I'd prefer to preserve the type for its explicitness, but I'm open to this if you have a good argument for NonNull<u8>


impl<'arena> ErasedArenaPointer<'arena> {
fn from_raw(raw: NonNull<ErasedHeapItem>) -> Self {
fn from_raw(raw: NonNull<u8>) -> Self {
Self(raw, PhantomData)
}

pub fn as_non_null(&self) -> NonNull<ErasedHeapItem> {
self.0
// Keep the old erased pointer API
ErasedHeapItem(self.0).get()
}

pub fn as_raw_ptr(&self) -> *mut ErasedHeapItem {
pub fn as_raw_ptr(&self) -> *mut u8 {
self.0.as_ptr()
}

Expand Down Expand Up @@ -168,17 +104,14 @@ pub struct ArenaPointer<'arena, T>(ErasedArenaPointer<'arena>, PhantomData<&'are

impl<'arena, T> ArenaPointer<'arena, T> {
unsafe fn from_raw(raw: NonNull<ArenaHeapItem<T>>) -> Self {
Self(
ErasedArenaPointer::from_raw(raw.cast::<ErasedHeapItem>()),
PhantomData,
)
Self(ErasedArenaPointer::from_raw(raw.cast::<u8>()), PhantomData)
}

pub fn as_inner_ref(&self) -> &'arena T {
// SAFETY: HeapItem is non-null and valid for dereferencing.
// SAFETY: pointer is valid, ArenaHeapItem<T> is repr(transparent) over T.
unsafe {
let typed_ptr = self.0.as_raw_ptr().cast::<ArenaHeapItem<T>>();
&(*typed_ptr).value
&(*typed_ptr).0
}
}

@@ -189,7 +122,7 @@ impl<'arena, T> ArenaPointer<'arena, T> {
/// - Caller must ensure that T is not dropped
/// - Caller must ensure that the lifetime of T does not exceed its Arena.
pub fn as_ptr(&self) -> NonNull<ArenaHeapItem<T>> {
self.0.as_non_null().cast::<ArenaHeapItem<T>>()
self.0.0.cast::<ArenaHeapItem<T>>()
}

/// Convert the current ArenaPointer into an `ErasedArenaPointer`
@@ -242,7 +175,10 @@ pub struct ArenaAllocationData {
pub struct Arena<'arena> {
pub flags: Cell<ArenaState>,
pub layout: Layout,
pub last_allocation: Cell<*mut ErasedHeapItem>,
/// Number of allocations made in this arena
alloc_count: Cell<usize>,
Review comment (Member): issue: I'm not sure this is the correct approach. I think this approach does work, but I believe it would open us up to a pretty large issue where an already dropped allocation could be provided and then the drop count is immediately incorrect. The allocation header / footer needs to be able to track its liveliness

/// Number of items marked as dropped
drop_count: Cell<usize>,
pub current_offset: Cell<usize>,
pub buffer: NonNull<u8>,
_marker: PhantomData<&'arena ()>,
@@ -266,7 +202,8 @@ impl<'arena> Arena<'arena> {
Ok(Self {
flags: Cell::new(ArenaState::default()),
layout,
last_allocation: Cell::new(core::ptr::null_mut::<ErasedHeapItem>()), // NOTE: watch this one.
alloc_count: Cell::new(0),
drop_count: Cell::new(0),
current_offset: Cell::new(0),
buffer: data,
_marker: PhantomData,
@@ -277,6 +214,11 @@ impl<'arena> Arena<'arena> {
self.flags.set(self.flags.get().full());
}

/// Increment the drop counter.
pub fn mark_dropped(&self) {
self.drop_count.set(self.drop_count.get() + 1);
}

pub fn alloc<T>(&self, value: T) -> ArenaPointer<'arena, T> {
self.try_alloc(value).unwrap()
}
@@ -328,11 +270,11 @@ impl<'arena> Arena<'arena> {
let dst = buffer_ptr
.add(allocation_data.buffer_offset)
.cast::<ArenaHeapItem<T>>();
// NOTE: everyI recomm next begin by pointing back to the start of the buffer rather than null.
let arena_heap_item = ArenaHeapItem::new(self.last_allocation.get(), value);
// Write the value
let arena_heap_item = ArenaHeapItem::new(value);
dst.write(arena_heap_item);
// We've written the last_allocation to the heap, so update with a pointer to dst
self.last_allocation.set(dst as *mut ErasedHeapItem);
// Track live/drop state with counters.
self.alloc_count.set(self.alloc_count.get() + 1);
ArenaPointer::from_raw(NonNull::new_unchecked(dst))
}
}
@@ -372,30 +314,9 @@ impl<'arena> Arena<'arena> {
})
}

/// Walks the Arena allocations to determine if the arena is droppable
/// Returns true when all allocations were marked dropped.
pub fn run_drop_check(&self) -> bool {
let mut unchecked_ptr = self.last_allocation.get();
while let Some(node) = NonNull::new(unchecked_ptr) {
let item = unsafe { node.as_ref() };
if !item.is_dropped() {
return false;
}
unchecked_ptr = item.next.as_ptr() as *mut ErasedHeapItem
}
true
}

// checks dropped items in this arena
#[cfg(test)]
pub fn item_drop_states(&self) -> rust_alloc::vec::Vec<bool> {
let mut result = rust_alloc::vec::Vec::new();
let mut unchecked_ptr = self.last_allocation.get();
while let Some(node) = NonNull::new(unchecked_ptr) {
let item = unsafe { node.as_ref() };
result.push(item.is_dropped());
unchecked_ptr = item.next.as_ptr() as *mut ErasedHeapItem
}
result
self.alloc_count.get() == self.drop_count.get()
}

/// Reset arena to its initial empty state, reusing the existing OS buffer.
Expand All @@ -410,7 +331,8 @@ impl<'arena> Arena<'arena> {
// the same layout in try_init.
unsafe { core::ptr::write_bytes(self.buffer.as_ptr(), 0, self.layout.size()) };
self.flags.set(ArenaState::default());
self.last_allocation.set(core::ptr::null_mut());
self.alloc_count.set(0);
self.drop_count.set(0);
self.current_offset.set(0);
}
}
36 changes: 29 additions & 7 deletions oscars/src/alloc/arena2/mod.rs
Original file line number Diff line number Diff line change
@@ -193,12 +193,34 @@ impl<'alloc> ArenaAllocator<'alloc> {
}
}

// checks dropped items across all arenas
#[cfg(test)]
pub fn arena_drop_states(&self) -> rust_alloc::vec::Vec<rust_alloc::vec::Vec<bool>> {
self.arenas
.iter()
.map(|arena| arena.item_drop_states())
.collect()
/// Mark `ptr` dropped in its arena.
///
/// # Safety
/// `ptr` must have been allocated by this allocator and must be marked dropped at most once.
pub unsafe fn mark_dropped(&mut self, ptr: *const u8) {
let ptr_addr = ptr as usize;
for arena in &self.arenas {
let start = arena.buffer.as_ptr() as usize;
let end = start + arena.layout.size();
if ptr_addr >= start && ptr_addr < end {
arena.mark_dropped();
return;
}
}
// Recycled arenas should not match, but check anyway
for arena in self.recycled_arenas.iter().flatten() {
let start = arena.buffer.as_ptr() as usize;
let end = start + arena.layout.size();
if ptr_addr >= start && ptr_addr < end {
arena.mark_dropped();
return;
}
}
// Pointer not from this allocator, likely double free or foreign allocator.
debug_assert!(
false,
"mark_dropped: pointer {ptr_addr:#x} not owned by any arena; \
possible double-free or pointer from a foreign allocator"
);
}
}
Loading