Skip to content

libia2: keep TCB TLS page shared across compartments#671

Open
oinoom wants to merge 4 commits intomainfrom
fix/shared-tls-tcb-rationale-docs
Open

libia2: keep TCB TLS page shared across compartments#671
oinoom wants to merge 4 commits intomainfrom
fix/shared-tls-tcb-rationale-docs

Conversation

@oinoom
Copy link
Copy Markdown
Contributor

@oinoom oinoom commented Mar 5, 2026

Problem

IA2 retags writable memory, including PT_TLS, to per-compartment pkeys. On x86_64, that model is too strict for TLS pages that hold process ABI state rather than compartment-private state.

In particular, the thread-pointer/TCB page (__builtin_thread_pointer(), %fs ABI state) can be accessed in normal execution and transition paths where current PKRU does not yet match the compartment that originally tagged the page. When that page is left compartment-tagged, this can fault in real workloads (including decode-path crashes where stack-protector canary reads use %fs).

Fix

protect_tls_pages() now uses an x86_64 split policy:

  • Default: tag module PT_TLS ranges with the target compartment pkey
  • Shared carve-out 1: ia2_stackptr_0 page (existing call-gate stack slot behavior)
  • Shared carve-out 2: thread-pointer/TCB page (page containing %fs ABI state)

Implementation details:

  • Maintain a bounded carve-out set (MAX_SHARED_TLS_PAGES = 2) and only add carve-outs that overlap the current module TLS range
  • Sort carve-out pages and protect non-shared regions using a cursor walk to avoid overlap and ordering errors
  • Restore shared carve-out pages to pkey 0 during process-init path (ia2_get_compartment() == 0)
  • Skip redundant post-init per-thread re-tagging to pkey 0; those pages are already shared and repeated retagging from compartment 1 can violate tracer policy
  • Keep the explicit startup hardening in ia2_start() that re-applies TCB sharing after compartment setup
  • Preserve pre-existing AArch64 behavior; TCB carve-out logic is x86_64-specific

Similar Patterns in IA2

  • Special-case shared TLS behavior already existed for ia2_stackptr_0 in protect_tls_pages() (see runtime/libia2/ia2.c, protect_tls_pages, around lines 466-473)
  • protect_pages() already uses a "protect-by-default with explicit shared carve-outs" model for writable ranges (see runtime/libia2/ia2.c, protect_pages, shared-range collection around lines 603-632, and overlap trimming before ia2_mprotect_with_tag around lines 685-723)
  • RELRO and explicit shared sections are already handled as policy exceptions to default tagging (see runtime/libia2/ia2.c, explicit shared_sections handling around lines 605-632 and PT_GNU_RELRO handling around lines 634-645)

@oinoom oinoom changed the base branch from fix/skip-partition-alloc-autowrap to fix/exit-cleanup-tls-alignment March 6, 2026 15:32
@oinoom oinoom changed the base branch from fix/exit-cleanup-tls-alignment to main March 19, 2026 02:14
@oinoom oinoom requested review from ayrtonm and fw-immunant and removed request for ayrtonm and fw-immunant March 19, 2026 02:14
@oinoom oinoom force-pushed the fix/shared-tls-tcb-rationale-docs branch 3 times, most recently from 708fd0b to da78029 Compare March 19, 2026 12:44
@oinoom oinoom marked this pull request as ready for review March 19, 2026 18:54
@oinoom oinoom requested review from ayrtonm and fw-immunant March 19, 2026 18:54

// x18-tagged aarch64 builds do not currently use the x86_64 TCB/%fs ABI path.
// Keep this symbol for cross-arch call-site symmetry.
void ia2_unprotect_thread_pointer_page(void) {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be better to remove this no-op since it could easily get missed later on and instead just add #if defined(__x86_64__) around this function's callsite.

struct ia2_thread_metadata *const thread_metadata = ia2_thread_metadata_get_for_current_thread();
#endif

enum { MAX_SHARED_TLS_PAGES = 2 };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I don't think an enum provides any benefit for a single value so maybe make this const int or whatever type to avoid the implicit backing type

David Anekstein added 4 commits March 31, 2026 13:01
This was encountered while running compartmentalized dav1d with IA2 libc
compartmenting enabled.

Observed behavior in dav1d:
- `dav1d --version` succeeds
- actual decode (`dav1d -i /tmp/test.ivf -o /dev/null`) crashes early
- the first crash mode was a SIGSEGV on compiler-generated stack-protector
  access via `%fs:0x28` while running under a non-shared compartment PKRU
  during cross-compartment execution

Root cause:
- IA2 retags PT_TLS pages per compartment
- on x86_64, the thread pointer / TCB page is `%fs` ABI state, not
  compartment-private state
- if this page is retagged as compartment-private, normal function
  prologue/epilogue stack-canary checks can fault before intended callgate logic

This change:
- adds ia2_unprotect_thread_pointer_page() and declares it in ia2_internal.h
- updates protect_tls_pages() to treat ABI-sensitive TLS pages as carve-outs
  that are explicitly tagged with pkey 0:
  - ia2_stackptr_0 page (existing shared callgate stack slot)
  - x86_64 TCB page (thread pointer / %fs page)
- keeps the rest of each PT_TLS range tagged with the owning compartment pkey
- invokes ia2_unprotect_thread_pointer_page() after startup compartment setup
  and after per-thread TLS setup for new threads

After this change, the dav1d decode crash moves forward to a different site
(`__tls_get_addr` reading `_rtld_local` in the loader), confirming this commit
addresses the `%fs`/TCB failure mode specifically.
@oinoom oinoom force-pushed the fix/shared-tls-tcb-rationale-docs branch from da78029 to ad0606f Compare March 31, 2026 17:03
@oinoom oinoom mentioned this pull request Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants