
Keep the receipts: interned index tracking for deterministic normalization#143

Open
cds-amal wants to merge 1 commit into master from stack/pr2-receipts

Conversation

@cds-amal
Collaborator

Background

The normalise filter (normalise-filter.jq) strips interned indices from golden files so that integration tests can compare outputs across platforms. The trouble is: the filter has to independently know the JSON schema. Every time Stable MIR adds a new field carrying an interned index (a Ty, Span, AllocId, etc.), the filter needs a corresponding rule; we've been discovering these gaps exclusively through CI failures, and the pattern is pure whack-a-mole.

This PR introduces a "receipts" system that closes the loop between producer and consumer:

  • Spy serializer (src/printer/receipts.rs): a no-output serde Serializer that mirrors the real serialization pass but only tracks which struct fields, newtype wrappers, and array positions contain known interned types. Because it hooks into serde's derive-generated code, it automatically detects new interned fields without manual filter updates.

  • Receipt emission (src/printer/mod.rs): emit_smir() now writes a companion *.smir.receipts.json alongside each *.smir.json, declaring three categories of interned paths (keys, newtypes, positions).

  • ADR-004 documenting the design rationale, receipt format, and known limitations.

The normalise filter itself isn't updated in this PR (that happens in the next PR, where the per-nightly golden file infrastructure lands and the filter switches from hardcoded rules to receipt-driven normalization).

N.B.: the receipts are generated dynamically by observing actual serialization calls, not from a static list. If upstream adds a new Ty field somewhere inside Body, the spy detects it automatically because serde's derive-generated code calls serialize_newtype_struct("Ty", ...) for the new field. A small set of "seeded" entries covers paths the spy can't discover dynamically (domain-specific newtype wrappers like TraitDef that are transparent in JSON but opaque to the spy's immediate-parent check).
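The spy idea can be sketched without serde: walk the same value tree the real serializer walks, emit nothing, and record only the paths whose immediate wrapper is a known interned newtype. This is a simplified, serde-free stand-in for the actual `receipts.rs` serializer; all names below are illustrative, not the crate's API.

```rust
use std::collections::BTreeSet;

// Illustrative value tree standing in for serde's data model.
#[derive(Clone)]
enum Node {
    Int(u64),
    Newtype(&'static str, Box<Node>),  // e.g. Ty(42), Span(7)
    Struct(Vec<(&'static str, Node)>), // field name -> value
    Seq(Vec<Node>),
}

// Interned types we want to track (subset, for illustration).
const INTERNED: &[&str] = &["Ty", "Span", "AllocId"];

// "Spy" pass: produce no output, only record paths carrying interned types.
fn spy(node: &Node, path: &str, out: &mut BTreeSet<String>) {
    match node {
        Node::Int(_) => {}
        Node::Newtype(name, inner) => {
            if INTERNED.contains(name) {
                out.insert(format!("{path} ({name})"));
            }
            spy(inner, path, out);
        }
        Node::Struct(fields) => {
            for (field, value) in fields {
                spy(value, &format!("{path}.{field}"), out);
            }
        }
        Node::Seq(items) => {
            for (i, item) in items.iter().enumerate() {
                spy(item, &format!("{path}[{i}]"), out);
            }
        }
    }
}

fn main() {
    // A value shaped roughly like a MIR body fragment.
    let body = Node::Struct(vec![
        ("ty", Node::Newtype("Ty", Box::new(Node::Int(42)))),
        ("span", Node::Newtype("Span", Box::new(Node::Int(7)))),
        ("stmts", Node::Seq(vec![Node::Int(0)])),
    ]);
    let mut receipts = BTreeSet::new();
    spy(&body, "$", &mut receipts);
    for r in &receipts {
        println!("{r}");
    }
}
```

Because the walk shadows the serialization pass itself, a new interned field shows up in the recorded set without anyone editing a rule table; that is the property the real serde hook provides for free.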

Test plan

  • cargo build passes
  • cargo clippy -- -Dwarnings passes (the layout_panics reporting fix is included)
  • make integration-test passes (receipts are emitted but the filter doesn't read them yet; flat golden files still work)

@cds-amal cds-amal force-pushed the stack/pr2-receipts branch from 4f98f78 to 3a9432c on March 11, 2026 19:38
Base automatically changed from stack/pr1-foundation to master March 22, 2026 03:41
@dkcumming dkcumming requested a review from a team March 22, 2026 03:41
Collaborator

@dkcumming dkcumming left a comment


Wow super cool stuff! I had a brief look at #144 as well and I think this is a great idea.

I have one request: could this be an option that is disabled by default (and of course enabled for the golden files)? mir-semantics writes some smir.json files and stores them so that some tests can take smir.json as input instead of a Rust file (sometimes useful / necessary). It would be nice if generating those test smir.json files didn't automatically produce the receipts.json, since KMIR won't need it there. What do you think?

Add a spy-based serialization pass that detects which JSON paths carry
non-deterministic interned indices (Ty, Span, AllocId, etc.) and emits
a companion *.smir.receipts.json alongside each *.smir.json output.

The receipts declare three categories of interned indices:
  - interned_keys: object field names whose values are interned
  - interned_newtypes: enum variant wrappers around bare integers
  - interned_positions: known tuple positions carrying interned indices

These receipts drive the normalise-filter.jq used for golden-file
comparison, replacing the previous hardcoded normalization rules with
a data-driven approach. See ADR-004 for the design rationale.
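For illustration, a companion receipts file with those three categories might look like the fragment below; the field names and shapes here are hypothetical, not the actual format (see ADR-004 for that):

```json
{
  "interned_keys": ["ty", "span", "alloc_id"],
  "interned_newtypes": ["Ty", "Span", "AllocId", "TraitDef"],
  "interned_positions": { "VarDebugInfo": [1] }
}
```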
@cds-amal
Collaborator Author

cds-amal commented Mar 22, 2026

Wow super cool stuff! I had a brief look at #144 as well and I think this is a great idea.

I have one request: could this be an option that is disabled by default (and of course enabled for the golden files)? mir-semantics writes some smir.json files and stores them so that some tests can take smir.json as input instead of a Rust file (sometimes useful / necessary). It would be nice if generating those test smir.json files didn't automatically produce the receipts.json, since KMIR won't need it there. What do you think?

Good question @dkcumming! Let me think through this a bit before we land on an approach.

So, the receipts file is basically a companion artifact that describes the structure of the JSON it sits beside. It's cheap to produce (the spy serializer adds negligible overhead), and because it's derived from the same #[derive(Serialize)] annotations that produce the JSON, the two can't drift apart by construction. The cost of always emitting it is pretty minimal, and having it there means any consumer gets a machine-readable manifest of which fields carry non-deterministic values, for free.

But I think there's a bigger question lurking behind this one, and I'd rather surface it than paper over it with a flag: how does KMIR handle the fact that rustc's semantics (and the stable MIR schema) change between nightlies? The upstream UI test corpus changes too. If KMIR stores .smir.json files as frozen test inputs, those are snapshots of a specific nightly's output; a toolchain bump could change field names, add variants, reorder structures. The receipts file is one example of "extra output," but the shape of the JSON itself is the bigger moving target. I don't have enough visibility into how KMIR consumes these files to know whether that's already handled or whether it's a latent problem.

My general instinct has been to emit more information rather than less, and expect downstream consumers to filter before ingesting. A receipts file that KMIR doesn't need is inert; it just sits there. But if the friction is something specific (CI noise from unexpected files? test assertions breaking on the extra file? disk space in a large test suite?), knowing that would help me figure out whether an opt-out flag, a .gitignore pattern, or something else entirely is the right fix.

What does the KMIR side of this actually look like? 🤔

@cds-amal cds-amal force-pushed the stack/pr2-receipts branch from 12bc266 to 501f402 on March 22, 2026 05:37
@cds-amal cds-amal requested a review from dkcumming March 22, 2026 10:07
@cds-amal
Collaborator Author

cds-amal commented Mar 22, 2026

..snip, snip..
What does the KMIR side of this actually look like? 🤔

@dkcumming and I met to discuss the SMIR -> KMIR workflow. Sharing a summary here so there's a record for anyone following along.

So, we went through the KMIR K configuration: the state machine that represents a MIR program during interpretation. The configuration has cells (XML-like containers) that get modified as the program steps:

  • The K cell (current continuation): the remaining program steps until termination. This shrinks as rules fire, but can also grow; a call terminator, for instance, adds setup steps for the callee's locals before the callee's body lands on the continuation.
  • Return value: starts as noReturn, only gets set at program end.
  • Current function: a Ty index tracking which function we're executing (with -1 reserved for main, pragmatically if not soundly).
  • Current frame: the body (list of basic blocks), the context of where this frame came from, the destination for the return value, the next basic block, and the locals (local 0 is the return local; 1+ are arguments, then temporaries).
  • Stack: frames get pushed and popped for nested calls, as you'd expect.

The architectural detail that matters most for stable-mir-json: types and allocs are not in the K configuration. They're static data that doesn't change at runtime. Originally they were carried in the configuration, but that meant serializing them in every proof state (KMIR writes proof states to disk for interrupt/resume), which turned out to be a non-negligible performance overhead. The fix was to pull them out and provide them on demand via hooked functions (LLVM definitions in the backend). When a rule needs a type to process something on the K cell, it hooks into the static data store rather than reading from the configuration.

(Heap memory is a different story: it will need to be in the configuration once that's implemented, because it's mutable at runtime; rules need to read and write it. It's just not there yet. Daniel's phrase for the current scope was "the McDonald's of Rust," which I'm going to be borrowing liberally.)
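The "static data out of the configuration" move can be sketched in Rust terms. In this toy model (all names hypothetical, not KMIR's actual implementation), the per-step machine state carries only small indices, and a lookup resolves them on demand, mirroring how KMIR's hooked functions consult the static store instead of the K configuration:

```rust
use std::collections::HashMap;

// Static data store: lives outside the serialized machine state,
// consulted on demand -- the analogue of KMIR's hooked functions.
struct TypeTable {
    types: HashMap<u32, String>, // interned Ty index -> type description
}

impl TypeTable {
    fn lookup(&self, ty: u32) -> Option<&String> {
        self.types.get(&ty)
    }
}

// The per-step state stays small: indices only, no type table inside.
// Local 0 is the return local; 1.. are arguments, then temporaries.
struct Frame {
    locals: Vec<u32>,
    next_block: usize,
}

struct State {
    stack: Vec<Frame>, // frames pushed/popped for nested calls
}

fn main() {
    let table = TypeTable {
        types: HashMap::from([(0, "i32".to_string()), (1, "bool".to_string())]),
    };
    let state = State {
        stack: vec![Frame { locals: vec![0, 1], next_block: 0 }],
    };
    let frame = state.stack.last().unwrap();
    // Resolve a local's type through the hook-like lookup, not from the state.
    let ty = table.lookup(frame.locals[0]).unwrap();
    println!("local 0: {ty}, next block: {}", frame.next_block);
}
```

Serializing `State` for every proof state is then cheap, because the immutable `TypeTable` never travels with it; that is exactly the overhead the refactor removed.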

We then looked at the mirroring relationship between the two codebases. KMIR's K syntax files (ty.md, alloc.md, body.md, mono.md) are intentionally named to correspond to stable-mir-json's Rust files (ty.rs, alloc.rs, body.rs, mono.rs). The K productions (EBNF-style syntax definitions) mirror the Rust structs field by field. Daniel showed the Allocation type side by side: bytes, mutability, provenance, align on both sides, same shape, different notation.

The insight we landed on together: these K syntax definitions should be generated from the stable-mir-json types, not hand-maintained as a parallel mirror. When stable MIR changes between nightlies (which it does regularly), the K side needs to follow, and manual synchronization doesn't scale. Daniel acknowledged that some non-pure additions crept into the K-side definitions over time (things that aren't faithful reproductions of the stable MIR types but KMIR-specific semantic additions), and those would need cleanup before generation could work. The boundary between "faithful mirror of stable MIR" and "KMIR-specific semantic additions" needs to be explicit.

This connects back to the original question about receipts: the more structured and introspectable stable-mir-json's output is, the easier it is for downstream consumers (KMIR included) to stay in sync as the upstream schema evolves. The receipts file is one small instance of that principle; a generative pipeline from Rust types to K syntax definitions would be a much larger one, but the same idea: derive, don't hand-maintain.
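As a toy illustration of what "derive, don't hand-maintain" could mean mechanically: given a description of a Rust struct, emit a K-style production. Everything here is hypothetical (hand-written struct description, made-up output syntax); a real pipeline would derive the description from the type definitions themselves.

```rust
// Hypothetical struct description a generator might extract from Rust types.
struct StructDesc {
    name: &'static str,
    fields: Vec<(&'static str, &'static str)>, // (field name, K sort)
}

// Emit a K-flavored production for the struct (illustrative syntax only).
fn k_production(s: &StructDesc) -> String {
    let sorts: Vec<String> = s.fields.iter().map(|(_, sort)| sort.to_string()).collect();
    format!(
        "syntax {} ::= {} ( {} )",
        s.name,
        s.name.to_lowercase(),
        sorts.join(" , ")
    )
}

fn main() {
    // Shaped after the Allocation example discussed above.
    let alloc = StructDesc {
        name: "Allocation",
        fields: vec![
            ("bytes", "Bytes"),
            ("provenance", "ProvenanceMap"),
            ("align", "Int"),
            ("mutability", "Mutability"),
        ],
    };
    println!("{}", k_production(&alloc));
}
```

The interesting part is not the string formatting but the boundary it forces: anything the generator can produce is, by definition, the "faithful mirror" portion, and anything it can't is a KMIR-specific addition that has to live elsewhere.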

Unsurprisingly, there's a symmetry here. Both pipelines have the same shape:

```
stable-mir-json:  [nightly rustc] -> [transform: collect/analyze] -> [smir.json] -> [filter: jq normalise] -> [golden files + UI tests]
KMIR:             [smir.json]     -> [transform: parse/convert]   -> [K syntax]  -> [filter: semantic rules] -> [proof states]
```

And they hit the same class of problems at each stage:

  1. Upstream instability enters at the left edge. For stable-mir-json, a nightly bump changes rustc's internal APIs, enum variants, and type shapes; it also changes the upstream UI test corpus (Rust snippets maintained by the rustc project that we run stable-mir-json against). For KMIR, a stable-mir-json schema change (new fields, renamed variants, removed types) does the same thing. Both pipelines have inputs they don't control that change on a cadence they can't predict.

  2. The transform stage absorbs structural change. In stable-mir-json, the compat layer isolates rustc API churn so it doesn't propagate into the printer. In KMIR, the parsing/conversion layer (the Python code that reads smir.json and populates K cells) absorbs schema changes so they don't propagate into the semantic rules. Same pattern, same purpose: an adapter boundary that contains blast radius.

  3. The output needs to be deterministic despite non-deterministic input. For stable-mir-json, interned indices are non-deterministic across runs, so we sort by content and strip IDs with the jq filter. For KMIR, proof states need to be reproducible across runs for interrupt/resume, so static data gets pulled out of the configuration. Both pipelines fight the same enemy (non-determinism from the compilation environment) and solve it the same way (separate the stable content from the unstable identity).

  4. The filter stage strips what downstream doesn't need. The jq filter strips interned indices for golden-file comparison. The K semantic rules selectively pull types and allocs via hooks rather than carrying everything. Both are "emit more, consume selectively" rather than "emit only what's needed."

  5. Schema knowledge is the coupling point. The jq filter needs to know which fields carry interned indices (that's what the receipts solve). The K syntax definitions need to know the structure of every type in smir.json (that's what generation would solve). Both are cases where a downstream consumer needs structural knowledge of the upstream schema, and the question is whether that knowledge is hand-maintained or derived.

My hypothesis is these aren't two separate problems. They're the same problem at two scales. Receipts solve it for the jq filter; a generative pipeline would solve it for K syntax definitions; and both are instances of the same principle: derive, don't hand-maintain. The sync obligation is the enemy; derivation is the fix; the only variable is how far downstream the derivation reaches.
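The receipt-driven direction in point 5 can be sketched in a few lines. This is a deliberately simplified model (a flat key/value map standing in for the JSON tree, and only the interned_keys category), not the jq filter itself: values named by the receipts get their unstable identity stripped, everything else passes through.

```rust
use std::collections::BTreeMap;

// Receipt-driven normalization sketch: the receipts (not hardcoded rules)
// decide which fields carry non-deterministic interned indices.
fn normalize(doc: &BTreeMap<String, u64>, interned_keys: &[&str]) -> BTreeMap<String, String> {
    doc.iter()
        .map(|(k, v)| {
            if interned_keys.contains(&k.as_str()) {
                (k.clone(), "<interned>".to_string()) // strip unstable identity
            } else {
                (k.clone(), v.to_string()) // keep stable content
            }
        })
        .collect()
}

fn main() {
    let doc = BTreeMap::from([
        ("ty".to_string(), 42u64),       // interned: differs across runs
        ("arg_count".to_string(), 2u64), // stable content
    ]);
    let receipts = ["ty"]; // would come from *.smir.receipts.json
    for (k, v) in normalize(&doc, &receipts) {
        println!("{k} = {v}");
    }
}
```

When upstream adds a new interned field, only the receipts change; the normalization logic itself stays fixed, which is the whole point of moving the schema knowledge out of the filter.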

Next steps

  • @dkcumming and I meet to test the generation hypothesis: pick one K syntax file (say, alloc.md), look at what's a pure mirror of the Rust side vs. what's a KMIR-specific addition, and see whether the pure-mirror portion could be generated from stable-mir-json's type definitions. If it works for one file, it works for all of them; if it doesn't, we'll learn exactly where the abstraction breaks.
  • Smoke-test the stacked PRs against KMIR: check out the final branch of the stack (which carries the most recent supported nightly, nightly-2026-01-15) and run it against KMIR. I expect this to fail out of the box; KMIR is pinned to an older nightly's schema, and the stack introduces structural changes that the K side hasn't absorbed yet. But here's the thing: if we pin rust-toolchain.toml back to master's current version (the nightly KMIR was last validated against), the proofs should just work. That's a quick sanity check worth running: it tells us "the stacked PRs don't break the downstream consumer when the schema version matches," which is the invariant we actually care about. The nightly-bump failures are expected; a regression at the current nightly would not be.
