Keep the receipts: interned index tracking for deterministic normalization#143
Keep the receipts: interned index tracking for deterministic normalization#143
Conversation
4f98f78 to
3a9432c
Compare
dkcumming
left a comment
There was a problem hiding this comment.
Wow super cool stuff! I had a brief look at #144 as well and I think this is a great idea.
I have one request, could this be an option that is disabled by default (and of course enabled for the golden files)? mir-semantics writes some smir.json files and stores them for some tests to test taking smir.json as input instead of a rust file (sometimes useful / necessary). It would be nice when we generate the test smir.json files to not have the receipts.json automatically produced as it won't be needed for that KMIR. What do you think?
Add a spy-based serialization pass that detects which JSON paths carry non-deterministic interned indices (Ty, Span, AllocId, etc.) and emits a companion *.smir.receipts.json alongside each *.smir.json output. The receipts declare three categories of interned indices: - interned_keys: object field names whose values are interned - interned_newtypes: enum variant wrappers around bare integers - interned_positions: known tuple positions carrying interned indices These receipts drive the normalise-filter.jq used for golden-file comparison, replacing the previous hardcoded normalization rules with a data-driven approach. See ADR-004 for the design rationale.
Good question @dkcumming! Let me think through this a bit before we land on an approach. So, the receipts file is basically a companion artifact that describes the structure of the JSON it sits beside. It's cheap to produce (the spy serializer adds negligible overhead), and because it's derived from the same But I think there's a bigger question lurking behind this one, and I'd rather surface it than paper over it with a flag: how does KMIR handle the fact that rustc's semantics (and the stable MIR schema) change between nightlies? The upstream UI test corpus changes too. If KMIR stores .smir.json files as frozen test inputs, those are snapshots of a specific nightly's output; a toolchain bump could change field names, add variants, reorder structures. The receipts file is one example of "extra output," but the shape of the JSON itself is the bigger moving target. I don't have enough visibility into how KMIR consumes these files to know whether that's already handled or whether it's a latent problem. My general instinct has been to emit more information rather than less, and expect downstream consumers to filter before ingesting. A receipts file that KMIR doesn't need is inert; it just sits there. But if the friction is something specific (CI noise from unexpected files? test assertions breaking on the extra file? disk space in a large test suite?), knowing that would help me figure out whether an opt-out flag, a What does the KMIR side of this actually look like? 🤔 |
12bc266 to
501f402
Compare
@dkcumming and I met to discuss the SMIR -> KMIR workflow. Sharing a summary here so there's a record for anyone following along. So, we went through the KMIR K configuration: the state machine that represents a MIR program during interpretation. The configuration has cells (XML-like containers) that get modified as the program steps:
The architectural detail that matters most for stable-mir-json: types and allocs are not in the K configuration. They're static data that doesn't change at runtime. Originally they were carried in the configuration, but that meant serializing them in every proof state (KMIR writes proof states to disk for interrupt/resume), which turned out to be a non-negligible performance overhead. The fix was to pull them out and provide them on demand via hooked functions (LLVM definitions in the backend). When a rule needs a type to process something on the K cell, it hooks into the static data store rather than reading from the configuration. (Heap memory is a different story: it will need to be in the configuration once that's implemented, because it's mutable at runtime; rules need to read and write it. It's just not there yet. Daniel's phrase for the current scope was "the McDonald's of Rust," which I'm going to be borrowing liberally.) We then looked at the mirroring relationship between the two codebases. KMIR's K syntax files (ty.md, alloc.md, body.md, mono.md) are intentionally named to correspond to stable-mir-json's Rust files (ty.rs, alloc.rs, body.rs, mono.rs). The K productions (EBNF-style syntax definitions) mirror the Rust structs field by The insight we landed on together: these K syntax definitions should be generated from the stable-mir-json types, not hand-maintained as a parallel mirror. When stable MIR changes between nightlies (which it does regularly), the K side needs to follow, and manual synchronization doesn't scale. Daniel acknowledged that some non-pure additions crept into the K-side definitions over time (things that aren't faithful reproductions of the stable MIR types but KMIR-specific semantic additions), and those would need cleanup before generation could work. The boundary between "faithful mirror of stable MIR" and "KMIR-specific semantic additions" needs to be explicit. This connects back to the original question about receipts: the more structured and introspectable stable-mir-json's output is, the easier it is for downstream consumers (KMIR included) to stay in sync as the upstream schema evolves. The receipts file is one small instance of that principle; a generative pipeline from Rust types to K syntax definitions would be a much larger one, but the same idea: derive, don't hand-maintain. Unsurprisingly, there's a symmetry here. Both pipelines have the same shape: stable-mir-json: [nightly rustc] -> [transform: collect/analyze] -> [smir.json] -> [filter: jq normalise] -> [golden files + UI tests]
KMIR: [smir.json] -> [transform: parse/convert] -> [K syntax] -> [filter: semantic rules] -> [proof states]And they hit the same class of problems at each stage:
My hypothesis is these aren't two separate problems. They're the same problem at two scales. Receipts solve it for the jq filter; a generative pipeline would solve it for K syntax definitions; and both are instances of the same principle: derive, don't hand-maintain. The sync obligation is the enemy; derivation is the fix; the only variable is how far downstream the derivation reaches. Next steps
|
Background
The normalise filter (
normalise-filter.jq) strips interned indices from golden files so that integration tests can compare outputs across platforms. The trouble is: the filter has to independently know the JSON schema. Every time Stable MIR adds a new field carrying an interned index (aTy,Span,AllocId, etc.), the filter needs a corresponding rule; we've been discovering these gaps exclusively through CI failures, and the pattern is pure whack-a-mole.This PR introduces a "receipts" system that closes the loop between producer and consumer:
Spy serializer (src/printer/receipts.rs): a no-output serde Serializer that mirrors the real serialization pass but only tracks which struct fields, newtype wrappers, and array positions contain known interned types. Because it hooks into serde's derive-generated code, it automatically detects new interned fields without manual filter updates.
Receipt emission (src/printer/mod.rs): emit_smir() now writes a companion
*.smir.receipts.jsonalongside each*.smir.json, declaring three categories of interned paths (keys,newtypes,positions).ADR-004 documenting the design rationale, receipt format, and known limitations.
The normalise filter itself isn't updated in this PR (that happens in the next PR, where the per-nightly golden file infrastructure lands and the filter switches from hardcoded rules to receipt-driven normalization).
N.B.: the receipts are generated dynamically by observing actual serialization calls, not from a static list. If upstream adds a new
Tyfield somewhere insideBody, the spy detects it automatically because serde's derive-generated code calls serialize_newtype_struct("Ty", ...) for the new field. A small set of "seeded" entries covers paths the spy can't discover dynamically (domain-specific newtype wrappers like TraitDef that are transparent in JSON but opaque to the spy's immediate-parent check).Test plan