Conversation
6603f15 to
3cfa544
Compare
There was a problem hiding this comment.
Pull request overview
Adds Azure Policy–oriented VM infrastructure and metadata/context plumbing to the RVM, alongside new instructions and updated program serialization (v6) with expanded metadata.
Changes:
- Introduces new VM instructions (
LoadContext,LoadMetadata,ReturnUndefinedIfNotTrue,CoalesceUndefinedToNull,ArrayPushDefined) plus listing/display/test parsing support. - Extends VM state with host-supplied evaluation context and cached metadata, and adds Azure Policy–specific loop behavior for
[*]over non-collections. - Refactors/extends program metadata (new module, language + annotations), adds
has_host_await, and bumps binary serialization version to 6 with new/updated YAML VM suites.
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 10 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/rvm/vm/suites/type_errors.yaml | Updates logical NOT test case description/example metadata. |
| tests/rvm/vm/suites/return_undefined_if_not_true.yaml | New YAML suite for ReturnUndefinedIfNotTrue. |
| tests/rvm/vm/suites/load_context_metadata.yaml | New YAML suite for LoadContext / LoadMetadata. |
| tests/rvm/vm/suites/coalesce_undefined_to_null.yaml | New YAML suite for CoalesceUndefinedToNull and interactions. |
| src/rvm/vm/machine.rs | Adds VM context + metadata cache + Azure Policy loop flag; changes execution-timer tick logic. |
| src/rvm/vm/loops.rs | Implements Azure Policy non-collection [*] loop semantics via a virtual single element. |
| src/rvm/vm/dispatch.rs | Executes new instructions and adds ArrayPushDefined. |
| src/rvm/vm/context.rs | Extends IterationState with Single for virtual iteration. |
| src/rvm/vm/comprehension.rs | Ensures comprehensions tolerate IterationState::Single. |
| src/rvm/tests/vm.rs | Extends YAML harness with context + metadata fields and recomputes has_host_await. |
| src/rvm/tests/instruction_parser.rs | Adds parser support for new instruction mnemonics. |
| src/rvm/program/types.rs | Removes old ProgramMetadata definition (moved to new module). |
| src/rvm/program/serialization/json.rs | Serializes/deserializes expanded metadata, annotations, and has_host_await. |
| src/rvm/program/serialization/binary.rs | Bumps binary version handling and adjusts legacy detection. |
| src/rvm/program/mod.rs | Exposes new metadata module and re-exports ProgramMetadata. |
| src/rvm/program/metadata.rs | New metadata types and postcard-safe annotation serialization bridge. |
| src/rvm/program/listing.rs | Displays metadata language/annotations and renders new instructions. |
| src/rvm/program/core.rs | Adds has_host_await, bumps serialization version to 6, tracks HostAwait presence. |
| src/rvm/instructions/mod.rs | Defines new instruction variants and docs. |
| src/rvm/instructions/display.rs | Adds Display formatting for new instruction variants. |
| src/languages/rego/compiler/mod.rs | Minor import adjustment for Value. |
| src/languages/rego/compiler/core.rs | Tracks has_host_await when emitting HostAwait directly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| match version { | ||
| 1..=3 => { | ||
| 1..=5 => { | ||
| let mut program = Program::new(); | ||
| program.needs_recompilation = true; | ||
| program.rego_v0 = Self::legacy_rego_v0(data, version).unwrap_or(false); | ||
| Ok(DeserializationResult::Partial(program)) | ||
| } | ||
| 4 | 5 => { | ||
| 6 => { |
There was a problem hiding this comment.
Binary deserialization now treats versions 1..=5 as Partial (needs_recompilation), which means previously supported v5 artifacts will no longer fully deserialize. Since Program/ProgramMetadata added fields use serde defaults, this looks like an unintended backwards-compatibility break; consider keeping v4/v5 in the full-deserialization path and only special-casing truly unsupported versions.
| @@ -399,7 +426,7 @@ impl RegoVM { | |||
| }; | |||
|
|
|||
| self.execution_timer | |||
| .check_now(now) | |||
| .tick(work_units, now) | |||
| .map_err(|err| match err { | |||
There was a problem hiding this comment.
execution_timer_tick now calls monotonic_now() on every tick when a limit is configured, even though ExecutionTimer::tick only needs the clock when the check interval is reached. This defeats the amortization design and can add significant overhead; consider restoring the previous accumulate() gate (or adding an API to query whether a check is due before reading the clock).
src/rvm/vm/machine.rs
Outdated
| /// number, etc.) returns `false` instead of vacuous `true`. This matches | ||
| /// Azure Policy semantics where `field[*]` on a non-array means "condition | ||
| /// not met". Automatically set from `program.metadata.language`. |
There was a problem hiding this comment.
The doc comment says an Every loop over a non-collection "returns false", but the implementation creates a single virtual iteration and the result depends on the loop body outcome. Please update the comment to match the actual semantics (virtual single null element) or adjust the implementation if the intent is to force false unconditionally.
| /// number, etc.) returns `false` instead of vacuous `true`. This matches | |
| /// Azure Policy semantics where `field[*]` on a non-array means "condition | |
| /// not met". Automatically set from `program.metadata.language`. | |
| /// number, etc.) behaves as if iterating over a single virtual element whose | |
| /// value is `null`, instead of being vacuously `true` over an empty collection. | |
| /// This matches Azure Policy semantics where `field[*]` on a non-array produces | |
| /// a single `null` element (which typically causes the condition to evaluate to | |
| /// `false`). Automatically set from `program.metadata.language`. |
src/rvm/vm/loops.rs
Outdated
| /// In standard Rego, `Every` over empty is vacuously `true`. | ||
| /// In Azure Policy mode, this code path is only reached when | ||
| /// `virtual_element_on_non_collection` is false (standard Rego). |
There was a problem hiding this comment.
This comment is misleading: non_collection_result() is still used in Azure Policy mode for Any/ForEach (objects/non-collections short-circuit to false). Please update the wording to reflect when this helper is used in each mode.
| /// In standard Rego, `Every` over empty is vacuously `true`. | |
| /// In Azure Policy mode, this code path is only reached when | |
| /// `virtual_element_on_non_collection` is false (standard Rego). | |
| /// In standard Rego mode, this helper is used for all loop modes when the | |
| /// collection operand is not iterable: `Every` over empty is vacuously `true`, | |
| /// and `Any`/`ForEach` over empty are `false`. | |
| /// In Azure Policy mode, this helper is still used for `Any`/`ForEach` when an | |
| /// object or other non-collection is encountered and short-circuits to `false`; | |
| /// `Every` with virtual elements is handled via a different code path. |
src/rvm/vm/context.rs
Outdated
| /// Used by Azure Policy's `[*]` on scalar/null fields: iterates once | ||
| /// with the original value (typically Null/Undefined) as the element. |
There was a problem hiding this comment.
IterationState::Single is documented as iterating over the original scalar/null value, but setup_next_iteration currently always yields value = Null for this state. Either store the original element in IterationState::Single and return it, or update the doc comment to describe the current "virtual null element" behavior.
| /// Used by Azure Policy's `[*]` on scalar/null fields: iterates once | |
| /// with the original value (typically Null/Undefined) as the element. | |
| /// Used by Azure Policy's `[*]` on scalar/null fields: presents a single | |
| /// "virtual" element to iterate over, which is always `Null` regardless | |
| /// of the underlying source value. |
src/rvm/instructions/mod.rs
Outdated
| /// Return undefined immediately when condition is false or undefined. | ||
| /// | ||
| /// This is used by Azure Policy compilation to model "condition does not match" | ||
| /// without treating it as a VM assertion failure. | ||
| ReturnUndefinedIfNotTrue { | ||
| condition: u8, | ||
| }, |
There was a problem hiding this comment.
The instruction doc says this returns undefined when condition is false/undefined, but the dispatch implementation returns undefined for any value that is not exactly Bool(true) (including null, numbers, strings, etc.). Please update the doc to match the actual semantics so compiler authors don’t misuse it.
| let annotations: alloc::collections::BTreeMap<String, Value> = metadata | ||
| .get("annotations") | ||
| .and_then(|v| match *v { | ||
| // Deserialize JSON annotations into Value map | ||
| serde_json::Value::Object(ref map) => { | ||
| let mut result = alloc::collections::BTreeMap::new(); | ||
| for (k, json_val) in map { | ||
| if let Ok(val) = serde_json::from_value::<Value>(json_val.clone()) { | ||
| result.insert(k.clone(), val); | ||
| } | ||
| } | ||
| Some(result) | ||
| } | ||
| _ => None, | ||
| }) | ||
| .unwrap_or_default(); |
There was a problem hiding this comment.
Annotation deserialization silently drops entries when serde_json::from_value::<Value> fails. That can lead to hard-to-debug metadata loss; consider returning an error (fail deserialization) or preserving the key with a fallback representation instead of skipping it.
| let annotations: alloc::collections::BTreeMap<String, Value> = metadata | |
| .get("annotations") | |
| .and_then(|v| match *v { | |
| // Deserialize JSON annotations into Value map | |
| serde_json::Value::Object(ref map) => { | |
| let mut result = alloc::collections::BTreeMap::new(); | |
| for (k, json_val) in map { | |
| if let Ok(val) = serde_json::from_value::<Value>(json_val.clone()) { | |
| result.insert(k.clone(), val); | |
| } | |
| } | |
| Some(result) | |
| } | |
| _ => None, | |
| }) | |
| .unwrap_or_default(); | |
| let annotations: alloc::collections::BTreeMap<String, Value> = | |
| match metadata.get("annotations") { | |
| // Deserialize JSON annotations into Value map | |
| Some(serde_json::Value::Object(ref map)) => { | |
| let mut result = alloc::collections::BTreeMap::new(); | |
| for (k, json_val) in map { | |
| let val = serde_json::from_value::<Value>(json_val.clone()) | |
| .map_err(|e| { | |
| format!( | |
| "Annotation '{}' deserialization failed: {}", | |
| k, e | |
| ) | |
| })?; | |
| result.insert(k.clone(), val); | |
| } | |
| result | |
| } | |
| _ => alloc::collections::BTreeMap::new(), | |
| }; |
| /// # Lossy mappings | ||
| /// | ||
| /// [`Null`](crate::value::Value::Null) and [`Undefined`](crate::value::Value::Undefined) | ||
| /// are mapped to [`MetadataValue::String("")`](MetadataValue::String) because metadata | ||
| /// annotations have no need for null semantics. A round-trip through | ||
| /// `from_value` → `to_value` will therefore turn `Null`/`Undefined` into an | ||
| /// empty string. | ||
| /// | ||
| /// Floating-point numbers are truncated to `i64` because metadata annotations | ||
| /// are expected to contain only integer counts, flags, and identifiers — | ||
| /// fractional values are not anticipated. | ||
| #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize)] |
There was a problem hiding this comment.
MetadataValue intentionally performs lossy conversions (Null/Undefined → empty string; floats truncated; non-i64 numbers become Integer(0)). Since this is used for binary serialization of program metadata, it can silently corrupt annotations across a save/load round-trip. Consider adding explicit Null/Undefined/Float variants (or rejecting unsupported values at compile time) to keep metadata serialization lossless and predictable.
tests/rvm/vm/suites/type_errors.yaml
Outdated
| description: NOT with non-boolean defined operand should yield false | ||
| example_rego: "not 42" | ||
| description: NOT with int operand treats non-boolean as truthy (result false) | ||
| example_rego: "!42" |
There was a problem hiding this comment.
The example_rego uses !42, but the Rego parser in this repo recognizes not (negation) and doesn’t appear to support ! as a unary operator. Consider switching the example back to not 42 (or another syntactically valid example) to avoid misleading readers.
| example_rego: "!42" | |
| example_rego: "not 42" |
| ArrayPushDefined { arr, value } => { | ||
| let value_to_push = self.get_register(value)?.clone(); | ||
|
|
||
| // Skip undefined values — matches Azure Policy's | ||
| // `field('alias[*].property')` collection semantics where | ||
| // absent nested properties are excluded from the collected | ||
| // array. | ||
| if value_to_push == Value::Undefined { | ||
| return Ok(InstructionOutcome::Continue); | ||
| } | ||
|
|
||
| let mut arr_value = self.take_register(arr)?; | ||
|
|
||
| if let Ok(arr_mut) = arr_value.as_array_mut() { | ||
| arr_mut.push(value_to_push); | ||
| self.set_register(arr, arr_value)?; | ||
| } else { | ||
| let offending = arr_value.clone(); | ||
| self.set_register(arr, arr_value)?; | ||
| return Err(VmError::RegisterNotArray { | ||
| register: arr, | ||
| value: offending, | ||
| pc: self.pc, | ||
| }); | ||
| } | ||
| Ok(InstructionOutcome::Continue) |
There was a problem hiding this comment.
ArrayPushDefined introduces new behavior (skipping Undefined) but there’s no YAML VM suite coverage for it yet, while similar array ops have tests. Adding a small suite that covers (1) pushing Undefined is skipped, (2) pushing Null/value is kept, and (3) non-array target still errors would help prevent regressions.
bd0edc1 to
c6110cc
Compare
…ization v6 Extend the RVM infrastructure with evaluation context, program metadata, and several new instructions needed by language frontends beyond Rego. VM & instructions: - Add LoadContext / LoadMetadata instructions for accessing evaluation context and program metadata from within RVM programs - Add ArrayPushDefined (skip-undefined variant of ArrayPush) for wildcard alias collection where absent properties should be excluded - Add ReturnUndefinedIfNotTrue for early-return semantics without triggering VM assertion failures - Add CoalesceUndefinedToNull for languages where missing fields are null rather than undefined - Support IterationState::Single in loops and comprehensions for iterating over scalar values alongside collections Program metadata: - Extend ProgramMetadata with language identifier, typed annotations (BTreeMap<String, MetadataValue>), and to_value() conversion - Add MetadataValue enum (String, Bool, Integer, Float, Array, Object) with full serde support - Add has_host_await flag to Program with recompute_host_await_presence() Serialization: - Bump binary format version to 6 - Add JSON serialization for ProgramMetadata including annotations Compiler: - Track has_host_await during compilation - Minor import cleanups Test infrastructure: - Extend VM test harness with context, metadata_language, and metadata_annotations fields - Add instruction parser support for new instructions - Add assembly listing support (readable, tabular, compact formats) - Add 3 new YAML test suites: load_context_metadata, coalesce_undefined_to_null, return_undefined_if_not_true
c6110cc to
9247770
Compare
No description provided.