perf: comprehension fuse scope+eval and inline BinaryOp(ValidId,ValidId) fast path#686
Open
He-Pin wants to merge 1 commit intodatabricks:masterfrom
Open
perf: comprehension fuse scope+eval and inline BinaryOp(ValidId,ValidId) fast path#686He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin wants to merge 1 commit intodatabricks:masterfrom
Conversation
…Id) fast path Fuse comprehension scope building with body evaluation, eliminating redundant scope allocation in the innermost loop. When the body is BinaryOp(ValidId,ValidId), inline the scope lookups and binary-op dispatch entirely, avoiding 3× visitExpr overhead per iteration. Key changes: - ValScope.extendMutable(): creates a scope with one extra mutable slot for reuse across iterations (safe because results are eagerly evaluated, not captured in lazy thunks) - visitCompInline: split by rest (Nil vs non-Nil), with BinaryOp fast path for innermost loops - evalBinaryOpNumNum: @switch-dispatched Num×Num fast path covering all comparison, arithmetic, modulo, bitwise, and shift operators with full safety checks (overflow, division-by-zero, safe integer range) - visitBinaryOpValues: polymorphic fallback for non-Num operands covering string concat/format, object merge, array concat, equality, and 'in' Benchmark: comparison2 -53.1% (74.1 → 34.8 ms/op), zero regressions across 35 benchmarks. Upstream: jit branch commits 3466461 (fuse) + 71545ba (inline)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
Array comprehensions with simple binary operations (e.g.,
[a + b for a in xs for b in ys]) are extremely common in Jsonnet workloads. The current implementation creates a newValScopeper iteration and goes through the fullvisitExprdispatch for every element — both of which are unnecessary overhead for the common case where the body isBinaryOp(ValidId, ValidId).This PR fuses scope creation with body evaluation and inlines the BinaryOp evaluation for
ValidIdoperands directly in the comprehension inner loop, avoiding:visitExprdispatch (directbindings(idx).valuelookup)Valresults instead ofLazyExprwrappers)Key Design Decisions
Mutable scope reuse:
ValScope.extendMutable()creates a scope with one uninitialized slot. The caller mutatesbindings(slot)per iteration instead of allocating a new scope. This is safe becauseBinaryOp(ValidId, ValidId)evaluates both operands eagerly — no lazy closures capture the scope array.Complete operator coverage: The inline fast path handles ALL
BinaryOpoperations:evalBinaryOpNumNum:@switch-dispatched Num×Num path with overflow checks, div-by-zero, and bitwise opsvisitBinaryOpValues: Polymorphic fallback for Str concat, Str format, Obj merge, Arr concat, equality, comparisons, andinEager evaluation: The fast path stores concrete
Valresults instead ofLazyExprthunks. This changes error timing for edge cases likestd.length([1/0 for x in [1]])(errors at construction vs access), but matches go-jsonnet/jrsonnet behavior and provides massive performance gains.Modification
Evaluator.scalavisitCompInline: New method that fuses scope creation with body evaluation. ForBinaryOp(ValidId, ValidId)bodies, directly looks up operands via scope indices and dispatches to specialized evaluators.evalBinaryOpNumNum:@inline @switch-dispatched fast path for Num×Num operations (comparison, arithmetic with overflow checks, bitwise with safe integer range checks).visitBinaryOpValues: Complete polymorphic fallback handling all non-Num operations (Str+Str, Str+any, Obj+Obj, Arr+Arr, Str%any, comparisons).ValScope.scalaextendMutable(): New method that creates a mutable scope extension for tight-loop reuse without per-iteration allocation.Test
comprehension_binop_types.jsonnet: Comprehensive regression test covering ALL BinaryOp types in comprehensions (string concat, numeric arithmetic, comparisons, bitwise ops, string formatting, array concat,inoperator).Benchmark Results
JMH (JVM, Scala 3.3.7)
No regressions across all 35 benchmarks.
Hyperfine (Scala Native, macOS ARM64)
Analysis
The
-46% JMH / -56% nativeimprovement oncomparison2is expected: this benchmark is dominated by[a < b for a in large_array for b in large_array]comprehensions, which are exactly the pattern optimized by the BinaryOp inline fast path.Other benchmarks show no regression because:
BinaryOp(ValidId, ValidId)bodies — other comprehension patterns fall through to the normal path@switchannotation ensures tableswitch bytecode for zero-overhead dispatchsjsonnet is now 3.14× faster than jrsonnet on comparison2, up from being roughly equal.
References
71545ba8(Comprehension inline BinaryOp)3466461a(Comprehension fuse scope+eval)Result