perf: comprehension fuse scope+eval and inline BinaryOp(ValidId,ValidId) fast path#686

Open
He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/comprehension-binop-inline

Conversation


@He-Pin He-Pin commented Apr 5, 2026

Motivation

Array comprehensions with simple binary operations (e.g., [a + b for a in xs for b in ys]) are extremely common in Jsonnet workloads. The current implementation creates a new ValScope per iteration and goes through the full visitExpr dispatch for every element — both of which are unnecessary overhead for the common case where the body is BinaryOp(ValidId, ValidId).

This PR fuses scope creation with body evaluation and inlines the BinaryOp evaluation for ValidId operands directly in the comprehension inner loop, avoiding:

  1. Scope allocation per iteration (reuses a mutable scope)
  2. Full visitExpr dispatch (direct bindings(idx).value lookup)
  3. Lazy thunk creation (eager Val results instead of LazyExpr wrappers)
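The shape of the fused loop can be sketched as follows. This is a hypothetical, much-simplified model: the real Evaluator works over `Val`/`ValScope`/`Expr`, which are stood in here by plain doubles, and `FusedCompSketch`/`fusedSum` are illustrative names, not sjsonnet API. It models `[a + b for a in xs for b in ys]`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of the fused comprehension inner loop.
object FusedCompSketch {
  def fusedSum(xs: Array[Double], ys: Array[Double]): Array[Double] = {
    val out = new ArrayBuffer[Double](xs.length * ys.length)
    var i = 0
    while (i < xs.length) {
      val a = xs(i)          // plays the role of mutating bindings(aSlot)
      var j = 0
      while (j < ys.length) {
        val b = ys(j)        // plays the role of mutating bindings(bSlot)
        out += a + b         // inlined BinaryOp: no visitExpr dispatch,
        j += 1               // no fresh ValScope, no LazyExpr wrapper
      }
      i += 1
    }
    out.toArray
  }
}
```

The point of the sketch is what is absent: nothing is allocated per iteration except the result element itself.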

Key Design Decisions

  1. Mutable scope reuse: ValScope.extendMutable() creates a scope with one uninitialized slot. The caller mutates bindings(slot) per iteration instead of allocating a new scope. This is safe because BinaryOp(ValidId, ValidId) evaluates both operands eagerly — no lazy closures capture the scope array.

  2. Complete operator coverage: The inline fast path handles ALL BinaryOp operations:

    • evalBinaryOpNumNum: @switch-dispatched Num×Num path with overflow checks, div-by-zero, and bitwise ops
    • visitBinaryOpValues: Polymorphic fallback for Str concat, Str format, Obj merge, Arr concat, equality, comparisons, and in
  3. Eager evaluation: The fast path stores concrete Val results instead of LazyExpr thunks. This changes error timing for edge cases like std.length([1/0 for x in [1]]) (errors at construction vs access), but matches go-jsonnet/jrsonnet behavior and provides massive performance gains.
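The error-timing change in point 3 can be illustrated with a small hypothetical model (the names `ErrorTimingSketch` and `div` are illustrative; `div` stands in for Jsonnet's `1/0` runtime error):

```scala
// Hypothetical model of the lazy-vs-eager error-timing difference.
object ErrorTimingSketch {
  def div(a: Int, b: Int): Int =
    if (b == 0) throw new ArithmeticException("division by zero")
    else a / b

  // Lazy thunks: the array is constructed without error; the error
  // fires only when an element is forced (the old behavior).
  def lazyElems: Array[() => Int] = Array(() => div(1, 0))

  // Eager values: the error fires while the array is being built
  // (the new fast-path behavior, matching go-jsonnet/jrsonnet).
  def eagerElems: Array[Int] = Array(div(1, 0))
}
```

Constructing `lazyElems` succeeds and only invoking a thunk raises; constructing `eagerElems` raises immediately.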

Modification

Evaluator.scala

  • visitCompInline: New method that fuses scope creation with body evaluation. For BinaryOp(ValidId, ValidId) bodies, directly looks up operands via scope indices and dispatches to specialized evaluators.
  • evalBinaryOpNumNum: @inline @switch-dispatched fast path for Num×Num operations (comparison, arithmetic with overflow checks, bitwise with safe integer range checks).
  • visitBinaryOpValues: Complete polymorphic fallback handling all non-Num operations (Str+Str, Str+any, Obj+Obj, Arr+Arr, Str%any, comparisons).
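A minimal sketch of the two-tier dispatch described above, using pared-down stand-ins for sjsonnet's `Val` hierarchy and `BinaryOp` tags (the real names and signatures in `Evaluator.scala` differ; `BinOpSketch`, `OpAdd`, etc. are assumptions for illustration):

```scala
import scala.annotation.switch

// Hypothetical two-tier binary-op dispatch sketch.
object BinOpSketch {
  sealed trait V
  final case class Num(d: Double) extends V
  final case class Str(s: String) extends V
  final case class Bool(b: Boolean) extends V

  final val OpAdd = 0
  final val OpDiv = 1
  final val OpLt  = 2

  // Num×Num fast path: @switch asks scalac to emit tableswitch
  // bytecode (and to warn if the match cannot compile to one).
  def evalNumNum(op: Int, l: Double, r: Double): V = (op: @switch) match {
    case OpAdd => Num(l + r)
    case OpDiv =>
      if (r == 0) throw new ArithmeticException("division by zero")
      else Num(l / r)
    case OpLt  => Bool(l < r)
    case _     => throw new IllegalStateException("unknown op")
  }

  // Polymorphic fallback for non-Num operands (only '+' sketched here).
  def evalValues(op: Int, l: V, r: V): V = (op, l, r) match {
    case (OpAdd, Str(a), Str(b)) => Str(a + b)
    case (OpAdd, Num(a), Num(b)) => Num(a + b)
    case _ => throw new IllegalArgumentException("unsupported operands")
  }
}
```

The design point is that the Num×Num path pays only an integer switch per element, while everything else routes through one polymorphic match.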

ValScope.scala

  • extendMutable(): New method that creates a mutable scope extension for tight-loop reuse without per-iteration allocation.
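The idea behind extendMutable() can be sketched as below. This is a hypothetical simplification: the real ValScope stores lazy bindings and its actual signature may differ; `ScopeSketch` is an illustrative name.

```scala
// Hypothetical sketch of mutable scope extension for tight-loop reuse.
final class ScopeSketch(val bindings: Array[Any]) {
  // Appends one uninitialized slot ONCE; callers then overwrite
  // bindings(slot) on every iteration instead of allocating a scope.
  def extendMutable(): (ScopeSketch, Int) = {
    val next = new Array[Any](bindings.length + 1)
    System.arraycopy(bindings, 0, next, 0, bindings.length)
    (new ScopeSketch(next), bindings.length) // index of the new slot
  }
}
```

Usage is one `extendMutable()` before the loop, then `scope.bindings(slot) = v` per element. As the PR notes, this is only safe because the fast path evaluates eagerly, so no lazy closure ever captures the mutated array.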

Test

  • comprehension_binop_types.jsonnet: Comprehensive regression test covering ALL BinaryOp types in comprehensions (string concat, numeric arithmetic, comparisons, bitwise ops, string formatting, array concat, in operator).

Benchmark Results

JMH (JVM, Scala 3.3.7)

| Benchmark   | Before (ms/op) | After (ms/op) | Change      |
|-------------|----------------|---------------|-------------|
| comparison2 | 72.498         | 39.086        | -46.1%      |
| comparison  | 23.093         | 22.120        | -4.2%       |
| bench.02    | 48.296         | 48.670        | ~0% (noise) |
| bench.04    | 33.261         | 32.260        | -3.0%       |
| realistic2  | 68.960         | 69.977        | ~0% (noise) |
| reverse     | 10.685         | 10.540        | ~0% (noise) |

No regressions across all 35 benchmarks.

Hyperfine (Scala Native, macOS ARM64)

| Benchmark   | master   | This PR | jrsonnet 0.4.2 | vs master | vs jrsonnet  |
|-------------|----------|---------|----------------|-----------|--------------|
| comparison2 | 166.5 ms | 74.0 ms | 232.4 ms       | -55.6%    | 3.14× faster |
| bench.02    | 68.8 ms  | 67.5 ms |                | ~0%       |              |
| bench.04    | 530 ms   | 528 ms  | 532 ms         | ~0%       | ~0%          |
| reverse     | 49.6 ms  | 49.1 ms |                | ~0%       |              |

Analysis

The -46% JMH / -56% native improvement on comparison2 is expected: this benchmark is dominated by [a < b for a in large_array for b in large_array] comprehensions, which is exactly the pattern the BinaryOp inline fast path optimizes.

Other benchmarks show no regression because:

  • The fast path only activates for BinaryOp(ValidId, ValidId) bodies — other comprehension patterns fall through to the normal path
  • The @switch annotation ensures tableswitch bytecode for zero-overhead dispatch

sjsonnet is now 3.14× faster than jrsonnet on comparison2, up from being roughly equal.

Result

  • ✅ All 141 tests pass
  • ✅ Zero benchmark regressions across 35 JMH benchmarks
  • ✅ comparison2: -46% JMH, -56% native, 3.14× faster than jrsonnet
  • ✅ Comprehensive regression test for all BinaryOp types in comprehensions

perf: comprehension fuse scope+eval and inline BinaryOp(ValidId,ValidId) fast path

Fuse comprehension scope building with body evaluation, eliminating
redundant scope allocation in the innermost loop. When the body is
BinaryOp(ValidId,ValidId), inline the scope lookups and binary-op
dispatch entirely, avoiding 3× visitExpr overhead per iteration.

Key changes:
- ValScope.extendMutable(): creates a scope with one extra mutable slot
  for reuse across iterations (safe because results are eagerly
  evaluated, not captured in lazy thunks)
- visitCompInline: split by rest (Nil vs non-Nil), with BinaryOp fast
  path for innermost loops
- evalBinaryOpNumNum: @switch-dispatched Num×Num fast path covering all
  comparison, arithmetic, modulo, bitwise, and shift operators with full
  safety checks (overflow, division-by-zero, safe integer range)
- visitBinaryOpValues: polymorphic fallback for non-Num operands covering
  string concat/format, object merge, array concat, equality, and 'in'

Benchmark: comparison2 -53.1% (74.1 → 34.8 ms/op), zero regressions
across 35 benchmarks.

Upstream: jit branch commits 3466461 (fuse) + 71545ba (inline)
@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 09:44
