
perf: Materializer inline fast-path for object materialization #690

Open

He-Pin wants to merge 3 commits into databricks:master from He-Pin:perf/materializer-inline-fastpath

Conversation

Contributor

@He-Pin He-Pin commented Apr 5, 2026

Motivation

Object materialization is the dominant bottleneck in workloads with many small objects. In the realistic2 benchmark, 96% of total time (231.8ms of 241.1ms) is spent in materialization, processing ~62K comprehension-generated objects with 2-9 fields each.

The hot path involves:

  1. visibleKeyNames — allocates a sorted key array via LinkedHashMap.keySet
  2. value(key) — HashMap lookup for each field
  3. materializeRecursiveChild — recursive descent per field

For inline objects (no super, no excludedKeys), all field data is already stored in direct arrays — we can bypass the HashMap entirely.
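As a rough illustration of the bypass (a minimal sketch with simplified stand-in names, not the actual `Val.Obj` internals):

```scala
// Sketch only: a simplified stand-in for an object with inline storage.
object InlineBypassSketch {
  final case class Member(force: () => Int) // stand-in for a lazy field member

  // Field i lives at (inlineKeys(i), inlineMembers(i)); no Map is needed
  // to enumerate the fields.
  final class Obj(val inlineKeys: Array[String], val inlineMembers: Array[Member]) {
    // Slow path: build a map, then hash-look-up every field by name.
    private lazy val map: Map[String, Member] = inlineKeys.zip(inlineMembers).toMap
    def value(key: String): Member = map(key)
  }

  // Fast path: walk the parallel arrays directly, skipping both the
  // hashing and the Map allocation entirely.
  def materializeInline(obj: Obj): List[(String, Int)] = {
    var out = List.empty[(String, Int)]
    var i = obj.inlineKeys.length - 1
    while (i >= 0) {
      out = (obj.inlineKeys(i) -> obj.inlineMembers(i).force()) :: out
      i -= 1
    }
    out
  }
}
```

The fast path only applies when no `super` chain or key exclusion can change which fields are visible, which is exactly what the `canDirectIterate` guard below checks.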

Key Design Decision

Two-tier caching strategy for sorted field order:

  • Static field names (FieldName.Fixed): Cache sorted order on the MemberList AST node, shared across all Val.Obj instances created from the same source location. This is safe because fixed field names are compile-time string literals.
  • Dynamic field names (FieldName.Dyn): Compute sorted order per-object instance, with no AST-level caching. Dynamic field names can vary across evaluations of the same AST node (e.g., {[x]: 1, a: 2} in a comprehension), so sharing a cached order would produce wrong output.

This distinction is load-bearing: the AST-level cache is consulted only when every field name is known at compile time.
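The decision above can be sketched as follows (`allFieldsFixed` and the cache field mirror names used in this PR; the rest is illustrative, not the real API):

```scala
// Sketch of the two-tier sort-order cache. Only member lists whose
// fields are all Fixed may share an order cached on the AST node;
// Dyn fields force a fresh per-instance sort.
object TwoTierCacheSketch {
  sealed trait FieldName
  final case class Fixed(name: String) extends FieldName
  final case class Dyn(eval: () => String) extends FieldName // varies per evaluation

  final class MemberList(val fields: Array[FieldName]) {
    @volatile var cachedSortedOrder: Array[Int] = null // AST-level cache
    def allFieldsFixed: Boolean = fields.forall(_.isInstanceOf[Fixed])
  }

  private def sortedOrder(keys: Array[String]): Array[Int] =
    keys.indices.toArray.sortBy(keys(_))

  def sortedOrderFor(ast: MemberList, keys: Array[String]): Array[Int] =
    if (ast.allFieldsFixed) {
      // Benign race: concurrent writers compute identical array contents.
      if (ast.cachedSortedOrder == null) ast.cachedSortedOrder = sortedOrder(keys)
      ast.cachedSortedOrder
    } else {
      sortedOrder(keys) // never cached: the next object may carry other keys
    }
}
```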

Modification

Val.scala

  • Added canDirectIterate accessor: checks super==null && excludedKeys==null && inline storage present
  • Added inlineKeys, inlineMembers, singleKey, singleMem accessors to expose private constructor params
  • Added @volatile var _sortedInlineOrder: Array[Int] for per-instance cached sort order

Expr.scala

  • Added @volatile var _cachedSortedOrder: Array[Int] to ObjBody.MemberList for AST-level cache

Evaluator.scala

  • After creating inline objects in visitMemberList, compute and cache sorted field order
  • allFieldsFixed guard ensures MemberList cache is only used when all fields are FieldName.Fixed
  • For dynamic fields, sorted order is computed per-object without MemberList caching

Materializer.scala

  • materializeRecursiveObj now checks canDirectIterate and dispatches to fast path
  • New materializeInlineObj — unsorted direct iteration via m.invoke(obj, null, fs, evaluator)
  • New materializeSortedInlineObj — uses cached _sortedInlineOrder or computes on-the-fly
  • New computeSortedInlineOrder companion helper — insertion sort (optimal for 2-8 fields), filters hidden fields
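A minimal sketch of what such a helper could look like (the real signature and hidden-field representation differ, and `String.compareTo` here orders by UTF-16 code unit, whereas the PR's tests exercise codepoint ordering): it returns the indices of visible fields, insertion-sorted by key.

```scala
object SortedInlineOrderSketch {
  def computeSortedInlineOrder(keys: Array[String], hidden: Array[Boolean]): Array[Int] = {
    // Gather visible field indices, preserving declaration order.
    val buf = Array.newBuilder[Int]
    var i = 0
    while (i < keys.length) { if (!hidden(i)) buf += i; i += 1 }
    val order = buf.result()
    // Insertion sort on the index array: O(n^2) worst case, but for the
    // typical 2-8 fields it beats a general-purpose sort's constant factors.
    // Shifting equal keys is avoided (> 0, not >= 0), so the sort is stable.
    var j = 1
    while (j < order.length) {
      val idx = order(j)
      var k = j - 1
      while (k >= 0 && keys(order(k)).compareTo(keys(idx)) > 0) {
        order(k + 1) = order(k)
        k -= 1
      }
      order(k + 1) = idx
      j += 1
    }
    order
  }
}
```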

Tests

  • MaterializerTests.scala — 8 unit tests for computeSortedInlineOrder (basic sorting, single field, already sorted, reverse, hidden fields, all hidden, unicode codepoint ordering, stability)
  • dynamic_field_sorted_order.jsonnet — golden file regression test for dynamic field name sorting correctness
  • dynamic_null_field_sorted_order.jsonnet — golden file regression test for conditional (nullable) field names

Benchmark Results

JMH (ms/op, lower is better)

Benchmark     Master   Optimized   Δ%
realistic2    70.3     60.6        -13.8%
bench.02      46.7     38.5        -17.6%
comparison2   72.6     69.6        -4.2%
comparison    22.6     22.3        -1.3%

Full 35-benchmark regression suite: zero regressions.

Hyperfine Native Binary (ms, lower is better)

Benchmark     Master   Optimized   jrsonnet 0.5.0-pre98   vs Master       vs jrsonnet
realistic2    297.2    203.2       98.4                   -31.6%          2.07x (was 3.02x)
bench.02      74.4     70.5        117.1                  -5.2%           1.66x faster
realistic1    123.7    124.1       -                      +0.3% (noise)   -
comparison2   185.3    183.8       -                      -0.8% (noise)   -

Analysis

  • realistic2 shows the strongest improvement (−31.6% native, −13.8% JMH) because it generates ~62K small objects where the HashMap bypass saves significant overhead
  • bench.02 improves because its 2-field objects hit the single-field + multi-field fast paths
  • The allFieldsFixed guard adds negligible overhead (~5-20 isInstanceOf checks per object creation, which is dwarfed by field evaluation time)
  • Thread safety maintained via @volatile on both cache fields — benign race (two threads may compute same result)

Result

  • ✅ All tests pass (25 tests across 5 suites)
  • ✅ Zero benchmark regressions
  • ✅ realistic2 gap with jrsonnet narrowed from 3.02x → 2.07x
  • ✅ bench.02 remains faster than jrsonnet (1.66x)
  • ✅ Dynamic field name correctness verified with golden file regression tests

He-Pin added 3 commits April 5, 2026 12:57
…ation

For objects with exactly one field (common in patterns like `{ n: X }`),
store the field key and member inline in Val.Obj instead of allocating a
LinkedHashMap. The LinkedHashMap is lazily constructed only when needed
(e.g., key iteration via getAllKeys).

Key changes:
- Val.Obj: added singleFieldKey/singleFieldMember constructor params
- getValue0: lazily constructs LinkedHashMap from inline storage
- valueRaw: single-field fast path with String.equals instead of HashMap.get
- hasKeys/containsKey: fast paths to avoid forcing LinkedHashMap materialization
- visitMemberList: lazy builder allocation, only for 2+ field objects

Upstream: jit branch d284ecf (single-field object avoid LinkedHashMap)
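The single-field fast path described above could look roughly like this (illustrative types; the real `valueRaw` also handles visibility and self/super):

```scala
// Sketch: for a 1-field object, one string comparison replaces a
// HashMap lookup, and no LinkedHashMap is ever allocated.
final class SingleFieldObjSketch(singleKey: String, singleMember: Int) {
  def valueRaw(key: String): Option[Int] =
    if (singleKey == key) Some(singleMember) else None

  def containsKey(key: String): Boolean = singleKey == key
}
```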
Three-tier object storage: 1 field uses singleKey/singleMember,
2-8 fields use flat parallel arrays (inlineFieldKeys/inlineFieldMembers),
9+ fields use LinkedHashMap. This eliminates LinkedHashMap allocation for
the vast majority of Jsonnet objects which have fewer than 9 fields.

All fast paths updated: getValue0, hasKeys, containsKey,
containsVisibleKey, allKeyNames, visibleKeyNames, valueRaw.

Field tracking logic extracted into trackField() helper to avoid
code duplication between the two Member.Field case branches.

JMH: bench.02 -17.9%, realistic2 -2.7%, bench.04 -5.5%
Native: realistic2 -13.5% (1.89x faster than jrsonnet)

Upstream: jit branch commit 13e6ff3
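The three-tier storage choice can be sketched at construction time as follows (simplified stand-in types; the tier boundaries follow the commit message: 1 field inline, 2-8 in parallel arrays, 9+ in a LinkedHashMap):

```scala
object StorageTierSketch {
  sealed trait Storage
  final case class Single(key: String, value: Int) extends Storage
  final case class Inline(keys: Array[String], values: Array[Int]) extends Storage
  final case class Mapped(map: java.util.LinkedHashMap[String, Int]) extends Storage

  def choose(fields: Seq[(String, Int)]): Storage = fields.size match {
    case 1 =>
      // Tier 1: inline key/member pair, no collection allocated at all.
      Single(fields.head._1, fields.head._2)
    case n if n <= 8 =>
      // Tier 2: flat parallel arrays, iterated directly at materialization.
      Inline(fields.map(_._1).toArray, fields.map(_._2).toArray)
    case _ =>
      // Tier 3: fall back to a LinkedHashMap for large objects.
      val m = new java.util.LinkedHashMap[String, Int]
      fields.foreach { case (k, v) => m.put(k, v) }
      Mapped(m)
  }
}
```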
Bypass HashMap value() lookups for inline objects (single-field and
multi-field with array storage) during materialization. This targets
the critical bottleneck where 96% of realistic2 time is spent in
materialization (~62K comprehension-generated objects with 2-9 fields).

Key changes:
- Add canDirectIterate/inlineKeys/inlineMembers accessors to Val.Obj
- Add materializeInlineObj (unsorted) and materializeSortedInlineObj
  fast paths that invoke members directly without HashMap lookup
- Cache sorted field order on MemberList AST node for static field
  names (shared across all Val.Obj instances from same AST)
- For dynamic field names (FieldName.Dyn), compute sorted order
  per-object to avoid cache correctness issues
- Add computeSortedInlineOrder companion helper using insertion sort
  (optimal for typical 2-8 field objects)

Upstream: jit branch commits 5f7abec, dd9d08a, 119b9a9
@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 10:33