
perf: Materializer inline fast-path for object materialization #690

Open

He-Pin wants to merge 3 commits into databricks:master from He-Pin:perf/materializer-inline-fastpath

Conversation

Contributor

@He-Pin He-Pin commented Apr 5, 2026

Motivation

Object materialization is the dominant bottleneck in workloads with many small objects. In the realistic2 benchmark, 96% of total time (231.8ms of 241.1ms) is spent in materialization, processing ~62K comprehension-generated objects with 2-9 fields each.

The hot path involves:

  1. visibleKeyNames — allocates a sorted key array via LinkedHashMap.keySet
  2. value(key) — HashMap lookup for each field
  3. materializeRecursiveChild — recursive descent per field

For inline objects (no super, no excludedKeys), all field data is already stored in direct arrays — we can bypass the HashMap entirely.
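As a rough illustration of the bypass (a minimal sketch with simplified stand-in names, not the actual `Val.Obj` internals):

```scala
// Sketch only: a simplified stand-in for an object with inline storage.
object InlineBypassSketch {
  final case class Member(force: () => Int) // stand-in for a lazy field member

  // Field i lives at (inlineKeys(i), inlineMembers(i)); no Map is needed
  // to enumerate the fields.
  final class Obj(val inlineKeys: Array[String], val inlineMembers: Array[Member]) {
    // Slow path: build a map, then hash-look-up every field by name.
    private lazy val map: Map[String, Member] = inlineKeys.zip(inlineMembers).toMap
    def value(key: String): Member = map(key)
  }

  // Fast path: walk the parallel arrays directly, skipping both the
  // hashing and the Map allocation entirely.
  def materializeInline(obj: Obj): List[(String, Int)] = {
    var out = List.empty[(String, Int)]
    var i = obj.inlineKeys.length - 1
    while (i >= 0) {
      out = (obj.inlineKeys(i) -> obj.inlineMembers(i).force()) :: out
      i -= 1
    }
    out
  }
}
```

The fast path only applies when no `super` chain or key exclusion can change which fields are visible, which is exactly what the `canDirectIterate` guard below checks.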

Key Design Decision

Two-tier caching strategy for sorted field order:

  • Static field names (FieldName.Fixed): Cache sorted order on the MemberList AST node, shared across all Val.Obj instances created from the same source location. This is safe because fixed field names are compile-time string literals.
  • Dynamic field names (FieldName.Dyn): Compute sorted order per-object instance, with no AST-level caching. Dynamic field names can vary across evaluations of the same AST node (e.g., {[x]: 1, a: 2} in a comprehension), so sharing a cached order would produce wrong output.

This distinction is load-bearing: the AST-level cache is consulted only when every field name is known at compile time.
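The decision above can be sketched as follows (`allFieldsFixed` and the cache field mirror names used in this PR; the rest is illustrative, not the real API):

```scala
// Sketch of the two-tier sort-order cache. Only member lists whose
// fields are all Fixed may share an order cached on the AST node;
// Dyn fields force a fresh per-instance sort.
object TwoTierCacheSketch {
  sealed trait FieldName
  final case class Fixed(name: String) extends FieldName
  final case class Dyn(eval: () => String) extends FieldName // varies per evaluation

  final class MemberList(val fields: Array[FieldName]) {
    @volatile var cachedSortedOrder: Array[Int] = null // AST-level cache
    def allFieldsFixed: Boolean = fields.forall(_.isInstanceOf[Fixed])
  }

  private def sortedOrder(keys: Array[String]): Array[Int] =
    keys.indices.toArray.sortBy(keys(_))

  def sortedOrderFor(ast: MemberList, keys: Array[String]): Array[Int] =
    if (ast.allFieldsFixed) {
      // Benign race: concurrent writers compute identical array contents.
      if (ast.cachedSortedOrder == null) ast.cachedSortedOrder = sortedOrder(keys)
      ast.cachedSortedOrder
    } else {
      sortedOrder(keys) // never cached: the next object may carry other keys
    }
}
```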

Modification

Val.scala

  • Added canDirectIterate accessor: checks super==null && excludedKeys==null && inline storage present
  • Added inlineKeys, inlineMembers, singleKey, singleMem accessors to expose private constructor params
  • Added @volatile var _sortedInlineOrder: Array[Int] for per-instance cached sort order

Expr.scala

  • Added @volatile var _cachedSortedOrder: Array[Int] to ObjBody.MemberList for AST-level cache

Evaluator.scala

  • After creating inline objects in visitMemberList, compute and cache sorted field order
  • allFieldsFixed guard ensures MemberList cache is only used when all fields are FieldName.Fixed
  • For dynamic fields, sorted order is computed per-object without MemberList caching

Materializer.scala

  • materializeRecursiveObj now checks canDirectIterate and dispatches to fast path
  • New materializeInlineObj — unsorted direct iteration via m.invoke(obj, null, fs, evaluator)
  • New materializeSortedInlineObj — uses cached _sortedInlineOrder or computes on-the-fly
  • New computeSortedInlineOrder companion helper — insertion sort (optimal for 2-8 fields), filters hidden fields
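A minimal sketch of what such a helper could look like (the real signature and hidden-field representation differ, and `String.compareTo` here orders by UTF-16 code unit, whereas the PR's tests exercise codepoint ordering): it returns the indices of visible fields, insertion-sorted by key.

```scala
object SortedInlineOrderSketch {
  def computeSortedInlineOrder(keys: Array[String], hidden: Array[Boolean]): Array[Int] = {
    // Gather visible field indices, preserving declaration order.
    val buf = Array.newBuilder[Int]
    var i = 0
    while (i < keys.length) { if (!hidden(i)) buf += i; i += 1 }
    val order = buf.result()
    // Insertion sort on the index array: O(n^2) worst case, but for the
    // typical 2-8 fields it beats a general-purpose sort's constant factors.
    // Shifting equal keys is avoided (> 0, not >= 0), so the sort is stable.
    var j = 1
    while (j < order.length) {
      val idx = order(j)
      var k = j - 1
      while (k >= 0 && keys(order(k)).compareTo(keys(idx)) > 0) {
        order(k + 1) = order(k)
        k -= 1
      }
      order(k + 1) = idx
      j += 1
    }
    order
  }
}
```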

Tests

  • MaterializerTests.scala — 8 unit tests for computeSortedInlineOrder (basic sorting, single field, already sorted, reverse, hidden fields, all hidden, unicode codepoint ordering, stability)
  • dynamic_field_sorted_order.jsonnet — golden file regression test for dynamic field name sorting correctness
  • dynamic_null_field_sorted_order.jsonnet — golden file regression test for conditional (nullable) field names

Benchmark Results

JMH (ms/op, lower is better)

Benchmark     Master   Optimized   Δ%
realistic2    70.3     60.6        -13.8%
bench.02      46.7     38.5        -17.6%
comparison2   72.6     69.6        -4.2%
comparison    22.6     22.3        -1.3%

Full 35-benchmark regression suite: zero regressions.

Hyperfine Native Binary (ms, lower is better)

Benchmark     Master   Optimized   jrsonnet 0.5.0-pre98   vs Master       vs jrsonnet
realistic2    297.2    203.2       98.4                   -31.6%          2.07x (was 3.02x)
bench.02      74.4     70.5        117.1                  -5.2%           1.66x faster
realistic1    123.7    124.1       -                      +0.3% (noise)   -
comparison2   185.3    183.8       -                      -0.8% (noise)   -

Analysis

  • realistic2 shows the strongest improvement (−31.6% native, −13.8% JMH) because it generates ~62K small objects where the HashMap bypass saves significant overhead
  • bench.02 improves because its 2-field objects hit the single-field + multi-field fast paths
  • The allFieldsFixed guard adds negligible overhead (~5-20 isInstanceOf checks per object creation, which is dwarfed by field evaluation time)
  • Thread safety maintained via @volatile on both cache fields — benign race (two threads may compute same result)

Result

  • ✅ All tests pass (25 tests across 5 suites)
  • ✅ Zero benchmark regressions
  • ✅ realistic2 gap with jrsonnet narrowed from 3.02x → 2.07x
  • ✅ bench.02 remains faster than jrsonnet (1.66x)
  • ✅ Dynamic field name correctness verified with golden file regression tests

He-Pin added 3 commits April 5, 2026 12:57
…ation

For objects with exactly one field (common in patterns like `{ n: X }`),
store the field key and member inline in Val.Obj instead of allocating a
LinkedHashMap. The LinkedHashMap is lazily constructed only when needed
(e.g., key iteration via getAllKeys).

Key changes:
- Val.Obj: added singleFieldKey/singleFieldMember constructor params
- getValue0: lazily constructs LinkedHashMap from inline storage
- valueRaw: single-field fast path with String.equals instead of HashMap.get
- hasKeys/containsKey: fast paths to avoid forcing LinkedHashMap materialization
- visitMemberList: lazy builder allocation, only for 2+ field objects

Upstream: jit branch d284ecf (single-field object avoid LinkedHashMap)
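The single-field fast path described above could look roughly like this (illustrative types; the real `valueRaw` also handles visibility and self/super):

```scala
// Sketch: for a 1-field object, one string comparison replaces a
// HashMap lookup, and no LinkedHashMap is ever allocated.
final class SingleFieldObjSketch(singleKey: String, singleMember: Int) {
  def valueRaw(key: String): Option[Int] =
    if (singleKey == key) Some(singleMember) else None

  def containsKey(key: String): Boolean = singleKey == key
}
```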
Three-tier object storage: 1 field uses singleKey/singleMember,
2-8 fields use flat parallel arrays (inlineFieldKeys/inlineFieldMembers),
9+ fields use LinkedHashMap. This eliminates LinkedHashMap allocation for
the vast majority of Jsonnet objects which have fewer than 9 fields.

All fast paths updated: getValue0, hasKeys, containsKey,
containsVisibleKey, allKeyNames, visibleKeyNames, valueRaw.

Field tracking logic extracted into trackField() helper to avoid
code duplication between the two Member.Field case branches.

JMH: bench.02 -17.9%, realistic2 -2.7%, bench.04 -5.5%
Native: realistic2 -13.5% (1.89x faster than jrsonnet)

Upstream: jit branch commit 13e6ff3
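The three-tier storage choice can be sketched at construction time as follows (simplified stand-in types; the tier boundaries follow the commit message: 1 field inline, 2-8 in parallel arrays, 9+ in a LinkedHashMap):

```scala
object StorageTierSketch {
  sealed trait Storage
  final case class Single(key: String, value: Int) extends Storage
  final case class Inline(keys: Array[String], values: Array[Int]) extends Storage
  final case class Mapped(map: java.util.LinkedHashMap[String, Int]) extends Storage

  def choose(fields: Seq[(String, Int)]): Storage = fields.size match {
    case 1 =>
      // Tier 1: inline key/member pair, no collection allocated at all.
      Single(fields.head._1, fields.head._2)
    case n if n <= 8 =>
      // Tier 2: flat parallel arrays, iterated directly at materialization.
      Inline(fields.map(_._1).toArray, fields.map(_._2).toArray)
    case _ =>
      // Tier 3: fall back to a LinkedHashMap for large objects.
      val m = new java.util.LinkedHashMap[String, Int]
      fields.foreach { case (k, v) => m.put(k, v) }
      Mapped(m)
  }
}
```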
Bypass HashMap value() lookups for inline objects (single-field and
multi-field with array storage) during materialization. This targets
the critical bottleneck where 96% of realistic2 time is spent in
materialization (~62K comprehension-generated objects with 2-9 fields).

Key changes:
- Add canDirectIterate/inlineKeys/inlineMembers accessors to Val.Obj
- Add materializeInlineObj (unsorted) and materializeSortedInlineObj
  fast paths that invoke members directly without HashMap lookup
- Cache sorted field order on MemberList AST node for static field
  names (shared across all Val.Obj instances from same AST)
- For dynamic field names (FieldName.Dyn), compute sorted order
  per-object to avoid cache correctness issues
- Add computeSortedInlineOrder companion helper using insertion sort
  (optimal for typical 2-8 field objects)

Upstream: jit branch commits 5f7abec, dd9d08a, 119b9a9
@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 10:33