perf: Materializer inline fast-path for object materialization #690
Open
He-Pin wants to merge 3 commits into databricks:master
…ation
For objects with exactly one field (common in patterns like `{ n: X }`),
store the field key and member inline in Val.Obj instead of allocating a
LinkedHashMap. The LinkedHashMap is lazily constructed only when needed
(e.g., key iteration via getAllKeys).
Key changes:
- Val.Obj: added singleFieldKey/singleFieldMember constructor params
- getValue0: lazily constructs LinkedHashMap from inline storage
- valueRaw: single-field fast path with String.equals instead of HashMap.get
- hasKeys/containsKey: fast paths to avoid forcing LinkedHashMap materialization
- visitMemberList: lazy builder allocation, only for 2+ field objects
Upstream: jit branch d284ecf (single-field object avoid LinkedHashMap)
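The single-field fast path described above can be sketched roughly as follows. This is a minimal illustration, not the actual Val.Obj code: the class name `SingleFieldObj`, the `isMaterialized` probe, and the use of plain values instead of lazy members are all assumptions made for the example.

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}

// Hypothetical, simplified stand-in for Val.Obj with exactly one field.
// The real class stores members (lazy field bodies), not plain values.
final class SingleFieldObj[V](singleKey: String, singleValue: V) {
  // Backing map, built only on demand (e.g. for key iteration).
  private var map: JLinkedHashMap[String, V] = null

  // Illustration-only probe: has the map been allocated yet?
  def isMaterialized: Boolean = map != null

  // Fast path: a single String.equals instead of a hash lookup.
  def valueRaw(key: String): Option[V] =
    if (singleKey.equals(key)) Some(singleValue) else None

  // Fast path: answers without forcing the map.
  def containsKey(key: String): Boolean = singleKey.equals(key)

  // Slow path: lazily construct the map from the inline storage.
  def getValue0: JLinkedHashMap[String, V] = {
    if (map == null) {
      val m = new JLinkedHashMap[String, V](1)
      m.put(singleKey, singleValue)
      map = m
    }
    map
  }
}
```

Lookups and containment checks never touch the map in this sketch; only a call to `getValue0` pays for the allocation.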
Three-tier object storage: 1 field uses singleKey/singleMember, 2-8 fields
use flat parallel arrays (inlineFieldKeys/inlineFieldMembers), 9+ fields use
LinkedHashMap. This eliminates LinkedHashMap allocation for the vast majority
of Jsonnet objects, which have fewer than 9 fields.
All fast paths updated: getValue0, hasKeys, containsKey, containsVisibleKey,
allKeyNames, visibleKeyNames, valueRaw.
Field tracking logic extracted into trackField() helper to avoid code
duplication between the two Member.Field case branches.
JMH: bench.02 -17.9%, realistic2 -2.7%, bench.04 -5.5%
Native: realistic2 -13.5% (1.89x faster than jrsonnet)
Upstream: jit branch commit 13e6ff3
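To make the three tiers concrete, here is a hedged sketch of how a storage tier might be chosen by field count and how lookup differs per tier. The names `TieredStorageSketch`, `Storage`, `Single`, `Inline`, `Hashed`, `build`, and `lookup` are hypothetical and do not come from the PR. The commit message does not say how empty objects are stored, so routing them to the map tier is an assumption.

```scala
object TieredStorageSketch {
  sealed trait Storage
  final case class Single(key: String, member: Any) extends Storage
  final case class Inline(keys: Array[String], members: Array[Any]) extends Storage
  final case class Hashed(map: java.util.LinkedHashMap[String, Any]) extends Storage

  // Per the commit message: 2-8 fields use flat parallel arrays.
  val InlineMax = 8

  def build(fields: Seq[(String, Any)]): Storage = fields.length match {
    case 1 =>
      Single(fields.head._1, fields.head._2)
    case n if n >= 2 && n <= InlineMax =>
      Inline(fields.map(_._1).toArray, fields.map(_._2).toArray)
    case _ =>
      // 9+ fields (and, by assumption here, empty objects) use a map.
      val m = new java.util.LinkedHashMap[String, Any]()
      fields.foreach { case (k, v) => m.put(k, v) }
      Hashed(m)
  }

  def lookup(s: Storage, key: String): Option[Any] = s match {
    case Single(k, m) =>
      if (k == key) Some(m) else None
    case Inline(ks, ms) =>
      // Linear scan over at most 8 keys, no hashing.
      val i = ks.indexOf(key)
      if (i >= 0) Some(ms(i)) else None
    case Hashed(map) =>
      Option(map.get(key))
  }
}
```

For such small arrays a linear scan avoids both the map allocation and the hash computation, which is the trade-off the commit message relies on.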
Bypass HashMap value() lookups for inline objects (single-field and
multi-field with array storage) during materialization. This targets the
critical bottleneck where 96% of realistic2 time is spent in materialization
(~62K comprehension-generated objects with 2-9 fields).
Key changes:
- Add canDirectIterate/inlineKeys/inlineMembers accessors to Val.Obj
- Add materializeInlineObj (unsorted) and materializeSortedInlineObj fast paths that invoke members directly without HashMap lookup
- Cache sorted field order on MemberList AST node for static field names (shared across all Val.Obj instances from same AST)
- For dynamic field names (FieldName.Dyn), compute sorted order per-object to avoid cache correctness issues
- Add computeSortedInlineOrder companion helper using insertion sort (optimal for typical 2-8 field objects)
Upstream: jit branch commits 5f7abec, dd9d08a, 119b9a9
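As an illustration of the insertion-sort helper mentioned above, the following sketch sorts field indices by key while skipping hidden fields. The object name, the `computeSortedOrder` signature, and the `visible` flag array are assumptions; the real computeSortedInlineOrder may take different inputs. Comparing by Unicode code point (rather than `String.compareTo`, which compares UTF-16 code units) is also an assumption, based on the PR's mention of code point ordering in its tests.

```scala
object SortedOrderSketch {
  // Compare two strings by Unicode code point.
  def compareCodePoints(a: String, b: String): Int = {
    var i = 0
    var j = 0
    while (i < a.length && j < b.length) {
      val ca = a.codePointAt(i)
      val cb = b.codePointAt(j)
      if (ca != cb) return Integer.compare(ca, cb)
      i += Character.charCount(ca)
      j += Character.charCount(cb)
    }
    // The string with characters left over sorts last.
    Integer.compare(a.length - i, b.length - j)
  }

  // Returns indices of visible fields, ordered by key.
  def computeSortedOrder(keys: Array[String], visible: Array[Boolean]): Array[Int] = {
    // Collect indices of visible fields in declaration order.
    val order = new Array[Int](keys.length)
    var n = 0
    var i = 0
    while (i < keys.length) {
      if (visible(i)) { order(n) = i; n += 1 }
      i += 1
    }
    // Insertion sort on the index array; cheap for a handful of fields.
    var p = 1
    while (p < n) {
      val cur = order(p)
      var q = p - 1
      while (q >= 0 && compareCodePoints(keys(order(q)), keys(cur)) > 0) {
        order(q + 1) = order(q)
        q -= 1
      }
      order(q + 1) = cur
      p += 1
    }
    java.util.Arrays.copyOf(order, n)
  }
}
```

A materializer could then walk the returned indices and invoke the member at each position directly, without any key lookup.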
Motivation
Object materialization is the dominant bottleneck in workloads with many small objects. In the `realistic2` benchmark, 96% of total time (231.8ms of 241.1ms) is spent in materialization, processing ~62K comprehension-generated objects with 2-9 fields each.
The hot path involves:
- `visibleKeyNames`: allocates a sorted key array via `LinkedHashMap.keySet`
- `value(key)`: HashMap lookup for each field
- `materializeRecursiveChild`: recursive descent per field
For inline objects (no `super`, no `excludedKeys`), all field data is already stored in direct arrays, so we can bypass the HashMap entirely.

Key Design Decision
Two-tier caching strategy for sorted field order:
- Static field names (`FieldName.Fixed`): Cache sorted order on the `MemberList` AST node, shared across all `Val.Obj` instances created from the same source location. This is safe because fixed field names are compile-time string literals.
- Dynamic field names (`FieldName.Dyn`): Compute sorted order per-object instance, with no AST-level caching. Dynamic field names can vary across evaluations of the same AST node (e.g., `{[x]: 1, a: 2}` in a comprehension), so sharing a cached order would produce wrong output.
This distinction is critical for correctness.
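The caching rule can be sketched as follows. Everything here is a simplified assumption for illustration: `SortedOrderCacheSketch`, the stripped-down `FieldName` and `MemberList` types, and `sortedOrderFor` are hypothetical, and the sort uses Scala's default String ordering rather than whatever the PR's helper uses.

```scala
object SortedOrderCacheSketch {
  sealed trait FieldName
  final case class Fixed(name: String) extends FieldName
  // Placeholder for a computed field name such as [x].
  final case class Dyn(expr: String) extends FieldName

  // Simplified AST node: one instance per source location.
  final class MemberList(val fields: Array[FieldName]) {
    val allFieldsFixed: Boolean = fields.forall(_.isInstanceOf[Fixed])
    // Benign race: two threads may both compute and store the same order.
    @volatile var cachedSortedOrder: Array[Int] = null
  }

  private def sortIndices(keys: Array[String]): Array[Int] =
    keys.indices.sortBy(keys(_)).toArray

  // keys are the evaluated field names of one object built from `ast`.
  def sortedOrderFor(ast: MemberList, keys: Array[String]): Array[Int] =
    if (ast.allFieldsFixed) {
      // Static names: same keys on every evaluation, so share the order.
      var order = ast.cachedSortedOrder
      if (order == null) {
        order = sortIndices(keys)
        ast.cachedSortedOrder = order
      }
      order
    } else {
      // Dynamic names: keys can differ per object, so never cache on the AST.
      sortIndices(keys)
    }
}
```

With `{[x]: 1, a: 2}`, evaluating with `x = "z"` gives keys `z, a` (sorted order: index 1, then 0), while `x = "0"` gives keys `0, a` (sorted order: index 0, then 1). Reusing the first order for the second object would emit its fields in the wrong order.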
Modification
Val.scala
- `canDirectIterate` accessor: checks `super == null && excludedKeys == null` and that inline storage is present
- `inlineKeys`, `inlineMembers`, `singleKey`, `singleMem` accessors to expose private constructor params
- `@volatile var _sortedInlineOrder: Array[Int]` for per-instance cached sort order

Expr.scala
- Added `@volatile var _cachedSortedOrder: Array[Int]` to `ObjBody.MemberList` for AST-level cache

Evaluator.scala
- In `visitMemberList`, compute and cache sorted field order
- `allFieldsFixed` guard ensures the `MemberList` cache is only used when all fields are `FieldName.Fixed`

Materializer.scala
- `materializeRecursiveObj` now checks `canDirectIterate` and dispatches to the fast path
- `materializeInlineObj`: unsorted direct iteration via `m.invoke(obj, null, fs, evaluator)`
- `materializeSortedInlineObj`: uses cached `_sortedInlineOrder` or computes it on the fly
- `computeSortedInlineOrder` companion helper: insertion sort (optimal for 2-8 fields), filters hidden fields

Tests
- `MaterializerTests.scala`: 8 unit tests for `computeSortedInlineOrder` (basic sorting, single field, already sorted, reverse, hidden fields, all hidden, unicode codepoint ordering, stability)
- `dynamic_field_sorted_order.jsonnet`: golden file regression test for dynamic field name sorting correctness
- `dynamic_null_field_sorted_order.jsonnet`: golden file regression test for conditional (nullable) field names

Benchmark Results
JMH (ms/op, lower is better)
Full 35-benchmark regression suite: zero regressions.
Hyperfine Native Binary (ms, lower is better)
Analysis
- `allFieldsFixed` guard adds negligible overhead (~5-20 `isInstanceOf` checks per object creation, which is dwarfed by field evaluation time)
- `@volatile` on both cache fields: benign race (two threads may compute the same result)

References
- `5f7abec3`, `dd9d08a3`, `119b9a93`

Result