Skip to content

perf: abstract FormatCache as pluggable trait, optimize format runtime#679

Open
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/format-runtime-optimization
Open

perf: abstract FormatCache as pluggable trait, optimize format runtime#679
He-Pin wants to merge 1 commit intodatabricks:masterfrom
He-Pin:perf/format-runtime-optimization

Conversation

@He-Pin
Copy link
Copy Markdown
Contributor

@He-Pin He-Pin commented Apr 5, 2026

Motivation

The % format operator is a critical path for string-template-heavy workloads. The current implementation:

  1. Re-parses format strings on every invocation (fastparse overhead)
  2. Materializes all values to ujson.Value before format-specific dispatch (allocates intermediate objects)
  3. Uses for/zipWithIndex iterator allocation in the main loop
  4. Always allocates BigInt for integer formatting even when the value fits in a Long
  5. Calls widen() even when no width/padding is needed
  6. Uses a static cache field, preventing users from plugging in custom cache implementations

These costs compound in benchmarks like large_string_template (256 format specs in a 600KB template).

Key Design Decision

Pluggable FormatCache trait: Abstracts the format string cache as a trait (analogous to ParseCache), injected through InterpreterEvaluator constructors. Users can supply custom implementations (e.g., Caffeine-based) for better control over eviction, concurrency, and memory. Default is a process-wide LRU singleton (FormatCache.SharedDefault).

Direct Val dispatch: Match on Val.Str, Val.Num, Val.True, Val.False, Val.Null directly instead of materializing to ujson.Value first. Since Val is a sealed class, this covers all primitive types. Complex types (Arr, Obj) still go through Materializer.

Long fast path: formatInteger avoids BigInt allocation when the value fits in a Long (with explicit Long.MinValue guard to prevent negation overflow).

Modification

New file: sjsonnet/src/sjsonnet/FormatCache.scala

  • FormatCache trait: Single getOrElseUpdate(key, compute) API
  • DefaultFormatCache: LRU LinkedHashMap (256 entries, access-order), thread-safe via synchronized double-checked locking. Initial capacity sized to avoid premature rehash.
  • FormatCache.SharedDefault: Process-wide singleton preserving cross-interpreter reuse
  • FormatCache.EmptyCache: Always-recompute cache for testing

Modified: sjsonnet/src/sjsonnet/Format.scala

  • Removed static parsedFormatCache field → replaced by pluggable FormatCache
  • CompiledFormat sealed trait: Opaque marker for cache entries (hides RuntimeFormat internals)
  • RuntimeFormat: Pre-processes parsed format into arrays with metadata (hasAnyStar, staticChars), now private[sjsonnet] extending CompiledFormat
  • parseFormatCached: Takes FormatCache parameter, uses pattern match (not asInstanceOf)
  • Direct Val dispatch: Bypasses Materializer for Str/Num/Bool/Null
  • widenRaw fast path: Returns txt directly when width.isEmpty
  • While-loop: Replaces for/zipWithIndex to avoid iterator/tuple allocation
  • StringBuilder pre-sizing: Estimates capacity from staticChars + specs.length * 8
  • formatInteger Long fast path: Uses java.lang.Long.toString instead of BigInt.toString
  • PartialApplyFmt: Pre-parses at construction time, bypasses external cache

Modified: sjsonnet/src/sjsonnet/Val.scala

  • EvalScope.formatCache: Concrete method with default (FormatCache.SharedDefault), avoids breaking external implementations

Modified: sjsonnet/src/sjsonnet/Evaluator.scala

  • Constructor parameter: formatCache: FormatCache = FormatCache.SharedDefault added to both Evaluator and NewEvaluator

Modified: sjsonnet/src/sjsonnet/Interpreter.scala

  • Constructor parameter: formatCache: FormatCache threaded through to createEvaluator

Modified: sjsonnet/src-jvm-native/sjsonnet/SjsonnetMainBase.scala

  • Passes FormatCache.SharedDefault explicitly to Interpreter constructor

Benchmark Results

JMH Regression Suite (1 fork, 5 warmup, 5 measurement iterations)

Benchmark Master (ms/op) This PR (ms/op) Change
large_string_template 2.265 2.121 -6.4%
realistic1 2.714 2.315 -14.7%
realistic2 70.491 75.059 +6.5% (noise)
All other benchmarks - - Within ±3% noise

Scala Native Hyperfine (-N -w4 -m20)

Benchmark Master (ms) This PR (ms) jrsonnet 0.5.0-pre98 (ms) Gap
large_string_template 17.4 17.0 8.5 2.01x (was 2.06x)
bench.04 505.3 520.2 571.4 sjsonnet faster ✅
comparison2 170.1 168.3 239.1 sjsonnet faster ✅

No regressions on non-format benchmarks.

Analysis

The JVM improvement (-6.4% on large_string_template, -14.7% on realistic1) confirms the optimization is effective. The native improvement is more modest because:

  1. The format cache helps JMH (repeated invocations) more than native/hyperfine (one invocation per process)
  2. JVM JIT can better exploit the direct Val dispatch due to runtime specialization
  3. The large_string_template benchmark has a 600KB format string with only 256 specs — the bottleneck is dominated by string I/O rather than format logic

The Long.MinValue edge case is guarded to prevent negation overflow — falls through to the BigDecimal path.

The FormatCache abstraction adds no performance overhead — the SharedDefault singleton is identical to the previous static cache, and PartialApplyFmt bypasses the cache entirely.

Supersedes PR #672 (format-parse-cache) which only included the caching part.

References

  • Upstream jit branch commit e98cd1f8 — Format chunk runtime optimization
  • Upstream jit branch commit 6524d77d — Direct Val dispatch, Long fast path

Result

Positive performance impact on format-heavy workloads. FormatCache now pluggable like ParseCache. All 140 tests pass. 6 files changed, 317 insertions, 104 deletions.

Extract format string cache from static field in Format.scala into a
pluggable FormatCache trait (analogous to ParseCache). This allows users
to supply custom cache implementations (e.g., Caffeine-based) via the
Interpreter/Evaluator constructors.

Key changes:
- New FormatCache trait with getOrElseUpdate API
- DefaultFormatCache: LRU LinkedHashMap (256 entries), thread-safe
- FormatCache.SharedDefault singleton preserves process-wide sharing
- FormatCache.EmptyCache for testing
- CompiledFormat sealed trait for type-safe opaque cache entries
- RuntimeFormat: direct Val dispatch, Long fast path, pre-cached specs
- PartialApplyFmt pre-parses at construction time (no cache needed)
- FormatCache threaded through Interpreter → Evaluator constructors

Upstream: he-pin/sjsonnet jit branch (format optimization commits)
@He-Pin He-Pin force-pushed the perf/format-runtime-optimization branch from 9ffd530 to c0bc815 Compare April 5, 2026 11:06
@He-Pin He-Pin changed the title perf: optimize Format runtime with cache, direct Val dispatch, and Long fast path perf: abstract FormatCache as pluggable trait, optimize format runtime Apr 5, 2026
@He-Pin He-Pin marked this pull request as ready for review April 5, 2026 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant