perf: foldl string concat O(n) StringBuilder optimization#665
Open
He-Pin wants to merge 1 commit into databricks:master
Detect the pattern std.foldl(function(acc, elem) acc + elem, arr, stringInit) at runtime by inspecting the function's body AST. When the body is a BinaryOp(OP_+, ValidId(param0), ValidId(param1)) and the initial value is a string, use a StringBuilder for O(n) instead of O(n²) concatenation.

Changes:
- Val.Func: add bodyExpr hook (default null) for AST inspection
- Evaluator.visitMethod: override bodyExpr to expose function body
- ArrayModule.Foldl: add tryStringBuilderFoldl fast path that detects the concatenation pattern and builds the result in a single pass

This addresses the 88x gap with jrsonnet on the foldl string concat benchmark by converting from quadratic string copying to linear StringBuilder appending.

Upstream: jit branch commit 2d3e56d
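The detection described in this commit message can be illustrated with a minimal sketch. It assumes a toy AST: the Expr, ValidId, BinaryOp and Func types below are hypothetical stand-ins, not sjsonnet's real classes, and all values are restricted to strings, so it leaves out whatever type checks the real tryStringBuilderFoldl performs on array elements.

```scala
object FoldlConcatSketch {
  // Hypothetical miniature AST; these are NOT sjsonnet's real Expr classes.
  sealed trait Expr
  final case class ValidId(index: Int) extends Expr // reference to the parameter at `index`
  final case class BinaryOp(op: String, lhs: Expr, rhs: Expr) extends Expr

  // Hypothetical function value: exposes its body AST next to a callable form.
  final case class Func(paramCount: Int, body: Expr, run: (String, String) => String)

  // True only when the body is exactly `param0 + param1`, i.e. `acc + elem`.
  def isAccPlusElem(f: Func): Boolean =
    f.paramCount == 2 && (f.body match {
      case BinaryOp("+", ValidId(0), ValidId(1)) => true
      case _                                     => false
    })

  // foldl over strings: one StringBuilder pass when the pattern matches,
  // otherwise the ordinary left fold that calls the function per element.
  def foldl(f: Func, arr: Seq[String], init: String): String =
    if (isAccPlusElem(f)) {
      val sb = new java.lang.StringBuilder(init)
      arr.foreach(s => sb.append(s))
      sb.toString
    } else {
      arr.foldLeft(init)(f.run)
    }
}
```

A body such as elem + acc (parameters swapped) does not match and therefore takes the fallback path, which keeps the result identical to the unoptimized fold.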
Motivation
std.foldl with string concatenation has O(n²) complexity because each + copies the entire accumulator into a new string. This change detects the foldl + string-concat pattern and uses a StringBuilder for O(n) complexity.
Key Design Decision
Detection happens at runtime in the foldl hot loop rather than through static analysis. When the foldl accumulator is a string and the function body performs string concatenation, we switch to a StringBuilder-based fast path that avoids quadratic string copying.
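To make the quadratic versus linear difference concrete, here is a rough cost model, not a measurement. It counts characters copied under two assumptions of this sketch: naive concatenation copies the whole new accumulator at every step, and a builder copies each character once on append (buffer regrowth and the final toString are ignored, as they add only a constant factor). The object and method names are hypothetical.

```scala
object ConcatCostSketch {
  // Naive `acc + elem`: every step allocates a fresh string and copies
  // all characters of the new accumulator into it.
  def naiveCopies(initLen: Int, elemLens: Seq[Int]): Long = {
    var accLen = initLen.toLong
    var copied = 0L
    for (len <- elemLens) {
      accLen += len    // length of the accumulator after this step
      copied += accLen // the whole accumulator is copied again
    }
    copied
  }

  // Builder: the initial value and each element are appended exactly once.
  def builderCopies(initLen: Int, elemLens: Seq[Int]): Long =
    initLen.toLong + elemLens.map(_.toLong).sum
}
```

For 1,000 one-character elements and an empty initial string, the model gives 1000 · 1001 / 2 = 500,500 copied characters for naive concatenation against 1,000 for the builder, which is the kind of gap that grows with input size.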
Modification
Added tryStringBuilderFoldl in ArrayModule.scala, which detects the string concat pattern and uses a StringBuilder. It falls back to the standard foldl for non-string cases.
Benchmark Results
JMH Regression Suite (1 fork, 3 warmup, 1 measurement)
All other benchmarks within noise margin.
Scala Native Hyperfine (-N -w4 -m20)
Analysis
This is the single largest performance improvement in the entire optimization suite. The O(n²) → O(n) complexity change produces dramatic speedups that scale with input size. On native, this makes sjsonnet nearly 80x faster than jrsonnet (Rust) for this workload.
References
Upstream jit branch exploration at he-pin/sjsonnet@jit
Result
Massive reduction in foldl string concatenation time. bench.04 goes from 507ms to 6.9ms on native (73.9x speedup). sjsonnet is now 79.9x faster than jrsonnet for this benchmark.