perf: bulk text block scanner bypasses fastparse per-line overhead by He-Pin · Pull Request #689 · databricks/sjsonnet

He-Pin · 2026-04-05T08:13:29Z

Motivation

Large text blocks (|||...|||) in Jsonnet are parsed line-by-line using fastparse combinators. For a 600KB text block with ~8000 lines (e.g., large_string_template.jsonnet), this creates ~8000 intermediate String objects via fastparse .! captures, accumulates them in a Seq[String], and then joins them with mkString. This overhead dominates parsing time for large text blocks.

Key Design Decision

Replace the per-line fastparse combinator loop with a custom bulk scanner that directly accesses the underlying String from IndexedParserInput.data. Instead of creating one String per line, we use a single StringBuilder with append(CharSequence, start, end), which copies each line's characters straight from the source String into the builder without allocating an intermediate substring. The first line is still parsed with fastparse to preserve error message quality for malformed input.

A hybrid approach is used:

First line: fastparse combinator (proper error messages)
Subsequent lines: bulk scanner using direct String access
Fallback: original fastparse path for non-IndexedParserInput
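
The bulk-scan step for the subsequent lines can be sketched roughly as follows. This is a simplified illustration, not the PR's tripleBarStringBodyBulk: the names BulkScanSketch and scanLines are hypothetical, the indent is assumed to have been taken from the first line already, and checking for the closing ||| or reporting an indentation error is left to the caller.

```scala
object BulkScanSketch {

  /** Hypothetical helper: appends every indented line of `data`, starting at
    * `start`, to `sb` with the indent stripped. Returns the index of the first
    * line that does not begin with `indent` (or `data.length` at end of input).
    */
  def scanLines(data: String, start: Int, indent: String, sb: java.lang.StringBuilder): Int = {
    var pos = start
    var scanning = true
    while (scanning && pos < data.length) {
      if (data.charAt(pos) == '\n') {
        // Blank line: keep the newline, no indent required.
        sb.append('\n')
        pos += 1
      } else if (data.regionMatches(pos, indent, 0, indent.length)) {
        // Indent matches: compare in place, no substring is created.
        val lineStart = pos + indent.length
        val nl = data.indexOf('\n', lineStart)
        val lineEnd = if (nl < 0) data.length else nl + 1
        // Copy the line (including its newline) directly into the builder.
        sb.append(data, lineStart, lineEnd)
        pos = lineEnd
      } else {
        // First line without the indent: hand control back to the caller.
        scanning = false
      }
    }
    pos
  }
}
```

The caller would then inspect the returned position to decide whether the block ends with ||| or the indentation is malformed.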

Additional optimizations in constructString:

Single-string fast path: avoids mkString when only one string segment
Pre-sized StringBuilder: pre-calculates total length for multi-line blocks
Skip interning for large strings (>1024 chars): avoids expensive hashCode computation on 600KB strings that are unlikely to repeat
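
A minimal sketch of how these three optimizations could fit together is shown below. It is not the PR's constructString: the object name, method signature, and InternThreshold constant are hypothetical, and String.intern() merely stands in for whatever interning mechanism the parser actually uses.

```scala
object ConstructStringSketch {
  // Threshold from the description above; the constant name is an assumption.
  val InternThreshold = 1024

  def construct(segments: Seq[String]): String = {
    val result =
      if (segments.isEmpty) ""
      else if (segments.length == 1) segments.head // single-string fast path: nothing to join
      else {
        // Pre-size the builder so it never has to grow while appending.
        var total = 0
        segments.foreach(s => total += s.length)
        val sb = new java.lang.StringBuilder(total)
        segments.foreach(s => sb.append(s))
        sb.toString
      }
    // Large strings are unlikely to repeat, so skip interning them.
    if (result.length > InternThreshold) result else result.intern()
  }
}
```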

Modification

Parser.scala:

tripleBarStringBody: delegates to tripleBarStringBodyBulk after the first line
tripleBarStringBodyBulk (new): custom scanner using IndexedParserInput.data with:
- String.regionMatches for zero-allocation indent matching
- StringBuilder.append(CharSequence, start, end) for line extraction without an intermediate substring
- Proper error handling for indentation mismatches
constructString: single-string fast path, pre-sized StringBuilder, interning threshold

Benchmark Results

JMH (JVM, Scala 3.3.7)

| Benchmark | Master (ms/op) | Optimized (ms/op) | Change |
|---|---|---|---|
| large_string_template | 2.251 | 1.762 | -21.7% |
| large_string_join | 2.062 | 2.083 | ≈ |
| bench.02 | 48.735 | 46.817 | -3.9% |
| bench.03 | 13.316 | 13.552 | ≈ |
| realistic1 | 2.707 | 2.645 | ≈ |
| realistic2 | 67.037 | 68.285 | ≈ |

All 35 benchmarks checked, zero regressions.

Native (Scala Native, hyperfine --warmup 5 --runs 20)

| Binary | large_string_template (ms) | vs jrsonnet |
|---|---|---|
| sjsonnet master | 17.3 ± 0.7 | 3.29x slower |
| sjsonnet optimized | 14.2 ± 0.6 | 2.71x slower |
| jrsonnet 0.5.0-pre98 | 5.3 ± 0.4 | baseline |

Native improvement: -18% on large_string_template (17.3ms → 14.2ms)
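
For reference, the percentages quoted here are plain relative changes against master. A small sketch of the arithmetic (the helper name pctChange is hypothetical):

```scala
object ChangeSketch {
  /** Relative change in percent; negative means the optimized build is faster. */
  def pctChange(before: Double, after: Double): Double =
    (after - before) / before * 100.0

  // Figures taken from the tables above.
  val jmh: Double = pctChange(2.251, 1.762)   // roughly -21.7
  val native: Double = pctChange(17.3, 14.2)  // roughly -17.9
}
```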

The remaining gap vs jrsonnet is primarily:

Scala Native startup overhead (~6.8ms vs ~4.5ms for jrsonnet)
Rust's zero-allocation, hand-coded parser vs fastparse combinator infrastructure

Analysis

The optimization targets the parsing phase specifically. The 600KB text block benchmark spends significant time in per-line String allocation and Seq management. By replacing ~8000 individual string captures with a single StringBuilder bulk scan, we eliminate:

~8000 String object allocations (one per line)
Seq[String] growth and management overhead
Final mkString join of ~8000 strings
hashCode computation on the 600KB result string (interning skip)

The regionMatches and StringBuilder.append(CharSequence, start, end) APIs read the source String in place: indent checks allocate nothing, and each line's characters are copied once into the builder with no intermediate String.
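
For contrast, the allocation pattern being removed looks roughly like the sketch below. It only imitates the shape of the old path (one String per line, a growing collection, a final join) with plain string operations; it does not use fastparse, and the names PerLineBaselineSketch and joinLines are hypothetical.

```scala
import scala.collection.mutable.ArrayBuffer

object PerLineBaselineSketch {
  def joinLines(data: String, indent: String): String = {
    val lines = ArrayBuffer.empty[String] // grows with the number of lines
    var pos = 0
    while (pos < data.length) {
      val nl = data.indexOf('\n', pos)
      val end = if (nl < 0) data.length else nl + 1
      val line = data.substring(pos, end) // one String allocated per line
      // Stripping the indent allocates a second String for indented lines.
      lines += (if (line.startsWith(indent)) line.substring(indent.length) else line)
      pos = end
    }
    lines.mkString // final join copies every character again
  }
}
```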

References

Explored in he-pin/sjsonnet jit branch
Synergizes with PR #678 (perf: escape-free string rendering fast path with bulk copy) for a cumulative benefit on string-heavy workloads

Result

✅ All 140 JVM tests pass
✅ 21.7% JMH improvement on target benchmark
✅ 18% native improvement on target benchmark
✅ Zero regressions across all 35 benchmarks

Replace the per-line fastparse combinator loop in tripleBarStringBody with a custom bulk scanner that directly accesses the underlying String data. For a 600KB text block with ~8000 lines, this eliminates ~8000 intermediate String allocations and the Seq[String] + mkString join overhead.

Key changes:
- tripleBarStringBodyBulk: Custom scanner using IndexedParserInput.data with StringBuilder.append(CharSequence, start, end), instead of fastparse's repX combinator, which creates one String per line.
- Hybrid approach: first line still uses fastparse for proper error messages, subsequent lines use the bulk scanner.
- constructString: Skip string interning for strings >1024 chars (avoids expensive hashCode computation on 600KB strings), single-string fast path, pre-sized StringBuilder for multi-line blocks.
- Falls back to original fastparse path for non-IndexedParserInput.

JMH large_string_template: 2.251 → 1.762 ms/op (-21.7%)
Native large_string_template: 17.3 → 14.2 ms (-18%)
Upstream: explored in he-pin/sjsonnet jit branch

He-Pin mentioned this pull request Apr 5, 2026

perf: escape-free string rendering fast path with bulk copy #678

Open

He-Pin marked this pull request as ready for review April 5, 2026 08:31

He-Pin mentioned this pull request Apr 5, 2026

performance optimization #666

Open

He-Pin wants to merge 1 commit into databricks:master from He-Pin:perf/text-block-bulk-scanner
