Execution optimizations and improvements#179
Open
anubhavchaturvedi wants to merge 6 commits into meta-pytorch:main from
Conversation
Adds a standalone benchmark (benches/parse_benchmark.rs) that measures:
- Wall time statistics (mean/median/min/max) across configurable iterations
- Peak RSS via getrusage (cold-run and post-warmup measurements)
- Input file line count for context

Usage: TLPARSE_BENCH_INPUT=/path/to/log cargo bench --bench parse_benchmark

No production code changes. Dev-dependencies added: libc (RSS), tempfile (output dirs).
- Removed hardcoded machine-specific path; requires explicit input
- Added cold-run RSS measurement with documentation of ru_maxrss limitations
- Streaming line count instead of loading the entire file into memory
- Write errors surfaced via expect() instead of being silently swallowed
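The wall-time statistics portion of the benchmark can be sketched as follows. This is a minimal, std-only illustration: `work_under_test` is a hypothetical stand-in for the parse call, and the peak-RSS measurement (done via `libc::getrusage` in the actual benchmark) is omitted to avoid the external dependency.

```rust
use std::time::Instant;

// Hypothetical stand-in for the parse being benchmarked.
fn work_under_test() {
    let mut s = String::new();
    for i in 0..1000 {
        s.push_str(&i.to_string());
    }
}

// Collect one wall-time sample per iteration and report
// mean/median/min/max in milliseconds, mirroring the statistics the
// benchmark prints. Peak RSS would be read separately via
// libc::getrusage; it is left out to keep this sketch std-only.
fn bench(iterations: usize) -> (f64, f64, f64, f64) {
    let mut times: Vec<f64> = (0..iterations)
        .map(|_| {
            let start = Instant::now();
            work_under_test();
            start.elapsed().as_secs_f64() * 1e3
        })
        .collect();
    times.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mean = times.iter().sum::<f64>() / times.len() as f64;
    (mean, times[times.len() / 2], times[0], times[times.len() - 1])
}

fn main() {
    let (mean, median, min, max) = bench(10);
    println!("mean={mean:.3}ms median={median:.3}ms min={min:.3}ms max={max:.3}ms");
    assert!(min <= median && median <= max && min <= mean && mean <= max);
}
```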
… year caching

Four localized optimizations with zero API changes:
1. Pre-allocate HTML string in anchor_source (parsers.rs)
   - Remove intermediate Vec<&str> from lines().collect(); iterate directly
   - Pre-allocate output with String::with_capacity(text.len() * 2 + 500)
2. Pre-allocate shortraw_content buffer (lib.rs)
   - Use String::with_capacity(file_size / 8) (~12.5% of input size)
   - Avoids ~30 reallocations for large logs
3. Reuse payload String across parse loop iterations (lib.rs)
   - Hoist payload_buf before the loop, clear() each iteration
   - Retains allocated capacity, avoiding millions of small allocations
4. Compute the year once before the parse loop (lib.rs)
   - Move chrono::Utc::now().year() before the format_timestamp closure
   - Eliminates one clock_gettime syscall per log line

Note: syntect lazy-init (SyntaxSet/ThemeSet) was already present in the codebase via OnceLock; no change needed.
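Optimizations 3 and 4 boil down to hoisting work out of the per-line loop. The sketch below is illustrative, not the actual tlparse code: the function name and values are hypothetical, and the real code caches `chrono::Utc::now().year()` rather than a constant.

```rust
// Hoist a reusable String out of the loop (clear() keeps its allocated
// capacity) and compute a per-run constant once instead of once per
// line. process_lines and its inputs are hypothetical.
fn process_lines(lines: &[&str]) -> usize {
    let year = 2026; // computed once, not per iteration
    let mut payload_buf = String::with_capacity(256); // reused across iterations
    let mut total = 0;
    for line in lines {
        payload_buf.clear(); // resets length, retains capacity
        payload_buf.push_str(line);
        payload_buf.push_str(&year.to_string());
        total += payload_buf.len();
    }
    total
}

fn main() {
    assert_eq!(process_lines(&["a", "bb"]), 11); // "a2026" + "bb2026"
    println!("ok");
}
```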
…lone
Three optimizations targeting the hottest paths:
1. Static regex compilation + CompileId helpers (types.rs)
- Move RE_EVAL_WITH_KEY and RE_SEED_NSPID to module-level Lazy statics
- Add normalize_attempt() for None->Some(0) migration
- Add collapse_attempt() for unconditional attempt reset to 0
(used in compilation_metrics and metrics_index lookups)
2. Eliminate double JSON parse per log line (lib.rs) — HIGHEST IMPACT
- Parse each line as Envelope only once (was: Value + Envelope)
- Shortraw (raw.jsonl) output now built by parsing as Value separately,
inserting glog metadata, and re-serializing with sorted keys
- Substring-based key-conflict detection as early bail-out before parse
- Net effect: ~50% reduction in JSON parsing for the main loop
3. Avoid Vec<OutputFile> clone in CompilationMetrics (lib.rs, parsers.rs)
- Two-phase borrow pattern: immutable slice borrow for parse, then
mutable access for result processing
- Changed CompilationMetricsParser.output_files from &Vec to &[OutputFile]
- Eliminates clone of entire output file list per metrics entry
Output is byte-for-byte identical to baseline across all test logs.
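The compile-once pattern behind item 1 can be sketched as below. The actual code uses once_cell Lazy statics around regex::Regex values (RE_EVAL_WITH_KEY, RE_SEED_NSPID); this std-only version substitutes std::sync::OnceLock and a stand-in "matcher" so it runs without external crates, and the normalize_attempt signature shown is an assumption.

```rust
use std::sync::OnceLock;

// Stand-in for an expensive-to-build matcher. The real code compiles a
// regex::Regex here; a Vec keeps this sketch dependency-free.
fn matcher() -> &'static Vec<String> {
    static MATCHER: OnceLock<Vec<String>> = OnceLock::new();
    MATCHER.get_or_init(|| {
        // Built exactly once, on first use; every later call returns
        // the same reference instead of recompiling.
        vec!["eval_with_key".to_string(), "seed_nspid".to_string()]
    })
}

// Sketch of normalize_attempt(): treat a missing attempt as attempt 0.
fn normalize_attempt(attempt: Option<u32>) -> Option<u32> {
    attempt.or(Some(0))
}

fn main() {
    assert!(std::ptr::eq(matcher(), matcher())); // no per-call rebuild
    assert_eq!(normalize_attempt(None), Some(0));
    assert_eq!(normalize_attempt(Some(3)), Some(3));
}
```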
Instead of loading the full input log into a String and passing it through
ParseOutput, the CLI now copies raw.log directly via std::fs::copy().
For a 500MB log, this saves ~500MB+ of heap allocation (String + UTF-8
validated copy). fs::copy uses kernel-level zero-copy (sendfile/copy_file_range).
Changes:
- lib.rs: Removed fs::read_to_string(path) and raw.log ParseOutput entry
- cli.rs: Added fs::copy(log_path, output_dir.join("raw.log")) after
writing all ParseOutput entries
Note: raw.log is not listed in the non-breaking contract as a guaranteed
ParseOutput entry. Library callers using parse_path() directly will no
longer find raw.log in the returned Vec and should copy the input file
themselves if needed.
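The change above can be sketched with std alone; the paths here are illustrative temp files, not the real CLI arguments.

```rust
use std::fs;
use std::path::Path;

// Copy the input log straight to raw.log with fs::copy instead of
// reading it into a String and writing it back. On Linux, fs::copy can
// use copy_file_range/sendfile, so the bytes never pass through a
// userspace buffer owned by the program.
fn copy_raw_log(log_path: &Path, output_dir: &Path) -> std::io::Result<u64> {
    fs::copy(log_path, output_dir.join("raw.log"))
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    let src = dir.join("tlparse_demo_input.log"); // illustrative path
    fs::write(&src, b"hello log\n")?;
    let bytes = copy_raw_log(&src, &dir)?;
    assert_eq!(bytes, 10); // size of the source file in bytes
    assert_eq!(fs::read(dir.join("raw.log"))?, b"hello log\n");
    fs::remove_file(&src)?;
    fs::remove_file(dir.join("raw.log"))?;
    Ok(())
}
```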
Detect the .gz extension on input files and transparently decompress using flate2::read::GzDecoder. This is purely additive; existing .log files work identically.

Changes:
- lib.rs: Wrap the file reader in GzDecoder when the path ends in .gz, using Box<dyn io::Read> for unified handling
- cli.rs: Copy as raw.log.gz (not raw.log) for gzip inputs
- cli.rs: Accept .log.gz files in --all-ranks-html rank log discovery (tries the .log.gz suffix before .log)
- Cargo.toml: Add flate2 = "1.0" dependency

Tests: 3 new integration tests covering library-level gzip parsing, CLI raw.log.gz copying, and all-ranks .log.gz discovery.

Verified: gzip output is byte-for-byte identical to the uncompressed baseline for all test logs.
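The Box<dyn io::Read> dispatch described above can be sketched like this. The real code wraps the file in flate2::read::GzDecoder; that branch is stubbed here so the example runs without the external crate.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Pick a decoder by file extension and erase the concrete reader type
// behind Box<dyn Read>, so callers handle .log and .log.gz uniformly.
fn open_log(path: &Path) -> io::Result<Box<dyn Read>> {
    let file = File::open(path)?;
    if path.extension().and_then(|e| e.to_str()) == Some("gz") {
        // Real code: Ok(Box::new(flate2::read::GzDecoder::new(file)))
        let _ = file;
        Err(io::Error::new(io::ErrorKind::Unsupported, "gzip stubbed in this sketch"))
    } else {
        Ok(Box::new(file))
    }
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("tlparse_sketch.log"); // illustrative path
    std::fs::write(&path, "line1\n")?;
    let mut buf = String::new();
    open_log(&path)?.read_to_string(&mut buf)?;
    assert_eq!(buf, "line1\n");
    std::fs::remove_file(&path)?;
    Ok(())
}
```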
yushangdi
reviewed
Apr 2, 2026
Cargo.toml
Outdated
Contributor
can you also update the version here?
Contributor
can you summarize the optimizations you made in this PR?
Performance improvements in this release:
- ~39% faster parsing (median) on large logs
- ~32% less memory usage
- Transparent gzip input support (.gz files)
- fs::copy for raw.log (avoids loading the entire file into memory)
Author
Updated the summary; each commit also has a description of its changes.
Performance and usability improvements for tlparse, yielding ~39% faster parsing and ~32% less memory on large logs.
Performance:
Features:
Infrastructure:
The PR is split into smaller commits, each with a description of its changes.
Here are the benchmark results before vs after on a 327MB file.