perf: custom loader (up to 70x+ improvements) #3

Open
cmilesdev wants to merge 79 commits into main from cmilesdev/custom-loader

cmilesdev commented Mar 16, 2026

This PR overhauls Wire's loader, which has significant performance problems in several situations. Most of them trace back to go/packages semantics and go/types type-checking work, and the baseline cost of both grows with the size of the codebase. For example, in a project where stock Google Wire takes 1.3s to run, 400-600ms of that is go list deps, and the rest is recursively type checking every external and internal package so that Wire can do its work. This PR replaces the loader, caches the heavy work, and invalidates the cache when necessary.

Safety

Safety and accuracy come first. The existing Wire tests pass, along with a large set of newly added scenario-based tests.

Test it Yourself

You can test the new loader yourself by installing this branch:

go install github.com/goforj/wire/cmd/wire@cf52879

Revert to goforj/wire:

go install github.com/goforj/wire/cmd/wire@latest

Or switch to stock Google Wire:

go install github.com/google/wire/cmd/wire@latest

Custom Loader

This custom loader is far more consistent than the previous cache attempts on goforj/wire. Those attempts tackled the problem further down the pipeline, when it needed to be addressed upstream in the loader. They helped, but certain kinds of edits still cost you your compile speed. This implementation keeps warm runs fast across many edit types. See the table below for measurements.

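The table below shows speedups of more than 70x in the best case (an unchanged rerun on the external-low profile). These improvements scale with codebase size and complexity, so you can maintain a great development experience even in massive repositories. Note that cold runs are slower than stock (0.12x-0.20x), and that shape and import changes on the local-high profile come in slightly slower than stock (0.87x-0.89x).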

Benchmarks

+---------------+-------+--------+----------+----------------------+----------+----------+---------+
|       profile | local | stdlib | external | change type          | stock    | current  | speedup |
+---------------+-------+--------+----------+----------------------+----------+----------+---------+
|         local | 41    | 191    | 1        | cold run             | 350.6ms  | 2976.0ms | 0.12x   |
|         local | 41    | 191    | 1        | unchanged rerun      | 339.6ms  | 9.4ms    | 36.18x  |
|         local | 41    | 191    | 1        | body-only local edit | 328.5ms  | 27.2ms   | 12.09x  |
|         local | 41    | 191    | 1        | shape change         | 325.9ms  | 146.7ms  | 2.22x   |
|         local | 41    | 191    | 1        | import change        | 327.6ms  | 147.6ms  | 2.22x   |
|         local | 41    | 191    | 1        | known import toggle  | 327.9ms  | 145.2ms  | 2.26x   |
|    local-high | 1016  | 191    | 1        | cold run             | 759.4ms  | 5421.0ms | 0.14x   |
|    local-high | 1016  | 191    | 1        | unchanged rerun      | 605.5ms  | 78.6ms   | 7.71x   |
|    local-high | 1016  | 191    | 1        | body-only local edit | 604.8ms  | 150.4ms  | 4.02x   |
|    local-high | 1016  | 191    | 1        | shape change         | 601.8ms  | 674.6ms  | 0.89x   |
|    local-high | 1016  | 191    | 1        | import change        | 602.4ms  | 688.7ms  | 0.87x   |
|    local-high | 1016  | 191    | 1        | known import toggle  | 601.7ms  | 675.6ms  | 0.89x   |
|  external-low | 42    | 243    | 342      | cold run             | 1490.8ms | 7499.4ms | 0.20x   |
|  external-low | 42    | 243    | 342      | unchanged rerun      | 1198.1ms | 16.2ms   | 73.98x  |
|  external-low | 42    | 243    | 342      | body-only local edit | 1113.6ms | 81.4ms   | 13.68x  |
|  external-low | 42    | 243    | 342      | shape change         | 1208.2ms | 405.4ms  | 2.98x   |
|  external-low | 42    | 243    | 342      | import change        | 1186.3ms | 421.1ms  | 2.82x   |
|  external-low | 42    | 243    | 342      | known import toggle  | 1056.9ms | 431.2ms  | 2.45x   |
| external-high | 117   | 243    | 342      | cold run             | 1448.5ms | 7643.8ms | 0.19x   |
| external-high | 117   | 243    | 342      | unchanged rerun      | 1132.0ms | 22.5ms   | 50.37x  |
| external-high | 117   | 243    | 342      | body-only local edit | 1167.2ms | 91.6ms   | 12.75x  |
| external-high | 117   | 243    | 342      | shape change         | 1224.0ms | 483.1ms  | 2.53x   |
| external-high | 117   | 243    | 342      | import change        | 1286.2ms | 467.1ms  | 2.75x   |
| external-high | 117   | 243    | 342      | known import toggle  | 1268.8ms | 468.5ms  | 2.71x   |
+---------------+-------+--------+----------+----------------------+----------+----------+---------+
  • stock: google/wire, not goforj/wire
  • cold run: first wire gen
  • unchanged rerun: run wire gen again without changing any files.
  • body-only local edit: change only function body/content in a local Go file, without changing imports, types, or constructor signatures.
  • shape change: change local type/provider shape, like constructor params, fields, or return shape, while staying within the same general dependency graph.
  • import change: add or remove an import in a local package, which can change the discovered package graph and cached shape.
  • known import toggle: switch back to a previously seen import/shape state in the same repo, so the loader can potentially reuse an already-known cached graph.

Implementation Details

This PR replaces the primary go/packages path with a custom loader that is much more intentional about what work it does and when it does it.

  • Discover the package graph for the requested roots
  • Load typed package state from that discovered graph
  • Cache both discovery state and typed artifacts aggressively
  • Fall back cleanly when the custom path cannot safely satisfy the request

There are two different kinds of work:

  • Discovery work tells us what packages, files, and imports are involved
  • Typed work is the expensive parse, typecheck, and package stitching work

By separating those layers we can skip far more repeated work than before.
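
The split between the two layers can be illustrated with a minimal sketch. It is not the code in internal/loader: the types discovered, typed, and loader, and the function key, are hypothetical names invented here, and the "typed work" is reduced to a counter so the cache behaviour is visible.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// discovered is a hypothetical discovery-layer record: cheap facts about one
// package (its files and imports), gathered without type checking.
type discovered struct {
	ImportPath string
	Files      map[string]string // file name -> contents (stand-in for disk)
	Imports    []string
}

// typed is a hypothetical stand-in for the expensive typed result.
type typed struct {
	ImportPath string
	Key        string
}

// loader caches typed results by a key derived from the discovery record.
type loader struct {
	artifacts  map[string]typed
	typeChecks int // how many times the expensive layer actually ran
}

// key hashes everything the typed layer depends on, in a stable order.
func key(d discovered) string {
	names := make([]string, 0, len(d.Files))
	for name := range d.Files {
		names = append(names, name)
	}
	sort.Strings(names)

	h := sha256.New()
	fmt.Fprintf(h, "%q\n", d.ImportPath)
	for _, name := range names {
		fmt.Fprintf(h, "%q %q\n", name, d.Files[name])
	}
	for _, imp := range d.Imports {
		fmt.Fprintf(h, "import %q\n", imp)
	}
	return hex.EncodeToString(h.Sum(nil))
}

// load returns a cached typed result when the key is known and only runs the
// expensive layer on a miss.
func (l *loader) load(d discovered) typed {
	k := key(d)
	if t, ok := l.artifacts[k]; ok {
		return t
	}
	l.typeChecks++ // placeholder for parse + type check + stitching
	t := typed{ImportPath: d.ImportPath, Key: k}
	l.artifacts[k] = t
	return t
}

func main() {
	l := &loader{artifacts: map[string]typed{}}
	pkg := discovered{
		ImportPath: "example.com/app",
		Files:      map[string]string{"app.go": "package app\n"},
	}
	l.load(pkg)
	l.load(pkg) // unchanged rerun: served from cache
	fmt.Println("type checks after rerun:", l.typeChecks)

	pkg.Files["app.go"] = "package app\n\nvar X = 1\n"
	l.load(pkg) // edited file: key changes, expensive layer runs again
	fmt.Println("type checks after edit:", l.typeChecks)
}
```

The point of the sketch is only that the cheap layer decides whether the expensive layer needs to run at all; how the real loader keys and stores its artifacts may differ.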

Loader

Most of the new implementation lives in internal/loader.

  • custom.go is the main custom loading backend
  • discovery.go and discovery_cache.go build and reuse the discovered graph
  • artifact_cache.go stores typed package artifacts for reuse between runs
  • fallback.go preserves a safe fallback path when needed
  • timing.go exposes timings so we can see where time is actually going

Caching

Caching is now a first class part of generation instead of a thin optimization layered on afterward.

  • Discovery cache stores the discovered package graph
  • Loader artifacts store typed package summaries for local and external packages
  • Cache continues to support parser level reuse
  • Output cache remains available for generated output reuse

This is what enables the very fast unchanged rerun and body-only local edit paths in the benchmark table.

  • External packages are heavily reused after the first run
  • Local packages only pay for the work required by the current edit shape
  • Previously seen states can often reuse old cache state instead of rediscovering everything
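
To make the difference between a body-only edit and a shape change concrete, here is a hedged sketch of one way a "shape" fingerprint could be computed with the standard go/parser, go/ast, and go/printer packages. The function name shapeHash is hypothetical, and this is an assumption about the general technique, not a description of how artifact_cache.go decides what to reuse.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
)

// shapeHash fingerprints a Go source file while ignoring the bodies of
// top-level functions and methods. Imports, type declarations, and function
// signatures all contribute to the hash; statements inside bodies do not.
func shapeHash(src string) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, parser.SkipObjectResolution)
	if err != nil {
		return "", err
	}

	var buf bytes.Buffer
	buf.WriteString(file.Name.Name)
	buf.WriteByte('\n')
	for _, decl := range file.Decls {
		if fn, ok := decl.(*ast.FuncDecl); ok {
			fn.Body = nil // keep receiver, name, params, results; drop the body
		}
		// Print each declaration on its own so layout between declarations
		// cannot leak into the fingerprint.
		if err := printer.Fprint(&buf, fset, decl); err != nil {
			return "", err
		}
		buf.WriteByte('\n')
	}
	sum := sha256.Sum256(buf.Bytes())
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	base := "package app\n\nfunc NewA() int { return 1 }\n"
	bodyEdit := "package app\n\nfunc NewA() int {\n\tx := 2\n\treturn x\n}\n"
	shapeEdit := "package app\n\nfunc NewA(n int) int { return n }\n"

	h0, _ := shapeHash(base)
	h1, _ := shapeHash(bodyEdit)
	h2, _ := shapeHash(shapeEdit)
	fmt.Println("body-only edit keeps shape:", h0 == h1)
	fmt.Println("signature edit keeps shape:", h0 == h2)
}
```

A fingerprint like this is deliberately conservative: anything outside a top-level function body, including a function literal assigned to a package-level variable, counts as a shape change.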

Parser / Wire Integration

The parser and provider set handling needed to be updated to operate cleanly with cached semantic state.

  • internal/wire/parse.go now reconstructs provider information from semantic artifacts when safe
  • internal/wire/output_cache.go was updated around the new loader flow
  • internal/wire/wire.go now drives generation through the new loading backend
  • internal/wire/load_debug.go and internal/wire/loader_timing_bridge.go make the new path observable via the -timings flag

One important detail here is that the system still prefers falling back cleanly over trusting cached reconstruction when it is not safe to do so.
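
The "prefer a clean fallback" rule can be sketched as a small wrapper. Everything here is hypothetical (loadFunc, errUnsupported, withFallback) and is not taken from fallback.go; in particular, the choice to fall back only on a dedicated sentinel error, and to surface every other error, is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnsupported is a hypothetical sentinel the fast path returns when it
// cannot safely satisfy a request.
var errUnsupported = errors.New("custom loader: cannot safely satisfy request")

// loadFunc is a hypothetical loader signature: patterns in, package paths out.
type loadFunc func(patterns []string) ([]string, error)

// withFallback tries the fast path first and uses the slow path only when
// the fast path explicitly declines the request.
func withFallback(fast, slow loadFunc) loadFunc {
	return func(patterns []string) ([]string, error) {
		pkgs, err := fast(patterns)
		if err == nil {
			return pkgs, nil
		}
		if errors.Is(err, errUnsupported) {
			return slow(patterns)
		}
		return nil, err // other failures surface instead of being masked
	}
}

func main() {
	fast := func(patterns []string) ([]string, error) {
		return nil, fmt.Errorf("cgo package: %w", errUnsupported)
	}
	slow := func(patterns []string) ([]string, error) {
		return []string{"example.com/app (loaded by fallback)"}, nil
	}

	pkgs, err := withFallback(fast, slow)([]string{"./..."})
	fmt.Println(pkgs, err)
}
```

Using errors.Is means the fast path can wrap the sentinel with context and still trigger the fallback.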

CLI

The command surface was updated to route through the same generation model instead of splitting behavior between commands.

  • wire gen, wire check, wire diff, and wire show now run through the new backend
  • wire watch stays on the same core generation path rather than inventing a separate implementation
  • wire cache was expanded so cache inspection and clearing are easier
  • cmd/wire/main.go now wires cache and loader behavior more explicitly

Colorization

Errors are now colorized red and successes green, which makes output easier to read amid watcher tooling spam. Multiline errors are also presented in a more user-friendly way.

Compatibility / Safety

This is meant to be a conservative loader change, not a semantic rewrite of Wire. Everything works the same, apart from the tweaks goforj/wire has already made by introducing wire cache clear and wire serve.

  • Existing generation entrypoints stay intact
  • Fallback behavior exists when the custom path cannot safely proceed
  • Test coverage was added heavily around loader behavior, parser behavior, and command integration

That shows up in:

  • internal/loader/loader_test.go
  • internal/wire/wire_test.go
  • internal/wire/parse_coverage_test.go
  • cmd/wire/main_test.go

Benchmarks

The benchmark harness was also expanded so it measures concrete developer workflows instead of only raw repo scale. (Seen in the table above)

  • scripts/import-benchmarks.sh now prints both scale and scenario tables
  • internal/wire/import_bench_test.go now measures edit types like unchanged reruns, body edits, shape edits, and import toggles
  • scenario runs distinguish cold runs from warmed runs
  • benchmark output now reports real graph composition:
    • local packages
    • stdlib packages
    • external packages

There are now a few benchmark profiles:

  • local for a modest local graph
  • local-high for a very large local graph
  • external for a graph with a much heavier external dependency surface

zzzz465 commented Mar 27, 2026

I tried this version and it works fine in my local environment; it decreased wiring time from 20s to roughly 1s.
However, there are some issues that block using it in a CI environment.

  1. mtime changes when the cache is stored/loaded in a CI environment, so the cache is always considered stale.
  2. discovery-cache is hardcoded to UserCacheDir(). This can be worked around by caching the whole cache dir, but it would be good to have granular control over the cache directory.
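
As a rough illustration of how both points might be addressed, the sketch below keys freshness on file content instead of mtime and lets an explicit setting override the default cache location. The names fileState, fingerprint, and resolveCacheDir, the WIRE_CACHE_DIR variable, and the "wire" subdirectory are all hypothetical; none of them are existing Wire options.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// fileState is a hypothetical snapshot of one input file.
type fileState struct {
	ModTimeUnix int64  // changes when CI restores a cache or checks out the repo
	Content     []byte // what actually determines the generated output
}

// fingerprint derives a cache key from content only, so a restored file with
// a new mtime but identical bytes is still considered fresh.
func fingerprint(f fileState) string {
	sum := sha256.Sum256(f.Content)
	return hex.EncodeToString(sum[:])
}

// resolveCacheDir prefers an explicit override and otherwise falls back to a
// subdirectory of the user cache directory.
func resolveCacheDir(override, userCacheDir string) string {
	if override != "" {
		return override
	}
	return filepath.Join(userCacheDir, "wire")
}

func main() {
	base, err := os.UserCacheDir()
	if err != nil {
		base = os.TempDir()
	}
	// WIRE_CACHE_DIR is a made-up variable name used only for this sketch.
	fmt.Println("cache dir:", resolveCacheDir(os.Getenv("WIRE_CACHE_DIR"), base))

	before := fileState{ModTimeUnix: 1, Content: []byte("package app\n")}
	restored := fileState{ModTimeUnix: 2, Content: []byte("package app\n")}
	fmt.Println("still fresh after restore:", fingerprint(before) == fingerprint(restored))
}
```

Hashing content costs more than a stat call, so a real implementation might use mtime and size as a fast first check and fall back to hashing only when they differ.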
