feat(benchmarking): adding ERC20 benchmarking test #3114
Conversation
…mark

- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:

- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON (now in benchmark/output.go)
go test sets the working directory to the package under test, so the env var should be relative to test/e2e/benchmark/, not test/e2e/.
go test treats all arguments after an unknown flag (--evm-binary) as test binary args, so ./benchmark/ was never recognized as a package pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/), so the binary path needs an extra parent traversal.
The benchmark package doesn't define the --binary flag that test-e2e passes. It has its own CI workflow so it doesn't need to run here.
…nfig

collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes. Replace with direct header iteration over [startBlock, endBlock] and add Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and tx/block p50/p99.

Optimize spamoor configuration for 100ms block time:

- --slot-duration 100ms, --startup-delay 0 on daemon
- throughput=50 per 100ms slot (500 tx/s per spammer)
- max_pending=50000 to avoid 3s block poll backpressure
- 5 staggered spammers with 50K txs each

Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
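The header-iteration approach avoids the FilterLogs cap because each block's gas, tx count, and timestamp are read directly from its header rather than through a log query. A minimal sketch of the aggregation step follows; the names (`blockStat`, `percentile`, `summarize`) are illustrative assumptions, not the PR's actual helpers, and the nearest-rank percentile is one of several valid definitions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// blockStat holds per-block data gathered while iterating headers
// over [startBlock, endBlock].
type blockStat struct {
	GasUsed   uint64
	TxCount   int
	Timestamp uint64 // unix seconds from the block header
}

// percentile returns the nearest-rank p-th percentile of vals.
func percentile(vals []float64, p float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	sorted := append([]float64(nil), vals...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

// summarize computes a subset of the Phase 1 metrics described above:
// the non-empty block ratio plus gas-per-block p50/p99.
func summarize(stats []blockStat) (nonEmptyRatio, gasP50, gasP99 float64) {
	var gas []float64
	nonEmpty := 0
	for _, s := range stats {
		if s.TxCount > 0 {
			nonEmpty++
		}
		gas = append(gas, float64(s.GasUsed))
	}
	if len(stats) > 0 {
		nonEmptyRatio = float64(nonEmpty) / float64(len(stats))
	}
	return nonEmptyRatio, percentile(gas, 50), percentile(gas, 99)
}

func main() {
	stats := []blockStat{
		{GasUsed: 100, TxCount: 2},
		{GasUsed: 0, TxCount: 0},
		{GasUsed: 300, TxCount: 5},
		{GasUsed: 200, TxCount: 1},
	}
	ratio, p50, p99 := summarize(stats)
	fmt.Println(ratio, p50, p99) // 0.75 100 300
}
```

Block interval percentiles would be computed the same way over successive `Timestamp` deltas.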
- Move startBlock capture after spammer creation to exclude warm-up
- Replace 20s drain sleep with smart poll (waitForDrain)
- Add deleteAllSpammers cleanup to handle stale spamoor DB entries
- Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140)
- fix OTLP config: remove duplicate env vars, use http/protobuf protocol
- use require.Eventually for host readiness polling
- rename requireHTTP to requireHostUp
- use non-fatal logging in resultWriter.flush deferred context
- fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv)
- rename loop variable to avoid shadowing testing.TB convention
- add block/internal/executing/** to CI path trigger
- remove unused require import from output.go
# Conflicts:
#	scripts/test.mk
# Conflicts:
#	test/e2e/benchmark/suite_test.go

# Conflicts:
#	test/e2e/benchmark/suite_test.go
move EV_RETH_TAG resolution and rpc connection limits into setupEnv so all benchmark tests share the same reth configuration. lower ERC20 spammer count from 5 to 2 to reduce resource contention on local hardware while keeping the loop for easy scaling on dedicated infra.
- add blockMetricsSummary with summarize(), log(), and entries() methods
- add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead
- add collectTraces() suite method to deduplicate trace collection pattern
- add addEntries() convenience method on resultWriter
- slim TestERC20Throughput from ~217 to ~119 lines
- reuse collectTraces in TestSpamoorSmoke
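Since ExecuteTxs runs inside ProduceBlock, the ev-node overhead can be modeled as the difference between their average span durations. The sketch below assumes that subtraction model; the function name matches the commit's `evNodeOverhead` but the signature, the ratio output, and the sample values (rounded from the performance-alert numbers later in this thread) are illustrative, not the PR's code.

```go
package main

import "fmt"

// evNodeOverhead returns the time ProduceBlock spends outside of
// ExecuteTxs: as an absolute average in microseconds, and as a
// fraction of ProduceBlock. Guards against a zero denominator.
func evNodeOverhead(produceAvgUs, executeAvgUs float64) (absUs, ratio float64) {
	absUs = produceAvgUs - executeAvgUs
	if produceAvgUs > 0 {
		ratio = absUs / produceAvgUs
	}
	return absUs, ratio
}

func main() {
	// Rounded values for illustration: ProduceBlock ~8378.5us,
	// ExecuteTxs ~7501.5us from a previous smoke-test run.
	abs, ratio := evNodeOverhead(8378.5, 7501.5)
	fmt.Printf("overhead: %.1f us (%.1f%% of ProduceBlock)\n", abs, ratio*100)
}
```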
📝 Walkthrough

Adds an ERC20 throughput benchmark and CI job, plus e2e benchmarking utilities for block-level metrics, throughput/percentile calculations, spamoor lifecycle control, trace collection (ev-node and optional ev-reth), and related test/suite updates.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test/ERC20 Benchmark
    participant Spamoor as Spamoor
    participant Chain as EVM Chain
    participant Metrics as Metrics Collector
    participant Traces as Trace Collector
    Test->>Spamoor: configure & launch spammers
    Test->>Chain: record start block header
    Spamoor->>Chain: submit ERC20 transfer txs
    Test->>Spamoor: waitForSpamoorDone (monitor sent/failed)
    Test->>Chain: waitForDrain (observe empty blocks)
    Test->>Chain: record end block header
    Test->>Metrics: collectBlockMetrics(start,end)
    Metrics->>Chain: iterate blocks, gather gas/tx/intervals
    Metrics-->>Test: aggregated stats (MGas/s, p50/p99, non-empty ratio)
    Test->>Traces: collectTraces(ev-node, ev-reth)
    Traces-->>Test: spans & ev-node overhead
    Test->>Test: log summary, write benchmark entries, cleanup spammers
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
```diff
 cosmossdk.io/math v1.5.3
 github.com/celestiaorg/go-square/v3 v3.0.2
-github.com/celestiaorg/tastora v0.16.0
+github.com/celestiaorg/tastora v0.16.1-0.20260302131806-2816c7b82bfb
```
working off a feature branch in tastora means we can add things as we go instead of creating a lot of separate releases for incrementally adding features.
```go
// ev-reth reads OTEL_EXPORTER_OTLP_ENDPOINT and passes it directly
// to with_endpoint(). opentelemetry-otlp v0.31 HTTP exporter does
// not auto-append /v1/traces, so the full path is required.
"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
```
eventually this will be VictoriaTraces and not a local jaeger deployment.
```go
// different tests are distinguishable in the same output file.
func (s *blockMetricsSummary) entries(prefix string) []entry {
	return []entry{
		{Name: prefix + " - MGas/s", Unit: "MGas/s", Value: s.AchievedMGas},
```
initially this is just dumping a bunch of stats; we can see what they look like when running against a real deployment to establish a baseline.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @chatton's task in 1m 42s

Code Review

Overall this is a well-structured addition. The measurement window design is thoughtful (excluding warm-up/cool-down from steady-state), the documentation is thorough, and prior feedback from CodeRabbit has been addressed. A few items still worth addressing:

Issues

1. Drain timeout is silently swallowed — metrics will be non-deterministic [Major]

```go
if err := waitForDrain(drainCtx, t.Logf, e.ethClient, 10); err != nil {
	t.Logf("warning: %v", err)
}
```

The 30-second drain timeout will fire regularly in CI (block times + 10 consecutive empty blocks can easily exceed 30 s under load). When it does, fail hard instead:

```go
s.Require().NoError(
	waitForDrain(drainCtx, t.Logf, e.ethClient, 10),
	"mempool did not drain in time",
)
```

If a soft failure is intentional (i.e. "report whatever we have"), the drain timeout should be made large enough (e.g. 2–3 minutes) to be confident the data is complete, and the warning should clearly mark the result as partial.

2.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3114      +/-   ##
==========================================
- Coverage   60.93%   60.76%   -0.18%
==========================================
  Files         113      113
  Lines       11617    11672      +55
==========================================
+ Hits         7079     7092      +13
- Misses       3739     3777      +38
- Partials      799      803       +4
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

| Benchmark suite | Current: af2aebd | Previous: e877782 | Ratio |
|---|---|---|---|
| `SpamoorSmoke - BlockExecutor.ApplyBlock (avg)` | 70479 us | 7513.846385542169 us | 9.38 |
| `SpamoorSmoke - BlockExecutor.ProduceBlock (avg)` | 71381.01960784313 us | 8378.484939759037 us | 8.52 |
| `SpamoorSmoke - DA.Submit (avg)` | 1358.7 us | 678.9887005649717 us | 2.00 |
| `SpamoorSmoke - DASubmitter.SubmitData (avg)` | 14905 us | 1011.6774193548387 us | 14.73 |
| `SpamoorSmoke - Engine.GetPayload (avg)` | 30896.901960784315 us | 1659.436746987952 us | 18.62 |
| `SpamoorSmoke - Engine.NewPayload (avg)` | 36382.01960784314 us | 2473.5873493975905 us | 14.71 |
| `SpamoorSmoke - Executor.ExecuteTxs (avg)` | 70465.6862745098 us | 7501.469879518072 us | 9.39 |
| `SpamoorSmoke - Executor.GetTxs (avg)` | 3161 us | 1766.111111111111 us | 1.79 |
| `SpamoorSmoke - TxPool.GetTxs (avg)` | 3111.5 us | 1746.75 us | 1.78 |

This comment was automatically generated by workflow using github-action-benchmark.
Actionable comments posted: 3
♻️ Duplicate comments (1)

.github/workflows/benchmark.yml (1)

34-34: ⚠️ Potential issue | 🟠 Major

Pin `extractions/setup-just` to an immutable commit SHA in all jobs.

Using `@v3` is mutable and can drift unexpectedly across runs. Pin to a 40-char SHA for reproducibility and supply-chain safety.

```bash
#!/bin/bash
# List workflow action refs that are not pinned to a 40-char commit SHA.
rg -nP '^\s*-\s*uses:\s*[^@]+@([^\s#]+)' .github/workflows \
  | gawk '!/@[0-9a-f]{40}([[:space:]]|$)/ {print}'
```

Also applies to: 66-66, 94-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml at line 34, Replace the mutable action ref "extractions/setup-just@v3" with an immutable 40-character commit SHA (e.g., extractions/setup-just@<40-char-sha>) so the workflow is pinned; update every occurrence of that action (the other instances flagged in the workflow) to use the same full commit SHA and verify no other actions in the workflow use short tags like `@v3`, `@v2`, or `@main`—ensure the new refs are 40 hex chars to guarantee reproducibility and supply-chain safety.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/benchmark/helpers.go`:
- Around line 201-214: Change waitForDrain to return an error instead of only
logging on timeout: update its signature (waitForDrain(ctx context.Context, log
func(string, ...any), client *ethclient.Client, consecutiveEmpty int) error),
replace the ctx.Done() branch to return a descriptive error (including emptyRun
and consecutiveEmpty) rather than calling log and returning, and update all call
sites (e.g., the test call in spamoor_erc20_test.go) to assert/handle the
returned error (for example using s.Require().NoError(waitForDrain(...))). Also
apply the same pattern to the other similar timeout branch noted in the file so
all timeouts propagate errors to callers.
- Around line 290-294: The AchievedTPS calculation in the blockMetricsSummary
return unconditionally divides by ss.Seconds(), which can be zero; update the
code that builds the blockMetricsSummary so you compute seconds := ss.Seconds()
(or similar), check if seconds == 0 and set AchievedTPS = 0 (or a safe default)
when zero, otherwise set AchievedTPS = float64(m.TotalTxCount) / seconds; modify
the return to use that guarded value (reference blockMetricsSummary,
AchievedTPS, ss.Seconds(), and m.TotalTxCount).
In `@test/e2e/benchmark/suite_test.go`:
- Around line 55-63: The current rethTag() function falls back to the hardcoded
defaultRethTag ("pr-140"), which is unsafe for CI; update the implementation so
it either (A) replaces defaultRethTag with an immutable stable release tag or
digest (e.g., a real release string) and use that as the fallback, or (B) remove
the fallback and make rethTag() fail fast when EV_RETH_TAG is not set
(log.Fatalf or return an error) to force CI configuration; change the symbol
defaultRethTag and the rethTag() function accordingly and ensure callers handle
the fatal/error behavior if you choose the fail-fast option.
---
Duplicate comments:
In @.github/workflows/benchmark.yml:
- Line 34: Replace the mutable action ref "extractions/setup-just@v3" with an
immutable 40-character commit SHA (e.g., extractions/setup-just@<40-char-sha>)
so the workflow is pinned; update every occurrence of that action (the other
instances flagged in the workflow) to use the same full commit SHA and verify no
other actions in the workflow use short tags like `@v3`, `@v2`, or `@main`—ensure the
new refs are 40 hex chars to guarantee reproducibility and supply-chain safety.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`test/e2e/go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (8)
- .github/workflows/benchmark.yml
- test/e2e/benchmark/helpers.go
- test/e2e/benchmark/output.go
- test/e2e/benchmark/spamoor_erc20_test.go
- test/e2e/benchmark/spamoor_smoke_test.go
- test/e2e/benchmark/suite_test.go
- test/e2e/benchmark/traces.go
- test/e2e/go.mod
```go
// TODO: temporary hardcoded tag, will be replaced with a proper release tag
const defaultRethTag = "pr-140"

func rethTag() string {
	if tag := os.Getenv("EV_RETH_TAG"); tag != "" {
		return tag
	}
	return defaultRethTag
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
# Search for EV_RETH_TAG usage across the codebase
rg -n 'EV_RETH_TAG|defaultRethTag|rethTag\(\)|WithTag\(rethTag' --type-list | head -20
```

Repository: evstack/ev-node
Length of output: 448

🏁 Script executed:

```bash
# Search for CI/workflow files that might set EV_RETH_TAG
find . -type f \( -name "*.yml" -o -name "*.yaml" \) | head -20
```

Repository: evstack/ev-node
Length of output: 759

🏁 Script executed:

```bash
# Check the actual test file and surrounding context
cat -n test/e2e/benchmark/suite_test.go | head -80
```

Repository: evstack/ev-node
Length of output: 2877

🏁 Script executed:

```bash
# Search for any references to pr-140 in the codebase
rg -n 'pr-140|pr-\d+' --type-list
```

Repository: evstack/ev-node
Length of output: 448

🏁 Script executed:

```bash
# Search for EV_RETH_TAG in workflows
rg 'EV_RETH_TAG' .github/workflows/
```

Repository: evstack/ev-node
Length of output: 41

🏁 Script executed:

```bash
# Search for rethTag usage in the codebase
rg 'rethTag|WithTag'
```

Repository: evstack/ev-node
Length of output: 173

🏁 Script executed:

```bash
# Check the benchmark workflow specifically
cat -n .github/workflows/benchmark.yml
```

Repository: evstack/ev-node
Length of output: 8038

🏁 Script executed:

```bash
# Search for any references to pr-140 or similar tag patterns
rg 'pr-140|pr-\d+|reth.*tag'
```

Repository: evstack/ev-node
Length of output: 123
Replace hardcoded reth tag with immutable release tag or require explicit CI configuration.
The fallback to "pr-140" is the only active code path in CI—EV_RETH_TAG is never configured in .github/workflows/benchmark.yml (lines 69–73). This makes benchmark runs non-deterministic and fragile: if that PR-scoped tag is removed or retargeted, all benchmarks fail silently or use unintended image versions. Either use a stable release tag/digest, or explicitly require EV_RETH_TAG to be set (fail fast if unset) to enforce reproducibility and match the test file determinism guideline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/benchmark/suite_test.go` around lines 55 - 63, The current rethTag()
function falls back to the hardcoded defaultRethTag ("pr-140"), which is unsafe
for CI; update the implementation so it either (A) replaces defaultRethTag with
an immutable stable release tag or digest (e.g., a real release string) and use
that as the fallback, or (B) remove the fallback and make rethTag() fail fast
when EV_RETH_TAG is not set (log.Fatalf or return an error) to force CI
configuration; change the symbol defaultRethTag and the rethTag() function
accordingly and ensure callers handle the fatal/error behavior if you choose the
fail-fast option.
will be addressed in a follow up when there is a proper tag to use.
@chatton, sounds good! Since you'll be addressing this in a follow-up, you might want to track it with a GitHub issue or add a TODO comment in the code pointing to the future work—it'll make it easier to find when the proper release tag is available. 👍
🧠 Learnings used
Learnt from: CR
Repo: evstack/ev-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T20:31:15.823Z
Learning: Applies to **/*_test.go : Ensure tests are deterministic
- waitForDrain returns an error on timeout instead of silently logging
- guard AchievedTPS computation when steady-state duration is zero
♻️ Duplicate comments (1)

test/e2e/benchmark/spamoor_erc20_test.go (1)

77-79: ⚠️ Potential issue | 🟠 Major

Fail the benchmark when mempool drain times out.

Continuing after `waitForDrain` timeout can undercount processed txs/gas while still passing, which makes reported throughput non-deterministic. Use `s.Require().NoError(err, "mempool did not drain in time")` here instead of warning-only logging.

As per coding guidelines, `**/*_test.go`: Ensure tests are deterministic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/spamoor_erc20_test.go` around lines 77 - 79, The benchmark currently only logs a warning when waitForDrain(drainCtx, t.Logf, e.ethClient, 10) returns an error, which can allow timeouts to undercount throughput; replace the warning-only handling with a hard test failure by calling s.Require().NoError(err, "mempool did not drain in time") (use the same err from waitForDrain) so the benchmark fails deterministically on drain timeout; locate the call to waitForDrain and swap the t.Logf branch to invoke s.Require().NoError(err, "mempool did not drain in time") referencing waitForDrain, drainCtx, e.ethClient and err.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/benchmark/spamoor_erc20_test.go`:
- Around line 77-79: The benchmark currently only logs a warning when
waitForDrain(drainCtx, t.Logf, e.ethClient, 10) returns an error, which can
allow timeouts to undercount throughput; replace the warning-only handling with
a hard test failure by calling s.Require().NoError(err, "mempool did not drain
in time") (use the same err from waitForDrain) so the benchmark fails
deterministically on drain timeout; locate the call to waitForDrain and swap the
t.Logf branch to invoke s.Require().NoError(err, "mempool did not drain in
time") referencing waitForDrain, drainCtx, e.ethClient and err.
♻️ Duplicate comments (1)

.github/workflows/benchmark.yml (1)

94-94: ⚠️ Potential issue | 🟠 Major

Pin `extractions/setup-just` to an immutable commit SHA.

Line 94 uses a mutable tag (`@v3`) for a third-party action, which weakens supply-chain integrity.

🔒 Suggested fix

```diff
   - name: Install just
-    uses: extractions/setup-just@v3
+    uses: extractions/setup-just@<full-40-char-commit-sha> # v3
```

```bash
#!/bin/bash
# Verify mutable vs pinned refs for setup-just in this workflow.
rg -nP 'uses:\s*extractions/setup-just@v[0-9]+(\.[0-9]+)?(\.[0-9]+)?\b' .github/workflows/benchmark.yml
rg -nP 'uses:\s*extractions/setup-just@[0-9a-f]{40}\b' .github/workflows/benchmark.yml
# Expected after fix:
# - First command: no matches
# - Second command: one match per setup-just usage
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml at line 94, Replace the mutable action reference "uses: extractions/setup-just@v3" with a pinned immutable commit SHA; locate the workflow line that contains uses: extractions/setup-just@v3 and change it to uses: extractions/setup-just@<commit-sha> (the full 40-character commit hash from the action repo) so the workflow references a specific immutable revision instead of the v3 tag.
🧹 Nitpick comments (1)

.github/workflows/benchmark.yml (1)

88-88: Align `actions/setup-go` version in `erc20-benchmark` job with other benchmark jobs.

The `erc20-benchmark` job uses v6.2.0 while `evm-benchmark` and `spamoor-benchmark` use v6.3.0, creating inconsistency across benchmark tooling.

♻️ Suggested fix

```diff
-  uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
+  uses: actions/setup-go@4b73464bb391d4059bd26b0524d20df3927bd417 # v6.3.0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml at line 88, Update the actions/setup-go step in the erc20-benchmark job to use the same version as the other benchmark jobs: replace the pinned ref "actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5" (v6.2.0) with the ref used by evm-benchmark and spamoor-benchmark (v6.3.0) so the setup-go action version is consistent across benchmark jobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/benchmark.yml:
- Line 94: Replace the mutable action reference "uses:
extractions/setup-just@v3" with a pinned immutable commit SHA; locate the
workflow line that contains uses: extractions/setup-just@v3 and change it to
uses: extractions/setup-just@<commit-sha> (the full 40-character commit hash
from the action repo) so the workflow references a specific immutable revision
instead of the v3 tag.
---
Nitpick comments:
In @.github/workflows/benchmark.yml:
- Line 88: Update the actions/setup-go step in the erc20-benchmark job to use
the same version as the other benchmark jobs: replace the pinned ref
"actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5" (v6.2.0) with the
ref used by evm-benchmark and spamoor-benchmark (v6.3.0) so the setup-go action
version is consistent across benchmark jobs.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`test/e2e/go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (2)
- .github/workflows/benchmark.yml
- test/e2e/go.mod
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/go.mod
Overview
Closes #3122
Adds `TestERC20Throughput` to measure ERC-20 token transfer throughput using spamoor for load generation. Reports MGas/s, TPS, and per-span latency breakdown.

Summary by CodeRabbit
Tests
Chores