feat(benchmarking): adding ERC20 benchmarking test #3114
Conversation
…mark

- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:

- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON (now in benchmark/output.go)
go test sets the working directory to the package under test, so the env var should be relative to test/e2e/benchmark/, not test/e2e/.
go test treats all arguments after an unknown flag (--evm-binary) as test binary args, so ./benchmark/ was never recognized as a package pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/), so the binary path needs an extra parent traversal.
The benchmark package doesn't define the --binary flag that test-e2e passes. It has its own CI workflow so it doesn't need to run here.
…nfig

collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes. Replace with direct header iteration over [startBlock, endBlock] and add Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and tx/block p50/p99.

Optimize spamoor configuration for 100ms block time:

- --slot-duration 100ms, --startup-delay 0 on daemon
- throughput=50 per 100ms slot (500 tx/s per spammer)
- max_pending=50000 to avoid 3s block poll backpressure
- 5 staggered spammers with 50K txs each

Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
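The header-iteration approach avoids the FilterLogs cap because each block's gas, tx count, and timestamp are read directly from its header rather than through a log query. A minimal sketch of the aggregation step follows; the names (`blockStat`, `percentile`, `summarize`) are illustrative assumptions, not the PR's actual helpers, and the nearest-rank percentile is one of several valid definitions.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// blockStat holds per-block data gathered while iterating headers
// over [startBlock, endBlock].
type blockStat struct {
	GasUsed   uint64
	TxCount   int
	Timestamp uint64 // unix seconds from the block header
}

// percentile returns the nearest-rank p-th percentile of vals.
func percentile(vals []float64, p float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	sorted := append([]float64(nil), vals...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

// summarize computes a subset of the Phase 1 metrics described above:
// the non-empty block ratio plus gas-per-block p50/p99.
func summarize(stats []blockStat) (nonEmptyRatio, gasP50, gasP99 float64) {
	var gas []float64
	nonEmpty := 0
	for _, s := range stats {
		if s.TxCount > 0 {
			nonEmpty++
		}
		gas = append(gas, float64(s.GasUsed))
	}
	if len(stats) > 0 {
		nonEmptyRatio = float64(nonEmpty) / float64(len(stats))
	}
	return nonEmptyRatio, percentile(gas, 50), percentile(gas, 99)
}

func main() {
	stats := []blockStat{
		{GasUsed: 100, TxCount: 2},
		{GasUsed: 0, TxCount: 0},
		{GasUsed: 300, TxCount: 5},
		{GasUsed: 200, TxCount: 1},
	}
	ratio, p50, p99 := summarize(stats)
	fmt.Println(ratio, p50, p99) // 0.75 100 300
}
```

Block interval percentiles would be computed the same way over successive `Timestamp` deltas.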
- Move startBlock capture after spammer creation to exclude warm-up
- Replace 20s drain sleep with smart poll (waitForDrain)
- Add deleteAllSpammers cleanup to handle stale spamoor DB entries
- Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140)
- fix OTLP config: remove duplicate env vars, use http/protobuf protocol
- use require.Eventually for host readiness polling
- rename requireHTTP to requireHostUp
- use non-fatal logging in resultWriter.flush deferred context
- fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv)
- rename loop variable to avoid shadowing testing.TB convention
- add block/internal/executing/** to CI path trigger
- remove unused require import from output.go
# Conflicts:
#	scripts/test.mk
# Conflicts:
#	test/e2e/benchmark/suite_test.go

# Conflicts:
#	test/e2e/benchmark/suite_test.go
move EV_RETH_TAG resolution and rpc connection limits into setupEnv so all benchmark tests share the same reth configuration. lower ERC20 spammer count from 5 to 2 to reduce resource contention on local hardware while keeping the loop for easy scaling on dedicated infra.
- add blockMetricsSummary with summarize(), log(), and entries() methods
- add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead
- add collectTraces() suite method to deduplicate trace collection pattern
- add addEntries() convenience method on resultWriter
- slim TestERC20Throughput from ~217 to ~119 lines
- reuse collectTraces in TestSpamoorSmoke
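Since ExecuteTxs runs inside ProduceBlock, the ev-node overhead can be modeled as the difference between their average span durations. The sketch below assumes that subtraction model; the function name matches the commit's `evNodeOverhead` but the signature, the ratio output, and the sample values (rounded from the performance-alert numbers later in this thread) are illustrative, not the PR's code.

```go
package main

import "fmt"

// evNodeOverhead returns the time ProduceBlock spends outside of
// ExecuteTxs: as an absolute average in microseconds, and as a
// fraction of ProduceBlock. Guards against a zero denominator.
func evNodeOverhead(produceAvgUs, executeAvgUs float64) (absUs, ratio float64) {
	absUs = produceAvgUs - executeAvgUs
	if produceAvgUs > 0 {
		ratio = absUs / produceAvgUs
	}
	return absUs, ratio
}

func main() {
	// Rounded values for illustration: ProduceBlock ~8378.5us,
	// ExecuteTxs ~7501.5us from a previous smoke-test run.
	abs, ratio := evNodeOverhead(8378.5, 7501.5)
	fmt.Printf("overhead: %.1f us (%.1f%% of ProduceBlock)\n", abs, ratio*100)
}
```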
📝 Walkthrough

Adds an ERC20 throughput benchmark and CI job, plus e2e benchmarking utilities for block-level metrics, throughput/percentile calculations, spamoor lifecycle control, trace collection (ev-node and optional ev-reth), and related test/suite updates.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Test as Test/ERC20 Benchmark
    participant Spamoor as Spamoor
    participant Chain as EVM Chain
    participant Metrics as Metrics Collector
    participant Traces as Trace Collector
    Test->>Spamoor: configure & launch spammers
    Test->>Chain: record start block header
    Spamoor->>Chain: submit ERC20 transfer txs
    Test->>Spamoor: waitForSpamoorDone (monitor sent/failed)
    Test->>Chain: waitForDrain (observe empty blocks)
    Test->>Chain: record end block header
    Test->>Metrics: collectBlockMetrics(start,end)
    Metrics->>Chain: iterate blocks, gather gas/tx/intervals
    Metrics-->>Test: aggregated stats (MGas/s, p50/p99, non-empty ratio)
    Test->>Traces: collectTraces(ev-node, ev-reth)
    Traces-->>Test: spans & ev-node overhead
    Test->>Test: log summary, write benchmark entries, cleanup spammers
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
```diff
 cosmossdk.io/math v1.5.3
 github.com/celestiaorg/go-square/v3 v3.0.2
-github.com/celestiaorg/tastora v0.16.0
+github.com/celestiaorg/tastora v0.16.1-0.20260302131806-2816c7b82bfb
```
working off a feature branch in tastora means we can add things as we go instead of creating a lot of separate releases for incrementally adding features.
```go
// ev-reth reads OTEL_EXPORTER_OTLP_ENDPOINT and passes it directly
// to with_endpoint(). opentelemetry-otlp v0.31 HTTP exporter does
// not auto-append /v1/traces, so the full path is required.
"OTEL_EXPORTER_OTLP_ENDPOINT="+jg.Internal.IngestHTTPEndpoint()+"/v1/traces",
```
eventually this will be VictoriaTraces and not a local jaeger deployment.
```go
// different tests are distinguishable in the same output file.
func (s *blockMetricsSummary) entries(prefix string) []entry {
	return []entry{
		{Name: prefix + " - MGas/s", Unit: "MGas/s", Value: s.AchievedMGas},
```
initially this is just dumping a bunch of stats; we can see what they look like when running against a real deployment to establish a baseline.
The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).
Claude finished @chatton's task in 1m 42s

Code Review

Overall this is a well-structured addition. The measurement window design is thoughtful (excluding warm-up/cool-down from steady-state), the documentation is thorough, and prior feedback from CodeRabbit has been addressed. A few items still worth addressing:

Issues

1. Drain timeout is silently swallowed — metrics will be non-deterministic [Major]

```go
if err := waitForDrain(drainCtx, t.Logf, e.ethClient, 10); err != nil {
	t.Logf("warning: %v", err)
}
```

The 30-second drain timeout will fire regularly in CI (block times + 10 consecutive empty blocks can easily exceed 30 s under load). When it does, fail hard instead:

```go
s.Require().NoError(
	waitForDrain(drainCtx, t.Logf, e.ethClient, 10),
	"mempool did not drain in time",
)
```

If a soft failure is intentional (i.e. "report whatever we have"), the drain timeout should be made large enough (e.g. 2–3 minutes) to be confident the data is complete, and the warning should clearly mark the result as partial.

2.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #3114      +/-   ##
==========================================
- Coverage   60.93%   60.76%   -0.18%
==========================================
  Files         113      113
  Lines       11617    11672      +55
==========================================
+ Hits         7079     7092      +13
- Misses       3739     3777      +38
- Partials      799      803       +4
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Spamoor Trace Benchmarks'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.50.

| Benchmark suite | Current: af2aebd | Previous: e877782 | Ratio |
|---|---|---|---|
| `SpamoorSmoke - BlockExecutor.ApplyBlock (avg)` | 70479 us | 7513.846385542169 us | 9.38 |
| `SpamoorSmoke - BlockExecutor.ProduceBlock (avg)` | 71381.01960784313 us | 8378.484939759037 us | 8.52 |
| `SpamoorSmoke - DA.Submit (avg)` | 1358.7 us | 678.9887005649717 us | 2.00 |
| `SpamoorSmoke - DASubmitter.SubmitData (avg)` | 14905 us | 1011.6774193548387 us | 14.73 |
| `SpamoorSmoke - Engine.GetPayload (avg)` | 30896.901960784315 us | 1659.436746987952 us | 18.62 |
| `SpamoorSmoke - Engine.NewPayload (avg)` | 36382.01960784314 us | 2473.5873493975905 us | 14.71 |
| `SpamoorSmoke - Executor.ExecuteTxs (avg)` | 70465.6862745098 us | 7501.469879518072 us | 9.39 |
| `SpamoorSmoke - Executor.GetTxs (avg)` | 3161 us | 1766.111111111111 us | 1.79 |
| `SpamoorSmoke - TxPool.GetTxs (avg)` | 3111.5 us | 1746.75 us | 1.78 |

This comment was automatically generated by workflow using github-action-benchmark.
Actionable comments posted: 3
♻️ Duplicate comments (1)

.github/workflows/benchmark.yml (1)

34-34: ⚠️ Potential issue | 🟠 Major

Pin `extractions/setup-just` to an immutable commit SHA in all jobs.

Using `@v3` is mutable and can drift unexpectedly across runs. Pin to a 40-char SHA for reproducibility and supply-chain safety.

```bash
#!/bin/bash
# List workflow action refs that are not pinned to a 40-char commit SHA.
rg -nP '^\s*-\s*uses:\s*[^@]+@([^\s#]+)' .github/workflows \
  | gawk '!/@[0-9a-f]{40}([[:space:]]|$)/ {print}'
```

Also applies to: 66-66, 94-94
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml at line 34, Replace the mutable action ref "extractions/setup-just@v3" with an immutable 40-character commit SHA (e.g., extractions/setup-just@<40-char-sha>) so the workflow is pinned; update every occurrence of that action (the other instances flagged in the workflow) to use the same full commit SHA and verify no other actions in the workflow use short tags like `@v3`, `@v2`, or `@main`—ensure the new refs are 40 hex chars to guarantee reproducibility and supply-chain safety.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/e2e/benchmark/helpers.go`:
- Around line 201-214: Change waitForDrain to return an error instead of only
logging on timeout: update its signature (waitForDrain(ctx context.Context, log
func(string, ...any), client *ethclient.Client, consecutiveEmpty int) error),
replace the ctx.Done() branch to return a descriptive error (including emptyRun
and consecutiveEmpty) rather than calling log and returning, and update all call
sites (e.g., the test call in spamoor_erc20_test.go) to assert/handle the
returned error (for example using s.Require().NoError(waitForDrain(...))). Also
apply the same pattern to the other similar timeout branch noted in the file so
all timeouts propagate errors to callers.
- Around line 290-294: The AchievedTPS calculation in the blockMetricsSummary
return unconditionally divides by ss.Seconds(), which can be zero; update the
code that builds the blockMetricsSummary so you compute seconds := ss.Seconds()
(or similar), check if seconds == 0 and set AchievedTPS = 0 (or a safe default)
when zero, otherwise set AchievedTPS = float64(m.TotalTxCount) / seconds; modify
the return to use that guarded value (reference blockMetricsSummary,
AchievedTPS, ss.Seconds(), and m.TotalTxCount).
In `@test/e2e/benchmark/suite_test.go`:
- Around line 55-63: The current rethTag() function falls back to the hardcoded
defaultRethTag ("pr-140"), which is unsafe for CI; update the implementation so
it either (A) replaces defaultRethTag with an immutable stable release tag or
digest (e.g., a real release string) and use that as the fallback, or (B) remove
the fallback and make rethTag() fail fast when EV_RETH_TAG is not set
(log.Fatalf or return an error) to force CI configuration; change the symbol
defaultRethTag and the rethTag() function accordingly and ensure callers handle
the fatal/error behavior if you choose the fail-fast option.
---
Duplicate comments:
In @.github/workflows/benchmark.yml:
- Line 34: Replace the mutable action ref "extractions/setup-just@v3" with an
immutable 40-character commit SHA (e.g., extractions/setup-just@<40-char-sha>)
so the workflow is pinned; update every occurrence of that action (the other
instances flagged in the workflow) to use the same full commit SHA and verify no
other actions in the workflow use short tags like `@v3`, `@v2`, or `@main`—ensure the
new refs are 40 hex chars to guarantee reproducibility and supply-chain safety.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`test/e2e/go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (8)
- .github/workflows/benchmark.yml
- test/e2e/benchmark/helpers.go
- test/e2e/benchmark/output.go
- test/e2e/benchmark/spamoor_erc20_test.go
- test/e2e/benchmark/spamoor_smoke_test.go
- test/e2e/benchmark/suite_test.go
- test/e2e/benchmark/traces.go
- test/e2e/go.mod
```go
// TODO: temporary hardcoded tag, will be replaced with a proper release tag
const defaultRethTag = "pr-140"

func rethTag() string {
	if tag := os.Getenv("EV_RETH_TAG"); tag != "" {
		return tag
	}
	return defaultRethTag
}
```
🧩 Analysis chain
🏁 Script executed:

```bash
# Search for EV_RETH_TAG usage across the codebase
rg -n 'EV_RETH_TAG|defaultRethTag|rethTag\(\)|WithTag\(rethTag' --type-list | head -20
```

Repository: evstack/ev-node
Length of output: 448

🏁 Script executed:

```bash
# Search for CI/workflow files that might set EV_RETH_TAG
find . -type f \( -name "*.yml" -o -name "*.yaml" \) | head -20
```

Repository: evstack/ev-node
Length of output: 759

🏁 Script executed:

```bash
# Check the actual test file and surrounding context
cat -n test/e2e/benchmark/suite_test.go | head -80
```

Repository: evstack/ev-node
Length of output: 2877

🏁 Script executed:

```bash
# Search for any references to pr-140 in the codebase
rg -n 'pr-140|pr-\d+' --type-list
```

Repository: evstack/ev-node
Length of output: 448

🏁 Script executed:

```bash
# Search for EV_RETH_TAG in workflows
rg 'EV_RETH_TAG' .github/workflows/
```

Repository: evstack/ev-node
Length of output: 41

🏁 Script executed:

```bash
# Search for rethTag usage in the codebase
rg 'rethTag|WithTag'
```

Repository: evstack/ev-node
Length of output: 173

🏁 Script executed:

```bash
# Check the benchmark workflow specifically
cat -n .github/workflows/benchmark.yml
```

Repository: evstack/ev-node
Length of output: 8038

🏁 Script executed:

```bash
# Search for any references to pr-140 or similar tag patterns
rg 'pr-140|pr-\d+|reth.*tag'
```

Repository: evstack/ev-node
Length of output: 123
Replace hardcoded reth tag with immutable release tag or require explicit CI configuration.
The fallback to "pr-140" is the only active code path in CI—EV_RETH_TAG is never configured in .github/workflows/benchmark.yml (lines 69–73). This makes benchmark runs non-deterministic and fragile: if that PR-scoped tag is removed or retargeted, all benchmarks fail silently or use unintended image versions. Either use a stable release tag/digest, or explicitly require EV_RETH_TAG to be set (fail fast if unset) to enforce reproducibility and match the test file determinism guideline.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@test/e2e/benchmark/suite_test.go` around lines 55 - 63, The current rethTag()
function falls back to the hardcoded defaultRethTag ("pr-140"), which is unsafe
for CI; update the implementation so it either (A) replaces defaultRethTag with
an immutable stable release tag or digest (e.g., a real release string) and use
that as the fallback, or (B) remove the fallback and make rethTag() fail fast
when EV_RETH_TAG is not set (log.Fatalf or return an error) to force CI
configuration; change the symbol defaultRethTag and the rethTag() function
accordingly and ensure callers handle the fatal/error behavior if you choose the
fail-fast option.
will be addressed in a follow up when there is a proper tag to use.
@chatton, sounds good! Since you'll be addressing this in a follow-up, you might want to track it with a GitHub issue or add a TODO comment in the code pointing to the future work—it'll make it easier to find when the proper release tag is available. 👍
🧠 Learnings used
Learnt from: CR
Repo: evstack/ev-node PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-02-26T20:31:15.823Z
Learning: Applies to **/*_test.go : Ensure tests are deterministic
- waitForDrain returns an error on timeout instead of silently logging
- guard AchievedTPS computation when steady-state duration is zero
♻️ Duplicate comments (1)

test/e2e/benchmark/spamoor_erc20_test.go (1)

77-79: ⚠️ Potential issue | 🟠 Major

Fail the benchmark when mempool drain times out.

Continuing after `waitForDrain` timeout can undercount processed txs/gas while still passing, which makes reported throughput non-deterministic. Use `s.Require().NoError(err, "mempool did not drain in time")` here instead of warning-only logging.

As per coding guidelines, `**/*_test.go`: Ensure tests are deterministic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/benchmark/spamoor_erc20_test.go` around lines 77 - 79, The benchmark currently only logs a warning when waitForDrain(drainCtx, t.Logf, e.ethClient, 10) returns an error, which can allow timeouts to undercount throughput; replace the warning-only handling with a hard test failure by calling s.Require().NoError(err, "mempool did not drain in time") (use the same err from waitForDrain) so the benchmark fails deterministically on drain timeout; locate the call to waitForDrain and swap the t.Logf branch to invoke s.Require().NoError(err, "mempool did not drain in time") referencing waitForDrain, drainCtx, e.ethClient and err.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@test/e2e/benchmark/spamoor_erc20_test.go`:
- Around line 77-79: The benchmark currently only logs a warning when
waitForDrain(drainCtx, t.Logf, e.ethClient, 10) returns an error, which can
allow timeouts to undercount throughput; replace the warning-only handling with
a hard test failure by calling s.Require().NoError(err, "mempool did not drain
in time") (use the same err from waitForDrain) so the benchmark fails
deterministically on drain timeout; locate the call to waitForDrain and swap the
t.Logf branch to invoke s.Require().NoError(err, "mempool did not drain in
time") referencing waitForDrain, drainCtx, e.ethClient and err.
♻️ Duplicate comments (1)

.github/workflows/benchmark.yml (1)

94-94: ⚠️ Potential issue | 🟠 Major

Pin `extractions/setup-just` to an immutable commit SHA.

Line 94 uses a mutable tag (`@v3`) for a third-party action, which weakens supply-chain integrity.

🔒 Suggested fix

```diff
   - name: Install just
-    uses: extractions/setup-just@v3
+    uses: extractions/setup-just@<full-40-char-commit-sha> # v3
```

```bash
#!/bin/bash
# Verify mutable vs pinned refs for setup-just in this workflow.
rg -nP 'uses:\s*extractions/setup-just@v[0-9]+(\.[0-9]+)?(\.[0-9]+)?\b' .github/workflows/benchmark.yml
rg -nP 'uses:\s*extractions/setup-just@[0-9a-f]{40}\b' .github/workflows/benchmark.yml
# Expected after fix:
# - First command: no matches
# - Second command: one match per setup-just usage
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml at line 94, Replace the mutable action reference "uses: extractions/setup-just@v3" with a pinned immutable commit SHA; locate the workflow line that contains uses: extractions/setup-just@v3 and change it to uses: extractions/setup-just@<commit-sha> (the full 40-character commit hash from the action repo) so the workflow references a specific immutable revision instead of the v3 tag.
🧹 Nitpick comments (1)

.github/workflows/benchmark.yml (1)

88-88: Align `actions/setup-go` version in `erc20-benchmark` job with other benchmark jobs.

The `erc20-benchmark` job uses v6.2.0 while `evm-benchmark` and `spamoor-benchmark` use v6.3.0, creating inconsistency across benchmark tooling.

♻️ Suggested fix

```diff
-  uses: actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5 # v6.2.0
+  uses: actions/setup-go@4b73464bb391d4059bd26b0524d20df3927bd417 # v6.3.0
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/benchmark.yml at line 88, Update the actions/setup-go step in the erc20-benchmark job to use the same version as the other benchmark jobs: replace the pinned ref "actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5" (v6.2.0) with the ref used by evm-benchmark and spamoor-benchmark (v6.3.0) so the setup-go action version is consistent across benchmark jobs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In @.github/workflows/benchmark.yml:
- Line 94: Replace the mutable action reference "uses:
extractions/setup-just@v3" with a pinned immutable commit SHA; locate the
workflow line that contains uses: extractions/setup-just@v3 and change it to
uses: extractions/setup-just@<commit-sha> (the full 40-character commit hash
from the action repo) so the workflow references a specific immutable revision
instead of the v3 tag.
---
Nitpick comments:
In @.github/workflows/benchmark.yml:
- Line 88: Update the actions/setup-go step in the erc20-benchmark job to use
the same version as the other benchmark jobs: replace the pinned ref
"actions/setup-go@7a3fe6cf4cb3a834922a1244abfce67bcef6a0c5" (v6.2.0) with the
ref used by evm-benchmark and spamoor-benchmark (v6.3.0) so the setup-go action
version is consistent across benchmark jobs.
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`test/e2e/go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (2)
- .github/workflows/benchmark.yml
- test/e2e/go.mod
🚧 Files skipped from review as they are similar to previous changes (1)
- test/e2e/go.mod
Overview
Closes #3122
Adds `TestERC20Throughput` to measure ERC-20 token transfer throughput using spamoor for load generation. Reports MGas/s, TPS, and per-span latency breakdown.

Summary by CodeRabbit
Tests
Chores