
feat(benchmarking): adding gas burner test #3115

Draft
chatton wants to merge 24 commits into main from cian/gass-burner-2

Conversation

@chatton
Contributor

@chatton chatton commented Mar 2, 2026

Overview

Closes #3123

Adds TestGasBurner to measure gas throughput using a deterministic gasburner workload. Reports seconds_per_gigagas as the primary metric.
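For reference, the relationship between achieved throughput and the reported metric can be sketched as follows (an illustrative helper, not the exact test code; the function name is hypothetical):

```go
package main

import "fmt"

// secsPerGigagas converts achieved throughput (MGas/s) into the
// seconds_per_gigagas metric: 1 Ggas = 1000 MGas, so s/Ggas = 1000 / (MGas/s).
func secsPerGigagas(achievedMGasPerSec float64) float64 {
	if achievedMGasPerSec <= 0 {
		return 0 // no data; callers should skip reporting a zero value
	}
	return 1000 / achievedMGasPerSec
}

func main() {
	// e.g. 55 MGas/s of sustained throughput
	fmt.Printf("%.2f s/Ggas\n", secsPerGigagas(55)) // prints "18.18 s/Ggas"
}
```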

…mark

- Create test/e2e/benchmark/ subpackage with SpamoorSuite (testify/suite)
- Move spamoor smoke test into suite as TestSpamoorSmoke
- Split helpers into focused files: traces.go, output.go, metrics.go
- Introduce resultWriter for defer-based benchmark JSON output
- Export shared symbols from evm_test_common.go for cross-package use
- Restructure CI to fan-out benchmark jobs and fan-in publishing
- Run benchmarks on PRs only when benchmark-related files change
Resolve conflicts keeping the benchmark suite refactoring:
- benchmark.yml: keep path filters and suite-style test command
- evm_spamoor_smoke_test.go: keep deleted (moved to benchmark pkg)
- evm_test_common.go: keep exported types, drop writeTraceBenchmarkJSON
  (now in benchmark/output.go)
go test sets the working directory to the package under test, so the
env var should be relative to test/e2e/benchmark/, not test/e2e/.
go test treats all arguments after an unknown flag (--evm-binary) as
test binary args, so ./benchmark/ was never recognized as a package
pattern.
go test sets the cwd to the package directory (test/e2e/benchmark/),
so the binary path needs an extra parent traversal.
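The cwd behavior can be illustrated with a minimal sketch (the `build/evm` path and `pathFrom` helper are hypothetical; only the extra-`..` pattern is the point):

```go
package main

import (
	"fmt"
	"path/filepath"
)

// pathFrom builds a relative path to a (hypothetical) built binary at the
// repo root, given how many directory levels the test package sits below it.
// go test sets the working directory to the package under test, so moving a
// test from test/e2e/ to test/e2e/benchmark/ adds one parent traversal.
func pathFrom(levels int) string {
	parts := make([]string, 0, levels+2)
	for i := 0; i < levels; i++ {
		parts = append(parts, "..")
	}
	parts = append(parts, "build", "evm")
	return filepath.Join(parts...)
}

func main() {
	fmt.Println(pathFrom(2)) // cwd = test/e2e/
	fmt.Println(pathFrom(3)) // cwd = test/e2e/benchmark/
}
```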
The benchmark package doesn't define the --binary flag that test-e2e
passes. It has its own CI workflow so it doesn't need to run here.
…nfig

collectBlockMetrics hit reth's 20K FilterLogs limit at high tx volumes.
Replace with direct header iteration over [startBlock, endBlock] and add
Phase 1 metrics: non-empty ratio, block interval p50/p99, gas/block and
tx/block p50/p99.
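The direct-iteration replacement can be sketched like this (names and the lookup signature are hypothetical; the real helper would fetch headers via an Ethereum client's HeaderByNumber rather than a callback):

```go
package main

import "fmt"

// header is a minimal stand-in for a chain block header.
type header struct {
	GasUsed uint64
	TxCount int
}

// collectBlockMetrics walks headers directly over [startBlock, endBlock],
// sidestepping any eth_getLogs range limit, and tallies per-block gas,
// per-block tx counts, and the number of non-empty blocks.
func collectBlockMetrics(lookup func(uint64) (header, error), startBlock, endBlock uint64) (gas []uint64, txs []int, nonEmpty int, err error) {
	for n := startBlock; n <= endBlock; n++ {
		h, lerr := lookup(n)
		if lerr != nil {
			return nil, nil, 0, fmt.Errorf("header %d: %w", n, lerr)
		}
		gas = append(gas, h.GasUsed)
		txs = append(txs, h.TxCount)
		if h.TxCount > 0 {
			nonEmpty++
		}
	}
	return gas, txs, nonEmpty, nil
}

func main() {
	// Fake chain: odd blocks carry one tx, even blocks are empty.
	lookup := func(n uint64) (header, error) {
		return header{GasUsed: n * 21000, TxCount: int(n % 2)}, nil
	}
	gas, txs, nonEmpty, err := collectBlockMetrics(lookup, 1, 4)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(gas), len(txs), nonEmpty) // prints "4 4 2"
}
```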

Optimize spamoor configuration for 100ms block time:
- --slot-duration 100ms, --startup-delay 0 on daemon
- throughput=50 per 100ms slot (500 tx/s per spammer)
- max_pending=50000 to avoid 3s block poll backpressure
- 5 staggered spammers with 50K txs each

Results: 55 MGas/s, 1414 TPS, 19.8% non-empty blocks (up from 6%).
- Move startBlock capture after spammer creation to exclude warm-up
- Replace 20s drain sleep with smart poll (waitForDrain)
- Add deleteAllSpammers cleanup to handle stale spamoor DB entries
- Lower trace sample rate to 10% to prevent Jaeger OOM
- make reth tag configurable via EV_RETH_TAG env var (default pr-140)
- fix OTLP config: remove duplicate env vars, use http/protobuf protocol
- use require.Eventually for host readiness polling
- rename requireHTTP to requireHostUp
- use non-fatal logging in resultWriter.flush deferred context
- fix stale doc comment (setupCommonEVMEnv -> SetupCommonEVMEnv)
- rename loop variable to avoid shadowing testing.TB convention
- add block/internal/executing/** to CI path trigger
- remove unused require import from output.go
# Conflicts:
#	scripts/test.mk
# Conflicts:
#	test/e2e/benchmark/suite_test.go
move EV_RETH_TAG resolution and rpc connection limits into setupEnv
so all benchmark tests share the same reth configuration. lower ERC20
spammer count from 5 to 2 to reduce resource contention on local
hardware while keeping the loop for easy scaling on dedicated infra.
- add blockMetricsSummary with summarize(), log(), and entries() methods
- add evNodeOverhead() for computing ProduceBlock vs ExecuteTxs overhead
- add collectTraces() suite method to deduplicate trace collection pattern
- add addEntries() convenience method on resultWriter
- slim TestERC20Throughput from ~217 to ~119 lines
- reuse collectTraces in TestSpamoorSmoke
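The p50/p99 summaries above can be produced with a simple nearest-rank percentile, sketched here (the actual summarize() may use a different interpolation):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the nearest-rank p-th percentile of vals (0 < p <= 100).
// It copies and sorts the input so the caller's slice is left untouched.
func percentile(vals []float64, p float64) float64 {
	if len(vals) == 0 {
		return 0
	}
	sorted := append([]float64(nil), vals...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

func main() {
	gasPerBlock := []float64{0, 0, 21000, 84000, 300000}
	fmt.Printf("gas/block p50=%.0f p99=%.0f\n",
		percentile(gasPerBlock, 50), percentile(gasPerBlock, 99))
	// prints "gas/block p50=21000 p99=300000"
}
```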
@coderabbitai
Contributor

coderabbitai bot commented Mar 2, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@chatton chatton mentioned this pull request Mar 2, 2026
@claude
Contributor

claude bot commented Mar 2, 2026

Claude finished @chatton's task in 1m 46s


Code Review

  • Gather context
  • Read changed files
  • Review code quality & architecture
  • Review error handling & resilience
  • Review testing & reliability
  • Post final review

Overall the PR is well-structured and follows the existing benchmark patterns closely. The warmup approach (waiting for 50 sent txs before recording startBlock) is a nice improvement over the ERC20 test's implicit warmup. A few issues worth addressing:

Issues

1. Missing transaction failure assertion (gasburner_test.go)

TestGasBurner never checks that transactions didn't fail. TestERC20Throughput asserts this explicitly via waitForSpamoorDone which returns failed:

// ERC20 (spamoor_erc20_test.go:103-104)
s.Require().Greater(sent, float64(0), "at least one transaction should have been sent")
s.Require().Zero(failed, "no transactions should have failed")

The gas burner uses waitForMetricTarget instead, which only tracks spamoor_transactions_sent_total. If a large portion of transactions fail but total sent still reaches totalCount, the test will pass silently. Even a soft check like:

failed, _ := api.GetMetrics()["spamoor_transactions_failed_total"]
if f := sumCounter(failed); f > 0 {
    t.Logf("warning: %.0f transactions failed", f)
}

…would make failures visible.

2. secsPerGigagas == 0 still written to result writer (gasburner_test.go:116–120)

When AchievedMGas == 0 (e.g. all blocks empty), secsPerGigagas stays at 0 and is unconditionally appended to the result writer. On the benchmark dashboard, 0 s/Ggas reads as "infinitely fast" rather than "no data". The evNodeOverhead entry already guards with if overhead, ok := ...; ok, so applying the same pattern here would be consistent:

if secsPerGigagas > 0 {
    w.addEntry(entry{...})
}


3. waitForMetricTarget has no progress logging (helpers.go:386–398)

During the 5-minute wait for warmup and full send, there's no indication of how far along the test is. waitForSpamoorDone logs rate and elapsed time on every tick. A single log per poll would make it much easier to diagnose slow runs:

t.Logf("waiting for metric %s: current=%.0f target=%.0f", name, v, target)


Nits

  • api alias (gasburner_test.go:34): api := e.spamoorAPI is used only in this test. All other code in the suite uses e.spamoorAPI directly, including the deleteAllSpammers call on line 36 which gets api not e.spamoorAPI. Consistent, but a cosmetic divergence from the ERC20 test style.

  • Alignment spacing (helpers.go:314–316): Extra whitespace was added for struct field alignment in summarize(). This is fine, but golangci-lint may flag it depending on config. Worth checking with just lint.

Positives

  • The two-phase warmup (wait for 50 txs before recording startBlock) is more deliberate and accurate than the ERC20 approach.
  • collectBlockMetrics, blockMetricsSummary, and related helpers are well-documented, cleanly separated, and reused across both tests — good refactoring.
  • waitForDrain correctly resets the empty-block counter on any non-empty block, avoiding false positives.
  • evNodeOverhead calculation is clear and the guard on produceAvg <= 0 prevents division by zero.
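The guarded calculation the review praises looks roughly like this (a hypothetical sketch of the helper's shape, not the suite's actual code):

```go
package main

import "fmt"

// evNodeOverhead returns the fraction of average ProduceBlock time that is
// not spent in ExecuteTxs. The produceAvg <= 0 guard avoids division by
// zero when no trace samples were collected; the bool mirrors the
// comma-ok pattern used by the suite's result-writer entries.
func evNodeOverhead(produceAvg, executeAvg float64) (float64, bool) {
	if produceAvg <= 0 {
		return 0, false
	}
	return (produceAvg - executeAvg) / produceAvg, true
}

func main() {
	if overhead, ok := evNodeOverhead(10.0, 7.0); ok {
		fmt.Printf("overhead=%.0f%%\n", overhead*100) // prints "overhead=30%"
	}
}
```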

@github-actions
Contributor

github-actions bot commented Mar 2, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

Build: ✅ passed
Format: ⏩ skipped
Lint: ✅ passed
Breaking: ✅ passed
Updated (UTC): Mar 2, 2026, 3:48 PM

@codecov

codecov bot commented Mar 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 60.97%. Comparing base (4c7b7e1) to head (6f56c80).

Additional details and impacted files
@@                   Coverage Diff                    @@
##           cian/erc20-benchmark    #3115      +/-   ##
========================================================
+ Coverage                 60.90%   60.97%   +0.07%     
========================================================
  Files                       113      113              
  Lines                     11617    11617              
========================================================
+ Hits                       7075     7084       +9     
+ Misses                     3743     3735       -8     
+ Partials                    799      798       -1     
Flag Coverage Δ
combined 60.97% <ø> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown.


Base automatically changed from cian/erc20-benchmark to main March 4, 2026 11:08