chore: make smoketests faster via test.concurrent, no sharding, 400% workers#773

Open
jason-rl wants to merge 3 commits into main from jason/faster_smoketests

Conversation

@jason-rl
Contributor

@jason-rl jason-rl commented Apr 4, 2026

User description

⚠️ PR Title Must Follow Conventional Commits

Format: feat[optional scope]: <description>

Examples: feat: add new SDK method · feat(storage): support file uploads · feat!: breaking API change


Description

Motivation

Changes

Testing

  • Unit tests added
  • Integration tests added
  • Smoke Tests added/updated
  • Tested locally

Breaking Changes

Checklist

  • PR title follows Conventional Commits format (feat: or feat(scope):)
  • Documentation updated (if needed)
  • Breaking changes documented (if applicable)

CodeAnt-AI Description

Speed up smoke tests and run them in parallel

What Changed

  • Smoke tests now run in parallel, which shortens the time needed to complete the suite
  • Smoke test runs use a higher worker count to finish sooner
  • The smoke test workflow no longer splits tests across shards or collects shard logs; it runs as one parallel job instead
  • Coverage smoke-test runs now fail normally when tests fail, and the extra failure step was removed
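The configuration side of this change can be sketched as follows (a hedged reconstruction from the jest.config.ts fragments quoted in the review comments below; `runBuiltPackageTest` and its environment variable are assumptions):

```typescript
import type { Config } from 'jest';

// Flags read from the environment; RUN_SMOKETESTS matches the scripts in
// package.json, while RUN_BUILT_PACKAGE_TEST is an assumed name.
const runSmoketests = !!process.env['RUN_SMOKETESTS'];
const runBuiltPackageTest = !!process.env['RUN_BUILT_PACKAGE_TEST'];

const config: Config = {
  // Label smoke runs so they stand out in Jest's output.
  ...((runSmoketests || runBuiltPackageTest) && { displayName: 'Smoke Tests' }),
  // High fixed worker count: smoke tests are I/O-bound against a live
  // backend, so far more workers than cores can still pay off.
  ...(runSmoketests && { maxWorkers: 50 }),
};

export default config;
```

Within each file, `test.concurrent` additionally lets cases in the same suite overlap, which is where most of the review comments below focus.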

Impact

✅ Shorter smoke test runs
✅ Faster CI feedback
✅ Fewer delayed test failures

💡 Usage Guide

Checking Your Pull Request

Every time you open a pull request, our system automatically reviews it, checking for security issues, infrastructure misconfigurations, and common code problems to make sure your changes are solid and won't cause trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@github-actions

github-actions bot commented Apr 4, 2026

✅ Object Smoke Tests & Coverage Report

Test Results

✅ All smoke tests passed

Coverage Results

Metric      Coverage  Required  Status
Functions   100%      100%
Lines       90.03%    -         ℹ️
Branches    68.95%    -         ℹ️
Statements  88.98%    -         ℹ️

Coverage Requirement: 100% function coverage (all public methods must be called in smoke tests)

✅ All tests passed and all object methods are covered!

View detailed coverage report
File Functions Lines Branches
src/sdk.ts ✅ 100% 86.11% 72.58%
src/sdk/agent.ts ✅ 100% 100% 100%
src/sdk/axon.ts ✅ 100% 93.75% 100%
src/sdk/blueprint.ts ✅ 100% 100% 80%
src/sdk/devbox.ts ✅ 100% 91.96% 94.28%
src/sdk/execution-result.ts ✅ 100% 92.68% 70.83%
src/sdk/execution.ts ✅ 100% 95.65% 87.5%
src/sdk/gateway-config.ts ✅ 100% 100% 100%
src/sdk/mcp-config.ts ✅ 100% 100% 100%
src/sdk/network-policy.ts ✅ 100% 100% 100%
src/sdk/scenario-builder.ts ✅ 100% 98.46% 80.7%
src/sdk/scenario-run.ts ✅ 100% 92.1% 50%
src/sdk/scenario.ts ✅ 100% 100% 100%
src/sdk/scorer.ts ✅ 100% 100% 100%
src/sdk/secret.ts ✅ 100% 100% 100%
src/sdk/snapshot.ts ✅ 100% 100% 100%
src/sdk/storage-object.ts ✅ 100% 80% 48.93%

📋 View workflow run

@jason-rl jason-rl changed the title from "Make smoketests faster via test.concurrent" to "Make smoketests faster via test.concurrent, no sharding, 400% workers" Apr 4, 2026
@jason-rl jason-rl changed the title from "Make smoketests faster via test.concurrent, no sharding, 400% workers" to "chore: make smoketests faster via test.concurrent, no sharding, 400% workers" Apr 4, 2026
@jason-rl jason-rl requested review from alb-rl and dines-rl April 4, 2026 02:08
@jason-rl jason-rl marked this pull request as ready for review April 4, 2026 02:09
@codeant-ai
Contributor

codeant-ai bot commented Apr 4, 2026

CodeAnt AI is reviewing your PR.



@codeant-ai codeant-ai bot added the size:L This PR changes 100-499 lines, ignoring generated files label Apr 4, 2026
});

test(
test.concurrent(
Contributor


Suggestion: This reconnect test explicitly requires an idle SSE period to trigger a 408/reconnect cycle, but running it with test.concurrent allows other tests in the same suite to publish events to the same axon during that idle window, breaking the reconnect precondition and causing flaky timeouts or false negatives. Run this test non-concurrently to preserve its timing assumptions. [race condition]

Severity Level: Major ⚠️
- ❌ SSE reconnect smoke test becomes flaky or times out.
- ⚠️ Axon SSE reconnect behavior coverage becomes unreliable.
Suggested change
test.concurrent(
test(
Steps of Reproduction ✅
1. Run the smoketests with RUN_SMOKETESTS enabled so the top-level describe in
`tests/smoketests/object-oriented/axon.test.ts:8` is active instead of skipped.

2. In the `describe('axon lifecycle')` block (`axon.test.ts:9`), the shared `axon`
instance is created once in `beforeAll` (`axon.test.ts:13-18`) via `sdk.axon.create()`,
and all tests in this describe share this same axon and axonId.

3. Jest schedules all `test.concurrent` tests in this describe to run concurrently,
including:

   - the reconnect test at `axon.test.ts:79-154` ("SSE subscribe reconnects after idle
   (408)..."),

   - the single-publish test at `axon.test.ts:48-59` ("publish event to axon"),

   - the multi-publish test at `axon.test.ts:61-77` ("publish multiple events and verify
   sequence increases"),

   - and the SSE stream test at `axon.test.ts:156-178` ("subscribe to SSE stream and
   receive events"),

   all of which call `axon.publish` on the same `axon`.

4. When the reconnect test runs, it first publishes events 1–5 (`axon.test.ts:82-90`),
opens an SSE stream (`subscribeSse` at `axon.test.ts:92`), then intentionally waits ~70
seconds without server traffic (`setTimeout` at `axon.test.ts:100-104` with comment "Hold
the line open with no server traffic so the backend can return 408"). During this intended
idle window, any concurrently running test (e.g., `test.concurrent('publish event to
axon', ...)` at `axon.test.ts:48-54` or `test.concurrent('subscribe to SSE stream and
receive events', ...)` at `axon.test.ts:156-163`) may publish additional events to the
same axon, preventing the stream from going idle, so the backend never returns 408. This
causes the reconnect test to either exceed `MEDIUM_TIMEOUT` (`axon.test.ts:153-154`) or
hit the guard `if (sequences.length > 150) throw` (`axon.test.ts:141-142`), yielding flaky
timeouts/false negatives that disappear if the reconnect test is run alone or
non-concurrently.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/axon.test.ts
**Line:** 79:79
**Comment:**
	*Race Condition: This reconnect test explicitly requires an idle SSE period to trigger a 408/reconnect cycle, but running it with `test.concurrent` allows other tests in the same suite to publish events to the same axon during that idle window, breaking the reconnect precondition and causing flaky timeouts or false negatives. Run this test non-concurrently to preserve its timing assumptions.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

],
// Add display name for smoke tests to make it clearer in output
...((runSmoketests || runBuiltPackageTest) && { displayName: 'Smoke Tests' }),
...(runSmoketests && { maxWorkers: 50 }),
Contributor


Suggestion: Using a fixed maxWorkers: 50 for smoke tests can oversubscribe low-core CI/local machines and trigger worker spawn/resource exhaustion failures (for example spawn EAGAIN) or severe process thrashing. Use a CPU-relative value so worker count scales with available cores. [possible bug]

Severity Level: Major ⚠️
- ⚠️ Local `yarn test:smoke` can oversubscribe low-core machines.
- ⚠️ Heavy smoke suites may thrash or intermittently fail locally.
- ⚠️ Developer smoke-test feedback loop slowed or destabilized.
Suggested change
...(runSmoketests && { maxWorkers: 50 }),
...(runSmoketests && { maxWorkers: '50%' }),
Steps of Reproduction ✅
1. On a low-core developer machine (e.g., 2 CPUs), run `yarn test:smoke` from the
repository root, which executes the script defined in `package.json:21` as `"test:smoke":
"RUN_SMOKETESTS=1 jest --verbose tests/smoketests"`.

2. The Jest CLI starts without an explicit `--maxWorkers` flag, so it loads the default
configuration from `jest.config.ts:6-9`, where `runSmoketests` is set based on
`process.env['RUN_SMOKETESTS']` at `jest.config.ts:3`.

3. Because `RUN_SMOKETESTS=1` is set by the script, `runSmoketests` is `true` and the
spread at `jest.config.ts:38` applies, forcing `maxWorkers: 50` into the Jest
configuration for this run.

4. Jest now attempts to run the smoke test suite in `tests/smoketests/**` (triggered by
the script at `package.json:21`) with up to 50 workers on a machine with far fewer cores,
causing significant oversubscription; this can manifest as heavy CPU thrashing or, on
constrained environments with low process limits, worker-spawn failures such as `spawn
EAGAIN`, degrading or breaking local smoke-test runs.
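The suggested `'50%'` value works because Jest resolves percentage strings against the machine's core count. A minimal sketch of the idea (an illustration of the arithmetic, not Jest's exact rounding behavior):

```typescript
import * as os from 'os';

// Resolve a percentage-style maxWorkers setting against the core count,
// so worker count scales with the machine instead of being fixed at 50.
function resolveMaxWorkers(setting: string, cores: number = os.cpus().length): number {
  const pct = parseInt(setting, 10); // '50%' -> 50, '400%' -> 400
  return Math.max(1, Math.floor((cores * pct) / 100));
}

console.log(resolveMaxWorkers('50%', 2));  // 1 worker on a 2-core laptop
console.log(resolveMaxWorkers('50%', 32)); // 16 workers on a 32-core CI box
```

A percentage above 100% (as in the PR title's "400% workers") deliberately oversubscribes cores, which can be reasonable for I/O-bound smoke tests but still scales down gracefully on small machines.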
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** jest.config.ts
**Line:** 38:38
**Comment:**
	*Possible Bug: Using a fixed `maxWorkers: 50` for smoke tests can oversubscribe low-core CI/local machines and trigger worker spawn/resource exhaustion failures (for example `spawn EAGAIN`) or severe process thrashing. Use a CPU-relative value so worker count scales with available cores.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

}

test(
test.concurrent(
Contributor


Suggestion: This test temporarily deletes GITHUB_TOKEN and ANTHROPIC_API_KEY from process.env, so running it with test.concurrent introduces a race with live example tests that read the same global env vars. That can make unrelated live examples fail nondeterministically depending on execution order. Run this test serially to avoid shared-state interference. [race condition]

Severity Level: Major ⚠️
- ⚠️ `mcp-github-tools` smoketest can flake under live-example runs.
- ⚠️ `devbox-mounts` smoketest can intermittently fail due to missing env.
- ⚠️ `yarn test:examples` CI job becomes flaky and unreliable.
Suggested change
test.concurrent(
test(
Steps of Reproduction ✅
1. Configure environment to run live examples by setting `RUN_EXAMPLE_LIVE_TESTS=1` plus
valid `RUNLOOP_API_KEY`, `GITHUB_TOKEN`, and `ANTHROPIC_API_KEY`. These variables are read
in `tests/smoketests/examples/examples.test.ts:5-10` and the required env for each example
(including `GITHUB_TOKEN` and `ANTHROPIC_API_KEY` for `mcp-github-tools`) is defined in
`examples/registry.ts:22-63`.

2. Run `yarn test:examples`, which invokes `RUN_SMOKETESTS=1 jest --verbose
tests/smoketests/examples/examples.test.ts` as defined in `package.json:20-23`. Jest
executes the `describe('smoketest: examples', ...)` block in
`tests/smoketests/examples/examples.test.ts:4-25`, registering one concurrent test per
`exampleRegistry` entry via `const itLive = runLiveExample ? test.concurrent :
test.concurrent.skip;` at line 10 and `itLive(...` at lines 12-24.

3. Because the deterministic skip test is also declared with `test.concurrent(` at
`tests/smoketests/examples/examples.test.ts:27`, Jest runs it concurrently with the live
example tests. Inside this test, `process.env['GITHUB_TOKEN']` and
`process.env['ANTHROPIC_API_KEY']` are saved (lines 35-37) and then deleted globally at
lines 38-39 before calling `mcpExampleEntry.run({ skipIfMissingCredentials: true })` at
line 42.

4. While those environment variables are temporarily deleted, other concurrent tests in
the same file invoke `entry.run()` for live examples such as `devbox-mounts` and
`mcp-github-tools` (registered in `examples/registry.ts:22-63`). Their implementations
read `process.env['GITHUB_TOKEN']` and `process.env['ANTHROPIC_API_KEY']` at runtime in
`examples/mcp-github-tools.ts:18-32,118-135` and `examples/devbox-mounts.ts:81-87`. If
these reads occur during the deletion window created by the concurrent skip-mode test, the
examples throw errors like "Set GITHUB_TOKEN to a GitHub PAT with repo scope." or "Set
ANTHROPIC_API_KEY to run the Claude Code mount example.", causing nondeterministic
failures in the corresponding smoketests despite credentials being correctly configured.
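The save/delete/restore pattern at the heart of this race can be sketched as follows (a hypothetical helper, not code from the repo): while the body runs, any concurrent reader of `process.env` sees the variables as missing, which is exactly the window described above.

```typescript
// Temporarily remove env vars around a body, restoring them afterwards.
// Safe only when nothing else reads the same vars concurrently - hence
// the suggestion to run such a test serially.
function withEnvRemoved<T>(keys: string[], body: () => T): T {
  const saved: Record<string, string | undefined> = {};
  for (const key of keys) {
    saved[key] = process.env[key];
    delete process.env[key];
  }
  try {
    return body();
  } finally {
    // Restore the original values so later tests see them again.
    for (const key of keys) {
      if (saved[key] !== undefined) process.env[key] = saved[key]!;
    }
  }
}

process.env['GITHUB_TOKEN'] = 'tok';
const seen = withEnvRemoved(['GITHUB_TOKEN'], () => process.env['GITHUB_TOKEN']);
console.log(seen);                        // undefined inside the body
console.log(process.env['GITHUB_TOKEN']); // 'tok' restored afterwards
```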
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/examples/examples.test.ts
**Line:** 27:27
**Comment:**
	*Race Condition: This test temporarily deletes `GITHUB_TOKEN` and `ANTHROPIC_API_KEY` from `process.env`, so running it with `test.concurrent` introduces a race with live example tests that read the same global env vars. That can make unrelated live examples fail nondeterministically depending on execution order. Run this test serially to avoid shared-state interference.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

});

test('concurrent requests', async () => {
test.concurrent('concurrent requests', async () => {
Contributor


Suggestion: This load-oriented test already issues parallel requests internally, and making the test itself concurrent adds another layer of shared-suite parallelism against the same gateway/devbox fixture. That can trigger transient throttling/contention and make the expected all-200 assertion nondeterministic. Keep this test serial to isolate its own concurrency signal. [race condition]

Severity Level: Major ⚠️
- ❌ Concurrent-requests smoketest may intermittently fail under load.
- ⚠️ Harder to trust gateway concurrency regression coverage.
Suggested change
test.concurrent('concurrent requests', async () => {
test('concurrent requests', async () => {
Steps of Reproduction ✅
1. Enable the smoketest suite by setting `RUN_SMOKETESTS=1` and required gateway
credentials, then execute the Jest tests including
`tests/smoketests/object-oriented/gateway-config.test.ts`.

2. In `describe('comprehensive gateway proxying tests', ...)` at lines 282–714, a shared
`devbox`, `gatewayConfig`, `networkPolicy`, `gatewayUrl`, and `gatewayToken` are created
once in `beforeAll` (lines 333–379) and reused by all tests in this block; cleanup happens
in `afterAll` (lines 381–398).

3. Most tests in this describe are declared with `test.concurrent(...)` and exercise the
same shared `devbox` and gateway via `curlRequest` or `devbox!.cmd.exec` (see, for
example, 'GET request - list devboxes' at 400–407, 'large response handling' at 478–485,
'multipart form data' at 531–556, 'large request body' at 583–600).

4. The `test.concurrent('concurrent requests', async () => { ... })` at lines 621–639
builds a Bash script (lines 622–629) that itself issues three background `curl` requests
in parallel to `${gatewayUrl}/v1/devboxes?limit=1` using the shared `gatewayToken`, then
asserts that every captured HTTP status code is `200` at lines 635–637. Because this test
is both internally concurrent (three curls at once) and marked `test.concurrent` (running
alongside other gateway tests that hit the same `devbox` and gateway), the combined
concurrency can trigger transient throttling or contention on the gateway/devbox, causing
some responses to be non-200 even though the product behavior is acceptable under normal
load. This makes the "all 200" assertion flaky. Running this particular test
non-`concurrent` isolates its intended concurrency behavior to the inner Bash loop,
reducing nondeterministic failures due to shared-suite contention.
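The key point is that this test already fans out internally. A sketch of that shape (names are stand-ins; the real test shells out to three background curls against the gateway):

```typescript
// Start n identical requests before awaiting any of them, so they run
// in parallel - the test's own concurrency layer, independent of Jest.
async function fanOut(n: number, request: () => Promise<number>): Promise<number[]> {
  return Promise.all(Array.from({ length: n }, () => request()));
}

// Stand-in for a curl against `${gatewayUrl}/v1/devboxes?limit=1`.
const okRequest = async (): Promise<number> => 200;

fanOut(3, okRequest).then((codes) => {
  // The real assertion: every captured HTTP status is 200.
  console.log(codes.every((code) => code === 200)); // true
});
```

Keeping the surrounding test serial means the only parallelism exercised is this inner fan-out, so a non-200 status points at the gateway rather than at suite-level contention.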
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/gateway-config.test.ts
**Line:** 621:621
**Comment:**
	*Race Condition: This load-oriented test already issues parallel requests internally, and making the test itself `concurrent` adds another layer of shared-suite parallelism against the same gateway/devbox fixture. That can trigger transient throttling/contention and make the expected all-200 assertion nondeterministic. Keep this test serial to isolate its own concurrency signal.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

});

test('request timeout handling', async () => {
test.concurrent('request timeout handling', async () => {
Contributor


Suggestion: This test now runs concurrently against the same shared devbox used by many other tests in the suite, but it asserts a strict wall-clock upper bound (elapsed < 30000). Under concurrent execution, command scheduling/queueing latency can inflate elapsed time and cause false failures even when the endpoint is healthy. Run this timing-sensitive test serially. [race condition]

Severity Level: Major ⚠️
- ❌ Timeout smoketest may intermittently fail despite healthy gateway.
- ⚠️ Smoketest timing assertions become unreliable under suite load.
Suggested change
test.concurrent('request timeout handling', async () => {
test('request timeout handling', async () => {
Steps of Reproduction ✅
1. Enable the smoketest suite by setting `RUN_SMOKETESTS=1` and required gateway
credentials (`RUNLOOP_API_KEY`, optional `RUNLOOP_BASE_URL`), then run the Jest tests that
include `tests/smoketests/object-oriented/gateway-config.test.ts`.

2. In this file, the `describe('comprehensive gateway proxying tests', ...)` block at
lines 282–714 sets up a single shared `devbox`, `gatewayConfig`, `networkPolicy`,
`gatewayUrl`, and `gatewayToken` in `beforeAll` (lines 333–379) and tears them down in
`afterAll` (lines 381–398).

3. Inside that same `describe`, many tests are declared with `test.concurrent(...)` (e.g.,
'GET request - list devboxes' at lines 400–407, 'large response handling' at 478–485,
'multipart form data' at 531–556, 'large request body' at 583–600). Jest will run these
tests concurrently against the shared `devbox` and gateway.

4. The `test.concurrent('request timeout handling', async () => { ... })` at lines 656–671
uses the shared `curlRequest` helper (lines 293–331) to POST to
`/v1/devboxes/${devbox!.id}/execute_sync` with a `sleep 2` command, measuring wall-clock
time via `const startTime = Date.now()` and `const elapsed = Date.now() - startTime`, then
asserting `expect(elapsed).toBeGreaterThan(2000)` and
`expect(elapsed).toBeLessThan(30000)`. Under concurrent execution with other
`test.concurrent` cases that also call `devbox!.cmd.exec` or `curlRequest`,
scheduling/queueing delays on the shared `devbox` or gateway can legitimately push
`elapsed` over 30000 ms while still returning `httpCode === 200`, causing this test to
fail even though the endpoint behavior is correct. Running this particular test serially
(non-`concurrent`) removes the extra suite-level contention and reduces this flakiness
risk.
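The timing pattern under discussion can be sketched like this (names assumed, not the repo's code): elapsed wall-clock time is measured around an awaited call and then bounded on both sides, and the strict upper bound is what suite contention can break.

```typescript
// Measure wall-clock time around an awaited operation, as the
// request-timeout test does around its curl to execute_sync.
async function measure(op: () => Promise<void>): Promise<number> {
  const startTime = Date.now();
  await op();
  return Date.now() - startTime;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

measure(() => sleep(50)).then((elapsed) => {
  console.log(`elapsed: ${elapsed}ms`); // roughly 50ms when uncontended
  // The fragile part is asserting elapsed < some fixed ceiling: queueing
  // delays on a shared devbox can push elapsed past any fixed bound even
  // though the endpoint itself responded correctly.
});
```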
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/gateway-config.test.ts
**Line:** 656:656
**Comment:**
	*Race Condition: This test now runs concurrently against the same shared `devbox` used by many other tests in the suite, but it asserts a strict wall-clock upper bound (`elapsed < 30000`). Under concurrent execution, command scheduling/queueing latency can inflate elapsed time and cause false failures even when the endpoint is healthy. Run this timing-sensitive test serially.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.


describe('network policy egress configurations', () => {
test('create policy with allow_all=true', async () => {
test.concurrent('create policy with allow_all=true', async () => {
Contributor


Suggestion: This test performs a create/read/delete flow against the live API; running it as test.concurrent increases write fanout and can trigger transient 409/429 failures under the new --maxWorkers=400% smoke run. Keep this mutation test serial to avoid race-driven flakiness. [race condition]

Severity Level: Major ⚠️
- ⚠️ Smoketest `create policy with allow_all=true` may fail nondeterministically.
- ⚠️ `yarn test:smoke` stability reduced by concurrent live writes.
Suggested change
test.concurrent('create policy with allow_all=true', async () => {
test('create policy with allow_all=true', async () => {
Steps of Reproduction ✅
1. Run the smoketest suite with a real backend by executing `yarn test:smoke`, which
invokes `RUN_SMOKETESTS=1 jest --verbose tests/smoketests` as defined in
`package.json:19-23`.

2. Jest loads `tests/smoketests/object-oriented/network-policy.test.ts`, which constructs
a shared `RunloopSDK` client using live credentials from `process.env['RUNLOOP_BASE_URL']`
and `process.env['RUNLOOP_API_KEY']` (`utils.ts:17-23`, referenced at
`network-policy.test.ts:1-4`).

3. Inside `describe('network policy egress configurations', ...)`
(`network-policy.test.ts:138-226`), Jest schedules multiple `test.concurrent` cases
together, including `'create policy with allow_all=true'` at `network-policy.test.ts:139`,
`'create policy with allow_devbox_to_devbox=true'` at `:153`, `'create policy with
multiple allowed hostnames'` at `:171`, `'create policy with empty allowed_hostnames
(DENY_ALL)'` at `:191`, and `'create policy with wildcard hostnames'` at `:207`.

4. When these tests run concurrently, each issues a live `sdk.networkPolicy.create(...)`
followed by `policy.getInfo()` and `cleanUpPolicy(policy)` (which calls `policy.delete()`;
`utils.ts:36-44`), while other smoketests also create network policies in parallel (e.g.,
`blueprint.test.ts:316-320` and `:356-360`, `gateway-config.test.ts:339-343` and
`:742-746`). This amplifies simultaneous create/delete traffic to the same backend. Any
transient backend write failures (e.g., throttling or intermittent 5xx) surface as
uncaught errors in this test (no local retry/handling around `create`/`getInfo`), causing
nondeterministic failures of `test.concurrent('create policy with allow_all=true', ...)`.
Making this test non-concurrent (`test(...)`) would reduce per-file write fanout and lower
the flakiness risk in the smoketest run.
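The create/read/delete flow these policy tests exercise can be sketched serially as below. The `networkPolicy.create`/`getInfo`/`delete` names come from the review; the in-memory stub is purely illustrative so the sketch is runnable without a live backend:

```typescript
// Minimal stand-in for the live API; the real tests call
// sdk.networkPolicy.create / policy.getInfo / policy.delete.
interface PolicyInfo { id: string; egress: { allow_all: boolean } }

class FakePolicy {
  constructor(private store: Map<string, PolicyInfo>, public id: string) {}
  async getInfo(): Promise<PolicyInfo> { return this.store.get(this.id)!; }
  async delete(): Promise<void> { this.store.delete(this.id); }
}

const store = new Map<string, PolicyInfo>();
let nextId = 0;

const sdkStub = {
  networkPolicy: {
    async create(opts: { allow_all: boolean }): Promise<FakePolicy> {
      const id = `np_${nextId++}`;
      store.set(id, { id, egress: { allow_all: opts.allow_all } });
      return new FakePolicy(store, id);
    },
  },
};

// Serial create -> read -> delete: each mutation completes before the
// next starts, so no overlapping writes contend on the backend.
async function run(): Promise<boolean> {
  const policy = await sdkStub.networkPolicy.create({ allow_all: true });
  const info = await policy.getInfo();
  await policy.delete();
  return info.egress.allow_all && store.size === 0;
}

run().then((ok) => console.log(ok)); // true: flow completed and cleaned up
```

Run serially, each test's write burst stays isolated; under `test.concurrent` with high worker counts, many such flows mutate backend state at once, which is the contention the comment flags.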
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/network-policy.test.ts
**Line:** 139:139
**Comment:**
	*Race Condition: This test performs a create/read/delete flow against the live API; running it as `test.concurrent` increases write fanout and can trigger transient 409/429 failures under the new `--maxWorkers=400%` smoke run. Keep this mutation test serial to avoid race-driven flakiness.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

});

test('create policy with allow_devbox_to_devbox=true', async () => {
test.concurrent('create policy with allow_devbox_to_devbox=true', async () => {
Contributor


Suggestion: This test creates and deletes a network policy, so making it concurrent adds unnecessary contention on backend policy writes and can fail intermittently in high-parallel CI. Run it serially to keep behavior deterministic. [race condition]

Severity Level: Major ⚠️
- ⚠️ `allow_devbox_to_devbox` smoketest may intermittently fail under load.
- ⚠️ Overall `yarn test:smoke` reliability impacted by concurrent mutations.
Suggested change
test.concurrent('create policy with allow_devbox_to_devbox=true', async () => {
test('create policy with allow_devbox_to_devbox=true', async () => {
Steps of Reproduction ✅
1. Execute the smoketests against a live Runloop deployment using `yarn test:smoke`, which
runs `RUN_SMOKETESTS=1 jest --verbose tests/smoketests` (`package.json:19-23`).

2. Jest initializes the shared `sdk` client in
`tests/smoketests/object-oriented/network-policy.test.ts` via `makeClientSDK()`
(`network-policy.test.ts:1-4`), which binds to `RUNLOOP_BASE_URL` and `RUNLOOP_API_KEY`
and configures `maxRetries: 1` and `timeout: 120_000` (`utils.ts:17-23`), meaning
individual calls have minimal retry logic.

3. In the `describe('network policy egress configurations', ...)` block
(`network-policy.test.ts:138-226`), Jest runs several `test.concurrent` cases together,
including `'create policy with allow_devbox_to_devbox=true'` at `:153`. That test body
creates a policy (`sdk.networkPolicy.create({...})` at `:156-161`), reads it via
`policy.getInfo()` (`:162`), asserts on egress fields (`:163-165`), and finally calls
`cleanUpPolicy(policy)` to delete it against the live API (`utils.ts:36-44`).

4. Because this test runs concurrently with sibling `test.concurrent` cases in the same
file (lines `139`, `171`, `191`, `207`) and with other smoketests that also perform
`sdk.networkPolicy.create` (e.g., `blueprint.test.ts:316-320`, `:356-360`,
`gateway-config.test.ts:339-343`, `:742-746`), the suite produces bursts of simultaneous
create/delete operations. Any transient backend error during this test's create or getInfo
call will surface as an unhandled rejection, failing this test in a non-deterministic way.
Changing this to a serial `test('create policy with allow_devbox_to_devbox=true', ...)`
reduces concurrent mutation pressure from this file and mitigates such flakiness.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/network-policy.test.ts
**Line:** 153:153
**Comment:**
	*Race Condition: This test creates and deletes a network policy, so making it concurrent adds unnecessary contention on backend policy writes and can fail intermittently in high-parallel CI. Run it serially to keep behavior deterministic.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

});

test('create policy with multiple allowed hostnames', async () => {
test.concurrent('create policy with multiple allowed hostnames', async () => {
Contributor


Suggestion: This case issues a create plus multi-assert retrieval and cleanup; running it concurrently with other policy writers increases backend contention and can produce flaky CI outcomes. Use a normal test here. [race condition]

Severity Level: Major ⚠️
- ⚠️ Multi-hostname egress smoketest can flake under concurrent load.
- ⚠️ Flaky network policy tests degrade confidence in smoketest suite.
Suggested change
test.concurrent('create policy with multiple allowed hostnames', async () => {
test('create policy with multiple allowed hostnames', async () => {
Steps of Reproduction ✅
1. Run `yarn test:smoke` to execute all smoketests (`package.json:19-23`), which uses Jest
to run the entire `tests/smoketests` tree against the live API configured in
`tests/smoketests/utils.ts:17-23`.

2. Jest loads `tests/smoketests/object-oriented/network-policy.test.ts`, where `const sdk
= makeClientSDK();` (`:4`) creates a shared `RunloopSDK` instance that is reused across
describes, including `describe('network policy egress configurations', ...)` (`:138-226`).

3. The test `'create policy with multiple allowed hostnames'` at `:171` is marked
`test.concurrent`, so Jest schedules it in parallel with other concurrent tests in the
same describe (`:139`, `:153`, `:191`, `:207`). Its body (`:172-188`) constructs an array
of hostnames, calls `sdk.networkPolicy.create({...})` to create a live policy
(`:175-179`), reads the policy via `policy.getInfo()` (`:180`), asserts on the hostnames
(`:181-185`), and finally calls `cleanUpPolicy(policy)` which attempts a delete
(`utils.ts:36-44`).

4. At the same time, other smoketest files—such as `object-oriented/blueprint.test.ts`
(network policy creates at `:316-320` and `:356-360`) and
`object-oriented/gateway-config.test.ts` (network policy creates at `:339-343` and
`:742-746`)—also run under Jest's worker-level parallelism and issue
`sdk.networkPolicy.create` and deletes. This concentrates multiple live
create/getInfo/delete cycles into the same time window. Any transient backend issue on
those write paths will cause this test's promises to reject and thus intermittently fail
`test.concurrent('create policy with multiple allowed hostnames', ...)`. Executing it as a
normal `test(...)` lessens concurrent external-state mutations and improves determinism.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/network-policy.test.ts
**Line:** 171:171
**Comment:**
	*Race Condition: This case issues a create plus multi-assert retrieval and cleanup; running it concurrently with other policy writers increases backend contention and can produce flaky CI outcomes. Use a normal `test` here.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.

});

test('create policy with empty allowed_hostnames (DENY_ALL)', async () => {
test.concurrent('create policy with empty allowed_hostnames (DENY_ALL)', async () => {
Contributor


Suggestion: This test still mutates server state (create/delete) even though it validates DENY_ALL behavior; running it concurrently increases the chance of transient throttling/timeouts in smoke CI. Keep it serial. [race condition]

Severity Level: Major ⚠️
- ⚠️ DENY_ALL network policy smoketest can exhibit flaky behavior.
- ⚠️ CI smoketest job may fail sporadically due to concurrency.
Suggested change
test.concurrent('create policy with empty allowed_hostnames (DENY_ALL)', async () => {
test('create policy with empty allowed_hostnames (DENY_ALL)', async () => {
Steps of Reproduction ✅
1. Invoke the smoketest suite using `yarn test:smoke` (`package.json:19-23`), which
configures `RUN_SMOKETESTS=1` and runs Jest across `tests/smoketests` with worker-level
parallelism.

2. In `tests/smoketests/object-oriented/network-policy.test.ts`, the `sdk` client is
created once at module load (`:1-4`) via `makeClientSDK()` (`utils.ts:17-23`), and used
throughout the `describe('network policy egress configurations', ...)` block (`:138-226`)
to call `sdk.networkPolicy.create` against the live API.

3. The test `'create policy with empty allowed_hostnames (DENY_ALL)'` at `:191` is
declared with `test.concurrent`, so Jest runs it in parallel with other concurrent tests
in this describe (`:139`, `:153`, `:171`, `:207`). Its implementation (`:192-204`) creates
a DENY_ALL policy with `allowed_hostnames: []` (`:194-198`), then calls `policy.getInfo()`
(`:199`) and asserts that the returned egress configuration matches expectations
(`:200-201`), finally invoking `cleanUpPolicy(policy)` which attempts a live delete
(`utils.ts:36-44`).

4. While this test executes, other smoketests concurrently perform network policy
create/delete cycles (e.g., `object-oriented/blueprint.test.ts:316-320` and `:356-360`,
`object-oriented/gateway-config.test.ts:339-343` and `:742-746`, plus sibling
`test.concurrent` cases in this file). The aggregate parallelism increases susceptibility
to transient backend failures on creation or retrieval; because this test does not catch
errors around `create` or `getInfo`, such failures will cause inconsistent,
hard-to-reproduce failures in the `'create policy with empty allowed_hostnames
(DENY_ALL)'` smoketest. Making this a serial `test(...)` reduces one live mutation stream
from the concurrent set and helps stabilize the DENY_ALL behavior verification.
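Independent of the serial-vs-concurrent question, the cleanup can be made robust so a failed assertion never leaks a live policy into sibling tests. A sketch under assumed shapes — the `create`/`getInfo`/`delete` names mirror the smoketests, but the types and signatures here are placeholders, not the real SDK:

```typescript
// Minimal structural types for illustration; the real SDK types differ.
interface PolicyInfo { allowed_hostnames: string[] }
interface Policy {
  getInfo(): Promise<PolicyInfo>;
  delete(): Promise<void>;
}

async function verifyDenyAll(create: () => Promise<Policy>): Promise<void> {
  const policy = await create();
  try {
    const info = await policy.getInfo();
    if (info.allowed_hostnames.length !== 0) {
      throw new Error("expected DENY_ALL policy to have no allowed hostnames");
    }
  } finally {
    // Runs even when the assertion above throws, so a failed run does
    // not leave a stray policy behind for concurrent siblings to hit.
    await policy.delete();
  }
}
```

The `try/finally` shape matters more under concurrency: without it, one flaky assertion leaves extra live state that raises contention for every other in-flight test.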
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/network-policy.test.ts
**Line:** 191:191
**Comment:**
	Race Condition: This test still mutates server state (create/delete) even though it validates DENY_ALL behavior; running it concurrently increases the chance of transient throttling/timeouts in smoke CI. Keep it serial.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it, and please make it concise.

});

test('create policy with wildcard hostnames', async () => {
test.concurrent('create policy with wildcard hostnames', async () => {

Suggestion: Wildcard-policy create/get/delete is another live-write flow; running this concurrently with other policy tests can cause intermittent lock/rate-limit failures. Switching back to serial execution makes this smoke check stable. [race condition]

Severity Level: Major ⚠️
- ⚠️ Wildcard-hostname egress smoketest may intermittently fail in CI.
- ⚠️ Increased concurrent policy mutations reduce determinism of smoketests.
Suggested change
test.concurrent('create policy with wildcard hostnames', async () => {
test('create policy with wildcard hostnames', async () => {
Steps of Reproduction ✅
1. Run `yarn test:smoke` from the repository root (`package.json:19-23`), which sets
`RUN_SMOKETESTS=1` and executes Jest across all smoketest files, including
`tests/smoketests/object-oriented/network-policy.test.ts`.

2. `network-policy.test.ts` initializes a shared `sdk` instance (`:1-4`) that talks to the
live API using `RunloopSDK` configured in `tests/smoketests/utils.ts:17-23`. Within
`describe('network policy egress configurations', ...)` (`:138-226`), all egress-related
tests rely on this shared client to perform real network policy operations.

3. The `'create policy with wildcard hostnames'` case at `:207` is defined as
`test.concurrent`, so Jest executes it alongside the other concurrent tests in this
describe (`:139`, `:153`, `:171`, `:191`). The body (`:208-224`) creates a network policy
with wildcard hostnames (`:211-215`), calls `policy.getInfo()` (`:217`), asserts on the
returned hostnames (`:218-221`), and then calls `cleanUpPolicy(policy)` which tries to
delete the policy from the backend (`utils.ts:36-44`).

4. In the same smoketest run, other files such as `object-oriented/blueprint.test.ts`
(`:316-320`, `:356-360`) and `object-oriented/gateway-config.test.ts` (`:339-343`,
`:742-746`) also create and delete network policies, and many of their tests are
themselves `test.concurrent` (e.g., `blueprint.test.ts:19-23`). This results in bursts of
concurrent create/getInfo/delete operations involving wildcard and non-wildcard policies.
Any transient backend contention or rate limiting on these operations will cause uncaught
promise rejections in this test, leading to intermittent failures of `'create policy with
wildcard hostnames'`. Converting it to a plain `test(...)` decreases per-file concurrency
and helps make smoketest behavior more predictable.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/network-policy.test.ts
**Line:** 207:207
**Comment:**
	Race Condition: Wildcard-policy create/get/delete is another live-write flow; running this concurrently with other policy tests can cause intermittent lock/rate-limit failures. Switching back to serial execution makes this smoke check stable.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it, and please make it concise.

});

test('execAsync with callbacks - real-time streaming', async () => {
test.concurrent('execAsync with callbacks - real-time streaming', async () => {

Suggestion: This test has a strict 1-second assertion that streaming data must arrive before completion. Running it with test.concurrent on a shared devbox introduces scheduling/resource contention from sibling tests, which can delay first log delivery and cause flaky failures even when streaming works correctly. Make this test non-concurrent. [race condition]

Severity Level: Major ⚠️
- ⚠️ Streaming callbacks smoketest intermittently flakes under concurrent execution.
- ⚠️ `yarn test:objects` CI job may fail sporadically.
Suggested change
test.concurrent('execAsync with callbacks - real-time streaming', async () => {
test('execAsync with callbacks - real-time streaming', async () => {
Steps of Reproduction ✅
1. Run `yarn test:objects` from `package.json:23`, which invokes Jest with
`jest.config.objects.js` to execute `**/tests/smoketests/object-oriented/**/*.test.ts`
including `tests/smoketests/object-oriented/devbox.test.ts` (config lines 19–20).

2. In `devbox.test.ts:561-575`, Jest enters `describe('command execution with streaming
callbacks', ...)`; `beforeAll` at lines 564–569 creates a single `Devbox` instance reused
by all `test.concurrent` tests in this describe block.

3. The test `execAsync with callbacks - real-time streaming` at `devbox.test.ts:673-704`
calls `devbox.cmd.execAsync('echo "immediate" && sleep 2 && echo "delayed"', { stdout: ...
})` (lines 678–685), then waits exactly 1000 ms via `setTimeout` at line 688 before
asserting `receivedBeforeCompletion` is `true` at line 691 to prove logs arrived before
command completion.

4. Because this test is marked `test.concurrent` at line 673, Jest schedules it
simultaneously with other heavy streaming tests on the same devbox in the same describe
(e.g. `exec with long output - verify all lines received` at 750–774 and `concurrent
execAsync - multiple executions streaming simultaneously` at 776–840); under realistic
load this shared devbox and Jest worker concurrency (see `jest.config.objects.js:37-38`)
can delay the first `stdout` callback for "immediate" beyond 1 second even though
streaming works correctly, causing intermittent failures of the assertion at line 691.
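A complementary fix, whether or not the test stays concurrent, is to replace the fixed 1000 ms `setTimeout` with a poll that waits for the first streamed line, so scheduling jitter only slows the test instead of failing it. A sketch; the helper name, timeout, and interval values are assumptions:

```typescript
// Poll until `condition` holds instead of sleeping a fixed duration.
// Timeout and interval defaults here are illustrative, not prescriptive.
async function waitFor(
  condition: () => boolean,
  timeoutMs = 10_000,
  intervalMs = 100,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!condition()) {
    if (Date.now() > deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The test body could then `await waitFor(() => stdoutLines.length > 0)` and record whether the command had already completed when the first line arrived, preserving the "logs arrive before completion" property without racing a hard 1-second budget.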
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/devbox.test.ts
**Line:** 673:673
**Comment:**
	Race Condition: This test has a strict 1-second assertion that streaming data must arrive before completion. Running it with `test.concurrent` on a shared devbox introduces scheduling/resource contention from sibling tests, which can delay first log delivery and cause flaky failures even when streaming works correctly. Make this test non-concurrent.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it, and please make it concise.

});

test('concurrent execAsync - multiple executions streaming simultaneously', async () => {
test.concurrent('concurrent execAsync - multiple executions streaming simultaneously', async () => {

Suggestion: This test assumes both async executions produce logs within a fixed 1-second wait, but test.concurrent makes it run alongside many other command tests on the same devbox, so startup/log emission can be delayed and intermittently fail the task*Count > 0 checks. Run this test sequentially to avoid cross-test timing races. [race condition]

Severity Level: Major ⚠️
- ⚠️ Concurrent execAsync streaming smoketest unstable sharing one devbox.
- ⚠️ CI `yarn test:objects` reliability reduced by timing races.
Suggested change
test.concurrent('concurrent execAsync - multiple executions streaming simultaneously', async () => {
test('concurrent execAsync - multiple executions streaming simultaneously', async () => {
Steps of Reproduction ✅
1. Run `yarn test:objects` from `package.json:23`, which uses `jest.config.objects.js` to
run all object-oriented smoketests under `tests/smoketests/object-oriented` (config
`testMatch` at line 19 includes `devbox.test.ts`).

2. Jest executes `describe('command execution with streaming callbacks', ...)` in
`tests/smoketests/object-oriented/devbox.test.ts:561-840`; `beforeAll` at 564–569
provisions a single shared `Devbox` instance, while numerous `test.concurrent` cases in
this block start commands with sleeps and substantial output (e.g. tests at 577–774).

3. The test `concurrent execAsync - multiple executions streaming simultaneously` at
`devbox.test.ts:776-840` starts two executions via `devbox.cmd.execAsync(...)` (task A at
783–791 and task B at 793–801), records streamed lines in `taskALogs`/`taskBLogs`, then
after both executions have started (lines 803–805) waits a fixed 1000 ms at line 807
before asserting `taskACount > 0` and `taskBCount > 0` at lines 810–811.

4. With this test itself marked `test.concurrent` at line 776, Jest may schedule it
alongside other streaming tests in the same describe (e.g. the long-output test at
750–774) and with other files via `maxWorkers`/`maxConcurrency` in
`jest.config.objects.js:37-38`; when the shared devbox or log streaming is slightly slower
than 1 second under this load, neither `stdout` callback has fired by the 1000 ms check,
so `taskACount`/`taskBCount` remain zero at lines 810–811 even though both executions
later complete successfully, producing a flaky failure of this timing assumption.
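Alternatively, the fixed mid-flight wait can be dropped entirely: start both executions, await their completion, and only then assert that each streamed output. A sketch assuming `execAsync`-like functions that invoke a per-line callback and resolve when the command finishes (names and shapes are hypothetical):

```typescript
type Exec = (onLine: (line: string) => void) => Promise<void>;

// Run two streaming executions side by side and assert on their logs
// only after both have finished, removing the 1000ms timing assumption.
async function runBothAndCheck(execA: Exec, execB: Exec) {
  const aLogs: string[] = [];
  const bLogs: string[] = [];
  await Promise.all([
    execA((line) => aLogs.push(line)),
    execB((line) => bLogs.push(line)),
  ]);
  if (aLogs.length === 0 || bLogs.length === 0) {
    throw new Error("expected both executions to stream at least one line");
  }
  return { aLogs, bLogs };
}
```

This makes "both executions stream output" deterministic; what it gives up is the claim that output arrives while the commands are still running, which a polling check on the first streamed line covers better.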
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** tests/smoketests/object-oriented/devbox.test.ts
**Line:** 776:776
**Comment:**
	Race Condition: This test assumes both async executions produce logs within a fixed 1-second wait, but `test.concurrent` makes it run alongside many other command tests on the same devbox, so startup/log emission can be delayed and intermittently fail the `task*Count > 0` checks. Run this test sequentially to avoid cross-test timing races.

Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it, and please make it concise.

@codeant-ai

codeant-ai bot commented Apr 4, 2026

CodeAnt AI finished reviewing your PR.


Labels

size:L This PR changes 100-499 lines, ignoring generated files
