Skip to content

Fix stalled resumes: wake after send + fail-closed MCP bridge#141

Open
leeroybrun wants to merge 1 commit intodevfrom
fix/session-wake-and-mcp
Open

Fix stalled resumes: wake after send + fail-closed MCP bridge#141
leeroybrun wants to merge 1 commit intodevfrom
fix/session-wake-and-mcp

Conversation

@leeroybrun
Copy link
Collaborator

@leeroybrun leeroybrun commented Mar 25, 2026

  • UI: After a direct message submit in an active session, do a best-effort wake (resume) so sessions recover if the runner died mid-session.
  • CLI: resolveNodeBackedMcpServerCommand now fails closed instead of returning a non-existent packaged entrypoint; prevents Claude Code from crashing later with MODULE_NOT_FOUND when the Happier MCP bridge path is missing.
  • CLI: fix missing readFile import in executeWorkspaceReplicationJobWithLocalRuntime test.

Repro evidence:

  • Claude Code debug logs showed MCP server "happier" failing to start due to missing package-dist bridge, leading to Claude Code exiting code 1 and sessions appearing to stall.

Tests:

  • apps/ui: SessionView.sendMessage.resumeInactive.pendingQueue.test.tsx
  • apps/cli: resolveNodeBackedMcpServerCommand.test.ts, createHappierMcpBridge.test.ts

Summary by CodeRabbit

  • Bug Fixes

    • Fixed a critical issue where command resolution would proceed with invalid entrypoints instead of properly reporting the error.
  • New Features

    • Active sessions now automatically attempt to resume the pending queue after successfully submitting a message.
  • Tests

    • Enhanced test coverage for entrypoint validation edge cases and session message sending workflows.

@coderabbitai
Copy link

coderabbitai bot commented Mar 25, 2026

Walkthrough

Changes enhance error handling for missing MCP server entrypoints and implement pending-queue wake/resume functionality when submitting messages in active sessions. Updates include new test cases, mocking improvements, and runtime logic adjustments across test and implementation files.

Changes

Cohort / File(s) Summary
MCP Server Command Resolution
apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.ts, apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.test.ts
Modified error-handling path: now throws an error when neither packaged nor source entrypoints are available, instead of silently using missing packaged entrypoint. Added new test case verifying error rejection when no usable entrypoint exists.
Happier MCP Bridge Testing
apps/cli/src/agent/runtime/createHappierMcpBridge.test.ts
Added targeted existsSync mocks to three test scenarios ("direct script mode", "current-process mode", "bun uses ensured JavaScript runtime") to explicitly simulate bundled dist bridge presence and exercise package-dist bridge code paths.
Session Message Pending Queue
apps/ui/sources/components/sessions/shell/SessionView.tsx, apps/ui/sources/components/sessions/shell/SessionView.sendMessage.resumeInactive.pendingQueue.test.tsx
Implemented pending-queue wake/resume after message submission: computes wake options via getPendingQueueWakeResumeOptions() and asynchronously triggers resumeSession() if available. Test updated with spy-backed mocks, mutable session state, and new test case verifying submitMessage and resumeSession calls when session is active.
Workspace Replication Testing
apps/cli/src/workspaces/replication/orchestration/executeWorkspaceReplicationJobWithLocalRuntime.test.ts
Added readFile import from node:fs/promises alongside existing filesystem imports.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title directly summarizes the main changes: waking sessions after message send and making MCP bridge fail closed instead of returning non-existent paths.
Description check ✅ Passed The description covers the key changes and includes repro evidence and test information, though some template sections like 'How to test' with steps and checklist items are missing or incomplete.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/session-wake-and-mcp

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 25, 2026

Greptile Summary

This PR fixes two independent reliability bugs: (1) active sessions that stall after the runner dies mid-session are now recovered by issuing a best-effort resumeSession wake call immediately after a successful submitMessage in the UI; (2) the CLI's resolveNodeBackedMcpServerCommand now throws a descriptive error when neither the packaged nor the source entrypoint is present, preventing Claude Code from later crashing with an opaque MODULE_NOT_FOUND. A missing readFile import in one test file is also corrected.

  • SessionView.tsx: After a successful submitMessage in the active-session path, getPendingQueueWakeResumeOptions is consulted and, if it returns options, resumeSession is called in a nested fire-and-forget. The wake is fully non-fatal — any error (including a 'error' typed result) is swallowed because the message is already persisted in the transcript.
  • resolveNodeBackedMcpServerCommand.ts: The old fallback — silently returning { command, args: [...packagedEntrypoint] } even when existsSync(packagedEntrypoint) was false — is replaced with a throw, closing the hole that caused the MODULE_NOT_FOUND crash.
  • createHappierMcpBridge.test.ts: Three existing tests that relied on the old permissive fallback now each mock existsSync to return true for the packaged bridge path, keeping them green.
  • resolveNodeBackedMcpServerCommand.test.ts: New test covers the throw path when all entrypoints are absent.
  • SessionView.sendMessage.resumeInactive.pendingQueue.test.tsx: New test verifies the two-step fire-and-forget (submit → wake) by asserting both submitMessageSpy and resumeSessionSpy are called in sequence for an active session.
  • Mixed indentation: The new block in SessionView.tsx (lines 1746–1783) was committed with a leading tab character on each line, inconsistent with the file's space-only style.

Confidence Score: 5/5

  • Safe to merge; the only issue is cosmetic mixed indentation that does not affect runtime behaviour.
  • Both production changes are correct and fail safely: the wake is fire-and-forget with full error suppression, and the CLI now throws a clear error instead of passing a phantom path to the OS. All new behaviour is covered by new tests, and existing tests were updated to reflect the stricter entrypoint resolution. The sole finding is a tab/space mixing artefact introduced by the editor that committed the SessionView.tsx changes — P2 style only, no logic impact.
  • apps/ui/sources/components/sessions/shell/SessionView.tsx — tab characters mixed into an otherwise space-indented block (lines 1746–1783).

Important Files Changed

Filename Overview
apps/ui/sources/components/sessions/shell/SessionView.tsx Adds a best-effort "wake" call after a successful submitMessage in active sessions. Logic is sound and failure is properly non-fatal, but the new block introduces mixed tab/space indentation.
apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.ts Replaces the silent fallback (returning a non-existent packaged path) with a descriptive thrown error, making missing entrypoints fail closed. Clean and correct change.
apps/ui/sources/components/sessions/shell/SessionView.sendMessage.resumeInactive.pendingQueue.test.tsx Adds a new test case for the active-session wake flow; introduces submitMessageSpy and a mutable sessionState to properly verify the two-step fire-and-forget behavior. Test design is solid.
apps/cli/src/agent/runtime/createHappierMcpBridge.test.ts Adds existsSync mocks to three tests that previously relied on the old permissive fallback; necessary follow-up to the fail-closed change in resolveNodeBackedMcpServerCommand.
apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.test.ts Adds a test asserting the new throw path when both the packaged and source entrypoints are absent. Matches the production change exactly.
apps/cli/src/workspaces/replication/orchestration/executeWorkspaceReplicationJobWithLocalRuntime.test.ts One-line fix: adds missing readFile to the node:fs/promises import. Trivially correct.

Sequence Diagram

sequenceDiagram
    participant User
    participant SessionView
    participant sync
    participant fireAndForget
    participant resumeSession

    User->>SessionView: onSend() [active session]
    SessionView->>fireAndForget: fireAndForget(submitMessage flow)
    fireAndForget->>sync: submitMessage(sessionId, text, ...)
    sync-->>fireAndForget: ok
    fireAndForget->>SessionView: getPendingQueueWakeResumeOptions(...)
    alt wakeOpts returned (session not thinking / no active requests)
        SessionView->>fireAndForget: fireAndForget(wake flow)
        fireAndForget->>resumeSession: resumeSession({ ...wakeOpts, serverId })
        resumeSession-->>fireAndForget: result (success or error, both non-fatal)
    else wakeOpts is null (session busy or no target)
        SessionView-->>SessionView: skip wake silently
    end
Loading

Reviews (1): Last reviewed commit: "fix: wake active sends + fail-closed MCP..." | Re-trigger Greptile

Comment on lines +1746 to +1783
try {
await sync.submitMessage(sessionId, outbound.text, outbound.displayText, outbound.metaOverrides);
if (shouldSendReviewComments) {
storage.getState().clearSessionReviewCommentDrafts(sessionId);
}

const wakeOpts = getPendingQueueWakeResumeOptions({
sessionId,
session,
resumeCapabilityOptions,
resumeTargetOverride: reachableMachineTarget
? {
machineId: reachableMachineTarget.machineId,
directory: reachableMachineTarget.basePath,
}
: null,
permissionOverride: getPermissionModeOverrideForSpawn(session),
});
if (wakeOpts) {
fireAndForget((async () => {
try {
const result = await resumeSession({
...wakeOpts,
serverId: capabilityServerId,
});
if (result.type === 'error') {
// Non-fatal: message is already persisted in the transcript.
}
} catch {
// Non-fatal: message is already persisted in the transcript.
}
})(), { tag: 'SessionView.sendMessage.wakeActive' });
}
} catch (e) {
setMessage(previousMessage);
Modal.alert(t('common.error'), e instanceof Error ? e.message : t('errors.failedToSendMessage'));
}
})(), { tag: 'SessionView.sendMessage.submitMessage' });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Mixed tab/space indentation introduced

The new block (lines 1746–1783) is indented with a leading tab character (\t) followed by spaces, while every surrounding line in the file uses spaces exclusively. This creates inconsistent indentation that can cause visual diffs to appear garbled in editors and violates the file's existing style.

The affected lines should be re-indented to match the 28-space indent level used by the surrounding code:

                            try {
                                await sync.submitMessage(sessionId, outbound.text, outbound.displayText, outbound.metaOverrides);
                                if (shouldSendReviewComments) {
                                    storage.getState().clearSessionReviewCommentDrafts(sessionId);
                                }

                                const wakeOpts = getPendingQueueWakeResumeOptions({
                                    sessionId,
                                    session,
                                    resumeCapabilityOptions,
                                    resumeTargetOverride: reachableMachineTarget
                                        ? {
                                            machineId: reachableMachineTarget.machineId,
                                            directory: reachableMachineTarget.basePath,
                                        }
                                        : null,
                                    permissionOverride: getPermissionModeOverrideForSpawn(session),
                                });
                                if (wakeOpts) {
                                    fireAndForget((async () => {
                                        try {
                                            const result = await resumeSession({
                                                ...wakeOpts,
                                                serverId: capabilityServerId,
                                            });
                                            if (result.type === 'error') {
                                                // Non-fatal: message is already persisted in the transcript.
                                            }
                                        } catch {
                                            // Non-fatal: message is already persisted in the transcript.
                                        }
                                    })(), { tag: 'SessionView.sendMessage.wakeActive' });
                                }
                            } catch (e) {
                                setMessage(previousMessage);
                                Modal.alert(t('common.error'), e instanceof Error ? e.message : t('errors.failedToSendMessage'));
                            }
                        })(), { tag: 'SessionView.sendMessage.submitMessage' });

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
apps/cli/src/agent/runtime/createHappierMcpBridge.test.ts (1)

43-46: Consider extracting the repeated existsSync stub to a helper.

The same mock implementation appears in three tests. A small helper would reduce duplication:

const stubDistBridgeExists = () => {
  vi.mocked(existsSync).mockImplementation((pathLike) => {
    const path = String(pathLike)
    return path.endsWith('/package-dist/backends/codex/happyMcpStdioBridge.mjs')
  })
}

Then each test simply calls stubDistBridgeExists().

Also applies to: 64-67, 148-151

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/cli/src/agent/runtime/createHappierMcpBridge.test.ts` around lines 43 -
46, The tests repeat the same existsSync mock implementation; extract that stub
into a helper function (e.g., stubDistBridgeExists) and replace the repeated
vi.mocked(existsSync).mockImplementation(...) occurrences with a single call to
that helper in each test; locate the mocks by searching for usages of existsSync
in createHappierMcpBridge.test.ts and replace the three duplicate blocks (the
ones checking
path.endsWith('/package-dist/backends/codex/happyMcpStdioBridge.mjs')) with
calls to the new helper to remove duplication.
apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.ts (1)

58-62: Delay JS runtime resolution until after an entrypoint is chosen.

Because Line 22 resolves requireJavaScriptRuntimeExecutable() before any existsSync checks, this new error is still unreachable whenever runtime discovery fails first. Lines 10-17 in apps/cli/src/agent/runtime/createHappierMcpBridge.ts let that exception bubble straight up, so a broken install can still surface HAPPIER_JS_RUNTIME_PATH instead of the clearer missing-entrypoint failure. Pick the usable entrypoint first, then resolve the runtime for that path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.ts` around lines
58 - 62, The code currently calls requireJavaScriptRuntimeExecutable() before
checking which entrypoint exists, so runtime-resolution errors can mask a
missing-entrypoint error; update resolveNodeBackedMcpServerCommand to first
determine the usable entrypoint (check packagedEntrypoint and sourceEntrypoint
with existsSync and tsxHookPath availability) and only after selecting the
chosen entrypoint call requireJavaScriptRuntimeExecutable() for that path;
ensure any thrown errors from requireJavaScriptRuntimeExecutable() occur after
the entrypoint-not-found Error (the existing throw referencing
packagedEntrypoint/sourceEntrypoint/tsxHookPath) so missing-entrypoint remains
the surfaced failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/cli/src/agent/runtime/createHappierMcpBridge.test.ts`:
- Around line 43-46: The tests repeat the same existsSync mock implementation;
extract that stub into a helper function (e.g., stubDistBridgeExists) and
replace the repeated vi.mocked(existsSync).mockImplementation(...) occurrences
with a single call to that helper in each test; locate the mocks by searching
for usages of existsSync in createHappierMcpBridge.test.ts and replace the three
duplicate blocks (the ones checking
path.endsWith('/package-dist/backends/codex/happyMcpStdioBridge.mjs')) with
calls to the new helper to remove duplication.

In `@apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.ts`:
- Around line 58-62: The code currently calls
requireJavaScriptRuntimeExecutable() before checking which entrypoint exists, so
runtime-resolution errors can mask a missing-entrypoint error; update
resolveNodeBackedMcpServerCommand to first determine the usable entrypoint
(check packagedEntrypoint and sourceEntrypoint with existsSync and tsxHookPath
availability) and only after selecting the chosen entrypoint call
requireJavaScriptRuntimeExecutable() for that path; ensure any thrown errors
from requireJavaScriptRuntimeExecutable() occur after the entrypoint-not-found
Error (the existing throw referencing
packagedEntrypoint/sourceEntrypoint/tsxHookPath) so missing-entrypoint remains
the surfaced failure.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 97190716-d3c2-4c7f-9b91-f6d07b1ac44d

📥 Commits

Reviewing files that changed from the base of the PR and between 67169fb and a232cbb.

📒 Files selected for processing (6)
  • apps/cli/src/agent/runtime/createHappierMcpBridge.test.ts
  • apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.test.ts
  • apps/cli/src/mcp/runtime/resolveNodeBackedMcpServerCommand.ts
  • apps/cli/src/workspaces/replication/orchestration/executeWorkspaceReplicationJobWithLocalRuntime.test.ts
  • apps/ui/sources/components/sessions/shell/SessionView.sendMessage.resumeInactive.pendingQueue.test.tsx
  • apps/ui/sources/components/sessions/shell/SessionView.tsx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant