
fix: add WebSocket heartbeat to prevent code=1006 disconnections #709

Open
livepeer-tessa wants to merge 1 commit into main from fix/cloud-websocket-heartbeat


@livepeer-tessa
Contributor

Problem

Closes #707

Users experience abrupt cloud WebSocket disconnections (code=1006, reason=None) after 10-30+ minutes of use. Code 1006 means "abnormal closure" — the TCP connection was dropped without a WebSocket close frame.
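Close code 1006 is one of the codes that RFC 6455 (section 7.4.1) reserves for local reporting: an endpoint must never send it in a Close frame, so a client only ever sees it when it synthesizes it itself. The helper below is a hypothetical logging aid, not part of the project, that makes this distinction explicit.

```python
from typing import Optional

# RFC 6455, section 7.4.1: these status codes MUST NOT be sent in a Close frame.
# An endpoint reports them itself, e.g. 1006 when the connection drops with no Close frame.
LOCAL_ONLY_CLOSE_CODES = {
    1005: "no status code was present in the Close frame",
    1006: "abnormal closure: connection lost without a Close frame",
    1015: "TLS handshake failure",
}


def explain_close(code: int, reason: Optional[str]) -> str:
    """Hypothetical helper for log messages: say whether a close code came from the peer."""
    if code in LOCAL_ONLY_CLOSE_CODES:
        return f"{code}: reported locally ({LOCAL_ONLY_CLOSE_CODES[code]})"
    return f"{code}: received in a Close frame from the peer (reason={reason!r})"


print(explain_close(1006, None))
# 1006: reported locally (abnormal closure: connection lost without a Close frame)
```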

Looking at the log from #707:

  • Connected at 15:50:49
  • ~15 min idle while setting up
  • Krea pipeline started at 16:05:57
  • WebSocket dropped at 16:15:59 (~10 min into active use)
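The intervals in this excerpt can be checked with a few lines of datetime arithmetic. The timestamps are copied from the list above; the variable names are illustrative.

```python
from datetime import datetime

FMT = "%H:%M:%S"
connected = datetime.strptime("15:50:49", FMT)
pipeline_started = datetime.strptime("16:05:57", FMT)
dropped = datetime.strptime("16:15:59", FMT)

setup_idle = pipeline_started - connected  # time before the Krea pipeline started
active_use = dropped - pipeline_started    # time in active use before the drop
total = dropped - connected                # total lifetime of the WebSocket

print(setup_idle, active_use, total)  # 0:15:08 0:10:02 0:25:10
```

The connection lived about 25 minutes in total, which falls inside the 10-30+ minute window described above.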

Root Cause

The cloud client calls aiohttp's ClientSession.ws_connect without a heartbeat (the default is heartbeat=None), so it never sends WebSocket ping frames. NAT gateways, proxies, and firewalls commonly drop TCP connections that appear idle, meaning no traffic has crossed them for some period. When the connection is dropped silently at the TCP level, no WebSocket Close frame is ever received, hence code=1006 with reason=None.

This is a well-known failure mode for long-lived WebSocket connections: without a heartbeat, a connection that passes through middleboxes dies once it sits idle longer than the shortest idle timeout on the path.
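To make the idle-timeout argument concrete, here is a deliberately simplified, stdlib-only model (not aiohttp code): a middlebox evicts a flow once no frame has crossed it for `idle_timeout` seconds. The 300 s timeout, the traffic pattern, and both helper names are hypothetical, and real devices vary. The fixed ping schedule is also a simplification, since aiohttp resets its heartbeat timer when data is received.

```python
from typing import List, Optional, Sequence


def first_idle_drop(frame_times: Sequence[float], idle_timeout: float, end: float) -> Optional[float]:
    """Time at which a middlebox would silently drop the flow, or None if it survives until `end`.

    frame_times: sorted timestamps (seconds since connect) of frames crossing the middlebox.
    """
    last = 0.0  # the handshake at t=0 counts as traffic
    for t in [*frame_times, end]:
        if t - last > idle_timeout:
            return last + idle_timeout  # quiet for idle_timeout seconds: flow state evicted
        last = t
    return None


def with_heartbeat(frame_times: Sequence[float], interval: float, end: float) -> List[float]:
    """Add a ping every `interval` seconds (simplified: fixed schedule, not reset by data)."""
    pings = [k * interval for k in range(1, int(end // interval) + 1)]
    return sorted(set(frame_times) | set(pings))


# Hypothetical session: two early frames, then silence until t = 1800 s, behind a 300 s idle timeout.
app_frames = [10.0, 20.0]
print(first_idle_drop(app_frames, idle_timeout=300.0, end=1800.0))  # 320.0
print(first_idle_drop(with_heartbeat(app_frames, 30.0, 1800.0), idle_timeout=300.0, end=1800.0))  # None
```

Without pings the flow is evicted 300 s after the last frame; with a 30 s heartbeat no gap ever exceeds 30 s, so the flow never looks idle to the middlebox.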

Fix

Add heartbeat=30.0 to ws_connect. aiohttp then sends a WebSocket ping frame whenever 30 seconds pass without received data (the timer is reset on any data reception), which:

  1. Keeps the TCP connection alive through NAT/proxy/firewall idle timeouts
  2. Detects a dead connection within about 45 s (up to 30 s until the ping, then heartbeat / 2 = 15 s waiting for the pong) instead of hanging silently
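The timing of the two effects above can be written down as plain arithmetic. This sketch models aiohttp's heartbeat behaviour (a ping after `heartbeat` quiet seconds, then up to heartbeat / 2 seconds waiting for the pong); the function names and the example idle timeouts are hypothetical.

```python
def detection_window(heartbeat: float) -> float:
    """Worst-case seconds from the last received frame until aiohttp closes a dead socket.

    Models aiohttp's heartbeat: a ping goes out after `heartbeat` seconds without
    received data, and the connection is closed if no pong arrives within heartbeat / 2.
    """
    pong_timeout = heartbeat / 2.0
    return heartbeat + pong_timeout


def keeps_flow_alive(heartbeat: float, idle_timeout: float) -> bool:
    """True if a ping goes out before a middlebox with `idle_timeout` would evict the flow."""
    return heartbeat < idle_timeout


print(detection_window(30.0))  # 45.0
```

The design trade-off follows directly: a smaller heartbeat survives stricter middleboxes and detects failures sooner, at the cost of more ping/pong frames on idle connections.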

Change

- self._session.ws_connect(ws_url),
+ self._session.ws_connect(ws_url, heartbeat=30.0),

One line. The heartbeat parameter is built into aiohttp, so no new dependencies are needed.
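The same parameter can be exercised end to end against a local aiohttp echo server. This is a hedged sketch, not the project's code: the handler factory, `demo()`, the localhost socket setup, and the shortened 0.5 s heartbeat are illustrative assumptions, and it requires aiohttp to be installed. The server disables autoping only so that it can count the pings it answers.

```python
import asyncio
import socket

import aiohttp
from aiohttp import web


def make_server_handler(stats: dict):
    """Echo server that counts the client's heartbeat pings (demo only)."""

    async def handler(request: web.Request) -> web.WebSocketResponse:
        # autoping=False delivers PING frames to this loop so they can be counted;
        # each one is then answered with a PONG explicitly.
        ws = web.WebSocketResponse(autoping=False)
        await ws.prepare(request)
        async for msg in ws:
            if msg.type == aiohttp.WSMsgType.PING:
                stats["pings"] += 1
                await ws.pong(msg.data)
            elif msg.type == aiohttp.WSMsgType.TEXT:
                await ws.send_str(msg.data)
        return ws

    return handler


async def demo(heartbeat: float = 0.5, idle: float = 1.2) -> tuple:
    stats = {"pings": 0}
    app = web.Application()
    app.router.add_get("/ws", make_server_handler(stats))
    runner = web.AppRunner(app)
    await runner.setup()
    # Bind to an OS-assigned free port on localhost.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(("127.0.0.1", 0))
    port = sock.getsockname()[1]
    await web.SockSite(runner, sock).start()
    try:
        async with aiohttp.ClientSession() as session:
            # Same change as the PR, with a short interval so the demo runs quickly.
            async with session.ws_connect(f"ws://127.0.0.1:{port}/ws", heartbeat=heartbeat) as ws:
                # Keep a receive() pending during the quiet period; depending on the
                # aiohttp version, PONG replies may only be processed inside receive().
                pending = asyncio.create_task(ws.receive())
                await asyncio.sleep(idle)  # no application data: only pings/pongs flow
                await ws.send_str("hello")
                msg = await pending
                echoed = msg.data if msg.type == aiohttp.WSMsgType.TEXT else ""
    finally:
        await runner.cleanup()
    return echoed, stats["pings"]
```

Running `asyncio.run(demo())` should return the echoed "hello" together with the number of pings the server answered while the client sat idle.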

Without a heartbeat, aiohttp does not send WebSocket ping frames, so
NAT gateways, proxies, and firewalls can silently drop idle TCP
connections. This manifests as code=1006 (abnormal closure / no close
frame) after 10-30 minutes of use.

Set heartbeat=30.0 on ws_connect so aiohttp sends a ping frame every
30 seconds, keeping the connection alive through middleboxes.

Fixes #707

Signed-off-by: livepeer-robot <robot@livepeer.org>
@coderabbitai

coderabbitai bot commented Mar 17, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


@github-actions
Contributor

🚀 fal.ai Preview Deployment

App ID: daydream/scope-pr-709--preview
WebSocket: wss://fal.run/daydream/scope-pr-709--preview/ws
Commit: ce0a7e7

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-709--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Contributor

✅ E2E Tests passed

Status: passed
fal App: daydream/scope-pr-709--preview
Run: View logs

Test Artifacts

Check the workflow run for screenshots.


Development

Successfully merging this pull request may close these issues.

Cloud WebSocket abruptly close while using Scope
