fix: eliminate presence race by routing disconnect through AgentDO #106

Merged
khaliqgant merged 1 commit into main from fix/presence-disconnect-race on Mar 25, 2026

khaliqgant (Member) commented Mar 25, 2026

Summary

  • The HTTP disconnect route (POST /v1/agents/disconnect) now sends a force-disconnect to the AgentDO before hitting PresenceDO directly
  • AgentDO's /force-disconnect handler closes all WebSockets and sends the authoritative disconnect to PresenceDO
  • Because the DO runtime serializes all handlers, this guarantees the disconnect runs after any in-flight WS ping heartbeat, preventing a stale heartbeat from re-creating the agent's presence entry
  • The direct PresenceDO disconnect still follows as a safety net
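A minimal sketch of the ordering described above, in TypeScript. The `DOStub` interface, the internal URLs, the PresenceDO `/disconnect` path, and the request payload are assumptions made for illustration; only the `/force-disconnect` path and the order of the two calls come from this PR.

```typescript
// Hypothetical minimal shape of a Durable Object stub; only fetch is used here.
interface DOStub {
  fetch(url: string, init?: { method?: string; body?: string }): Promise<{ ok: boolean }>;
}

// Sketch of the HTTP disconnect route body (POST /v1/agents/disconnect).
// The stubs are assumed to be already resolved for the agent in question.
async function handleDisconnect(
  agentId: string,
  agentDO: DOStub,
  presenceDO: DOStub,
): Promise<void> {
  // 1. Route the disconnect through AgentDO first. Its handler is queued
  //    behind any WS ping heartbeat that AgentDO is still processing.
  await agentDO.fetch("https://agent-do/force-disconnect", { method: "POST" });

  // 2. Safety net: still tell PresenceDO directly (assumed path and payload).
  await presenceDO.fetch("https://presence-do/disconnect", {
    method: "POST",
    body: JSON.stringify({ agentId }),
  });
}
```

The second call is expected to be a no-op in the common case, since AgentDO has already sent the authoritative disconnect by the time it runs.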

Problem

When an agent disconnects, a WS ping heartbeat already in flight on the AgentDO could reach PresenceDO after the HTTP disconnect had deleted the agent entry, and the heartbeat would re-create that entry. With no subsequent disconnect to clean it up, the agent would appear stuck "online" indefinitely.
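The race can be illustrated with a small in-memory model. This is a sketch, not the project's code: `PresenceModel` and `AgentModel` are hypothetical stand-ins for PresenceDO and AgentDO, and the explicit queue models the PR's statement that the DO runtime runs handlers one at a time.

```typescript
// Hypothetical stand-in for PresenceDO: a heartbeat upserts, a disconnect deletes.
class PresenceModel {
  private online = new Set<string>();
  heartbeat(agentId: string): void { this.online.add(agentId); }
  disconnect(agentId: string): void { this.online.delete(agentId); }
  isOnline(agentId: string): boolean { return this.online.has(agentId); }
}

// Hypothetical stand-in for AgentDO: handlers run serially, in arrival order.
class AgentModel {
  private closed = false;
  private queue: Array<() => void> = [];
  constructor(private agentId: string, private presence: PresenceModel) {}

  enqueuePing(): void {
    this.queue.push(() => {
      if (!this.closed) this.presence.heartbeat(this.agentId);
    });
  }
  enqueueForceDisconnect(): void {
    this.queue.push(() => {
      this.closed = true;                      // sockets closed, no more pings
      this.presence.disconnect(this.agentId);  // authoritative disconnect
    });
  }
  drain(): void {
    for (const handler of this.queue.splice(0)) handler();
  }
}

// Old flow: the route deletes the entry while a ping is still queued on the agent.
function raceWithoutRouting(): boolean {
  const presence = new PresenceModel();
  const agent = new AgentModel("agent-1", presence);
  presence.heartbeat("agent-1");
  agent.enqueuePing();             // heartbeat in flight
  presence.disconnect("agent-1");  // HTTP route hits presence directly
  agent.drain();                   // stale heartbeat lands afterwards
  return presence.isOnline("agent-1");
}

// New flow: the disconnect is queued behind the ping on the agent itself.
function routedThroughAgent(): boolean {
  const presence = new PresenceModel();
  const agent = new AgentModel("agent-1", presence);
  presence.heartbeat("agent-1");
  agent.enqueuePing();
  agent.enqueueForceDisconnect();
  agent.drain();
  presence.disconnect("agent-1");  // safety net, a no-op here
  return presence.isOnline("agent-1");
}
```

In this model `raceWithoutRouting()` leaves the agent online, while `routedThroughAgent()` leaves it offline, because the force-disconnect cannot overtake the ping it is queued behind.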

Test plan

  • Deploy preview passes the e2e smoke test (presence lifecycle section)
  • The "markOffline transitions agent to offline" test no longer flakes
  • The "markOnline brings agent back online" test still passes after reconnect

🤖 Generated with Claude Code


The HTTP disconnect route now sends a force-disconnect to AgentDO
before hitting PresenceDO directly. Because the DO runtime serializes
all handlers (fetch, webSocketMessage, webSocketClose), this guarantees
the disconnect runs AFTER any in-flight WS ping heartbeat that would
otherwise re-create the agent's presence entry.

AgentDO /force-disconnect:
1. Closes all WebSockets (stops further pings)
2. Sends authoritative disconnect to PresenceDO

The direct PresenceDO disconnect still follows as a safety net for
cases where the AgentDO has no stored meta.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
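
The two steps of the AgentDO /force-disconnect handler in the commit message above could look roughly like the following TypeScript sketch. `SocketLike`, `AgentMeta`, and `notifyPresence` are hypothetical names introduced here. Returning early when no meta is stored is an assumption, inferred from the note that the direct PresenceDO disconnect covers that case.

```typescript
// Hypothetical minimal socket shape; a real WebSocket also satisfies it.
interface SocketLike {
  close(code?: number, reason?: string): void;
}

// Hypothetical shape of the metadata AgentDO stores about its agent.
interface AgentMeta {
  agentId: string;
}

// Sketch of the handler body. Returns true if PresenceDO was notified.
async function forceDisconnect(
  sockets: SocketLike[],
  meta: AgentMeta | undefined,
  notifyPresence: (agentId: string) => Promise<void>,
): Promise<boolean> {
  // 1. Close all WebSockets so no further pings are produced.
  for (const ws of sockets) {
    ws.close(1000, "force-disconnect");
  }

  // Assumption: without stored meta the agent cannot be identified, so the
  // caller's direct PresenceDO disconnect is left to handle cleanup.
  if (!meta) return false;

  // 2. Send the authoritative disconnect to PresenceDO.
  await notifyPresence(meta.agentId);
  return true;
}
```

Closing the sockets before notifying PresenceDO matches the order listed in the commit message: once the sockets are closed, no later ping can follow the disconnect.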
@github-actions

Preview deployed!

Environment   URL
API           https://pr106-api.relaycast.dev
Health        https://pr106-api.relaycast.dev/health
Observer      https://pr106-observer.relaycast.dev/observer

This preview shares the staging database and will be cleaned up when the PR is merged or closed.

Run E2E tests

npm run e2e -- https://pr106-api.relaycast.dev --ci

Open observer dashboard

https://pr106-observer.relaycast.dev/observer

@khaliqgant khaliqgant merged commit a180ea7 into main Mar 25, 2026
4 checks passed
@khaliqgant khaliqgant deleted the fix/presence-disconnect-race branch March 25, 2026 12:25

@devin-ai-integration devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.
