feat: A2A platform — gateway, directory, routing, certification, observability by khaliqgant · Pull Request #104 · AgentWorkforce/relaycast

khaliqgant · 2026-03-24T18:13:38Z

Summary

Full A2A platform implementation for Relaycast, adding 27 new endpoints across 5 feature areas.

Depends on #103 (migrations must land first).

Gateway

Register/remove/list external A2A agents
Relay ↔ A2A JSON-RPC translation (DM intercept + webhook)
/.well-known/agent-card.json (auth header, query param, or path param)
Health checking (cron-triggered, 3-strike suspension)
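The 3-strike suspension rule can be read as a small state transition applied after each health probe. The sketch below is illustrative only: the `HealthState` shape, the `applyProbeResult` name, and the choice to reset the counter on a successful probe are assumptions, not taken from the PR.

```typescript
// Hypothetical state shape; the PR does not show how health state is stored.
type HealthState = {
  consecutiveFailures: number;
  status: "active" | "suspended";
};

// "3-strike" threshold named in the PR summary.
const MAX_STRIKES = 3;

// Returns the next state after one health probe.
function applyProbeResult(state: HealthState, ok: boolean): HealthState {
  if (ok) {
    // Assumption: a successful probe clears the strikes and reactivates the agent.
    return { consecutiveFailures: 0, status: "active" };
  }
  const consecutiveFailures = state.consecutiveFailures + 1;
  return {
    consecutiveFailures,
    // Suspend once the failure count reaches the threshold.
    status: consecutiveFailures >= MAX_STRIKES ? "suspended" : state.status,
  };
}
```

Keeping the transition pure makes it straightforward to call from a cron handler and to unit test without a database.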

Smart Routing

POST /v1/route — skill-based agent matching with configurable weights
Circuit breaker for failing agents
Route feedback loop (success/failure)
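To make the interaction between weights and the circuit breaker concrete, here is a minimal sketch of weighted ranking that skips agents whose circuit is open. Every name and field (`Candidate`, `Weights`, `rankCandidates`, `skillMatch`, `successRate`, `circuitOpen`) is hypothetical; the PR does not show the actual scoring inputs or formula.

```typescript
// Hypothetical candidate shape; field names are assumptions for illustration.
interface Candidate {
  name: string;
  skillMatch: number; // 0..1, how well the agent's skills match the request
  successRate: number; // 0..1, derived from route feedback
  circuitOpen: boolean; // true when the circuit breaker has tripped
}

// Hypothetical configurable weights.
interface Weights {
  skillMatch: number;
  successRate: number;
}

// Drops agents with an open circuit, then sorts the rest by weighted score (best first).
function rankCandidates(candidates: Candidate[], weights: Weights): Candidate[] {
  const score = (c: Candidate): number =>
    weights.skillMatch * c.skillMatch + weights.successRate * c.successRate;
  return candidates
    .filter((c) => !c.circuitOpen) // filter() returns a new array, so the input is not mutated
    .sort((a, b) => score(b) - score(a));
}
```

Changing the weights changes the ordering without touching the candidate data, which is the point of making them configurable.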

Certification

3-level test runner against external agent URLs
Badge SVG generation
Continuous monitoring

Observability Console

Message logging pipeline (hooked into send path)
Stats, agent metrics, cost breakdown APIs
Observer dashboard components (ConsoleFeed, AgentMetrics, CostBreakdown)

SDK

registerA2a(), listA2aAgents(), removeA2aAgent(), getA2aAgentCard()
route(), searchDirectory(), publishToDirectory(), importSkills()
getRoutingConfig(), updateRoutingConfig()

E2E

24 new tests covering all A2A endpoints (agent cards, directory CRUD, routing, certification, console)

Test plan

Schema tests pass (26/26)
SDK unit tests pass
Local server starts and endpoints respond correctly
Agent card resolves via auth header and query param
Full E2E suite (npm run e2e) against local server
npx turbo build passes

🤖 Generated with Claude Code

…rvability

Implements the full A2A platform for Relaycast:

**Gateway (Phase 1)**
- A2A agent registration, removal, listing
- Relay ↔ A2A JSON-RPC message translation
- Webhook handling for external agent callbacks
- Agent Card serving (/.well-known/agent-card.json)
- Periodic health checking for external agents
- DM intercept for transparent A2A routing

**Observability (Phase 2)**
- Message logging pipeline hooked into send path
- Console API: messages, stats, agent stats, costs
- Observer dashboard: ConsoleFeed, AgentMetrics, CostBreakdown

**Directory (Phase 3)**
- Publish, search (FTS5), browse, rate agents
- Skill indexing for agents
- CRUD routes for directory entries

**Certification (Phase 4)**
- 3-level certification test runner
- Badge SVG generation
- Continuous monitoring support

**Smart Routing (Phase 5)**
- Skill-based agent matching with configurable weights
- Circuit breaker for failing agents
- Route feedback (success/failure tracking)

**SDK**
- New methods: registerA2a, listA2aAgents, removeA2aAgent, getA2aAgentCard
- route(), searchDirectory(), publishToDirectory(), importSkills()
- getRoutingConfig(), updateRoutingConfig()

**E2E**
- 24 new E2E tests covering all A2A endpoints

Depends on #103 for migrations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Update a2a-health test to expect origin-level agent card URL
- Add getAgentByName mock for PATCH /v1/agents/:name test
- Add directoryEngine mock for agent route tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Inline ternaries inside sql`` template literals caused Drizzle to render duplicate/truncated CTE blocks. Extract conditional fragments into variables before interpolating. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ring bug

Drizzle's sql`` template corrupts output when nested sql`` fragments are interpolated inside complex CTE queries. Split searchDirectory into two branches (FTS with CTEs vs tags-only) using only plain value interpolation, matching the safe pattern in routing.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…cation

Drizzle's sql`` template on D1 duplicates the entire query text for complex CTE queries, even with plain value interpolation. Switch to db.$client.prepare() with positional bind params, matching the pattern used in a2a-health.ts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Revert db.$client.prepare() back to Drizzle sql`` with two query branches (FTS vs tags-only) using plain value interpolation. Need to investigate the actual D1 failure root cause before bypassing Drizzle. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Append error.cause to the error message so the actual D1/SQLite error is visible in e2e test output instead of just "Failed query: <sql>". Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…nctions in CTE context

Split the CTE-based FTS queries in directory search and routing into separate direct queries where bm25() is called on the FTS table directly (which D1 supports), then merge the rank maps in JS.

Error was: D1_ERROR: unable to use function bm25 in the requested context: SQLITE_ERROR

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…gate context

D1 errors with "unable to use function bm25 in the requested context" when bm25() is wrapped in MIN() with GROUP BY. Fetch raw FTS rows and compute min rank per entity in JS instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…2e agent

- D1 rejects bm25() in aggregate context (MIN + GROUP BY), so fetch raw FTS rows and compute min rank per entity in JS
- Add sourceAgentName to routing e2e publish so the directory entry links to the registered agent (required by INNER JOIN in routing query)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
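The "compute min rank per entity in JS" step from the commits above might look roughly like the following. This is a sketch under assumptions: the `FtsRow` shape and the `minRankByEntity` name are hypothetical, and the real code may key on different columns.

```typescript
// Hypothetical shape of one raw FTS result row; field names are assumptions.
interface FtsRow {
  entityId: string;
  rank: number; // value of bm25() for this row
}

// SQLite FTS5's bm25() gives better matches a numerically lower score,
// so the best rank for an entity is the minimum over its rows.
function minRankByEntity(rows: FtsRow[]): Map<string, number> {
  const best = new Map<string, number>();
  for (const row of rows) {
    const current = best.get(row.entityId);
    if (current === undefined || row.rank < current) {
      best.set(row.entityId, row.rank);
    }
  }
  return best;
}
```

This replaces the `MIN(bm25(...)) ... GROUP BY` aggregation that D1 rejected, at the cost of transferring every matching FTS row to the worker.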

github-actions · 2026-03-25T09:47:27Z

Preview deployed!

| Environment | URL |
| --- | --- |
| API | https://pr104-api.relaycast.dev |
| Health | https://pr104-api.relaycast.dev/health |
| Observer | https://pr104-observer.relaycast.dev/observer |

This preview shares the staging database and will be cleaned up when the PR is merged or closed.

Run E2E tests

npm run e2e -- https://pr104-api.relaycast.dev --ci

Open observer dashboard

https://pr104-observer.relaycast.dev/observer

- Remove unused imports (and, sql, asc, lt, gt, channels, messages, reactions, directoryAgents, directorySkills, DirectorySkillRow, toIso)
- Prefix unused destructured vars with _ (url, workspace_id, channel_id, timestamp)
- Change let → const for agentRankMap/skillRankMap
- Replace no-explicit-any with proper types across routes, engine, and middleware files

Co-Authored-By: codex-linter <noreply@anthropic.com>
Co-Authored-By: claude-linter <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…e race

The AgentDO webSocketClose handler was fire-and-forgetting the disconnect call to PresenceDO. An in-flight heartbeat could settle at PresenceDO after the disconnect, leaving the agent stuck as "online". Awaiting the disconnect ensures it completes before the handler returns.

Also increased e2e disconnect polling from 10 to 15 attempts to give more time for the async DO close callback to settle.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…gent.test.ts Finding: The new mock (lines 14-16) stubs syncSourceAgentDirectoryEntry to always resolve, and the 'updates agent' test (lines 200-208) adds a getAgentByName mock return. These are correctness improvements but PR: #104 Auto-fixed by msd fix (claude)

- sendToExternalAgent: only retry 5xx/network errors, not 4xx or JSON-RPC errors; remove dead retry delay entry
- setRoutingConfig: merge partial weights on top of current values instead of discarding them
- findWebhookAgentByName: add workspace_id scope to prevent cross-tenant collisions; update webhook URL to include workspace_id in path
- openapi.yaml: add servers override for root-level A2A endpoints
- local daemon: route A2A well-known/rpc/webhook at root (not /v1/)

Co-Authored-By: codex-a2a-fix <noreply@anthropic.com>
Co-Authored-By: claude-spec-fix <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
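The setRoutingConfig fix (merge partial weights on top of current values) can be illustrated with a small helper. The weight keys and the `mergeWeights` name below are hypothetical; the PR does not list the real routing weight fields.

```typescript
// Hypothetical weight keys, for illustration only.
interface RoutingWeights {
  skillMatch: number;
  successRate: number;
  latency: number;
}

// Applies a partial update on top of the current weights.
// Keys absent from the patch keep their stored values.
function mergeWeights(
  current: RoutingWeights,
  patch: Partial<RoutingWeights>,
): RoutingWeights {
  const merged = { ...current };
  for (const key of Object.keys(patch) as (keyof RoutingWeights)[]) {
    const value = patch[key];
    // Skip explicit undefined so a sparse patch cannot erase a stored weight.
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

A plain `{ ...current, ...patch }` would also work when the patch never carries `undefined` values; the loop is only a guard for that case.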

…lope

- Update parity checker to treat /.well-known/agent*, /a2a/rpc, and /a2a/webhook* as root-level routes (no /v1 prefix)
- Wrap /v1/a2a/agents/:name/card response in standard { ok, data } envelope
- Switch SDK from getRaw to standard get for agent card endpoint
- Remove unused requestRaw/getRaw methods from SDK client

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Track pending auto heartbeats in SDK and await them during disconnect to prevent stale heartbeats from re-creating agent presence after the authoritative HTTP disconnect
- Only retry errors explicitly marked retryable (5xx/network), not ZodErrors or JSON-RPC parse failures
- Remove non-null assertion in webhook handler; guard against agent deletion mid-request
- Add A2A gateway and directory endpoints to README.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
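The retry rule described in these commits (retry only 5xx and network failures) amounts to classifying each failure before deciding whether to try again. The following is a sketch under assumptions: the `DeliveryFailure` tagged shape and the `isRetryable` name are hypothetical and do not reflect how the PR actually marks errors.

```typescript
// Hypothetical failure classification; the PR marks errors as retryable in its own way.
interface DeliveryFailure {
  kind: "http" | "network" | "jsonrpc" | "validation";
  status?: number; // HTTP status code, present only when kind is "http"
}

// Only network failures and 5xx responses are worth retrying.
// 4xx responses, JSON-RPC errors, and validation errors will not succeed on a second attempt.
function isRetryable(failure: DeliveryFailure): boolean {
  if (failure.kind === "network") return true;
  if (failure.kind === "http") {
    return failure.status !== undefined && failure.status >= 500 && failure.status <= 599;
  }
  return false;
}
```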

The a2a_register stub was formatting the webhook URL as /a2a/webhook/{name} but the route expects /a2a/webhook/{workspace_id}/{name}. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
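For reference, a workspace-scoped webhook URL in the /a2a/webhook/{workspace_id}/{name} form could be built as below. The `buildWebhookUrl` name, the `baseUrl` parameter, and the use of `encodeURIComponent` are assumptions for illustration; the PR does not show how the real stub formats the URL.

```typescript
// Builds a webhook URL of the form {base}/a2a/webhook/{workspace_id}/{name}.
// Function name and parameters are hypothetical.
function buildWebhookUrl(baseUrl: string, workspaceId: string, agentName: string): string {
  // Trim trailing slashes so the path is not doubled up.
  const base = baseUrl.replace(/\/+$/, "");
  // Assumption: path segments are percent-encoded so names with spaces or slashes stay in one segment.
  return `${base}/a2a/webhook/${encodeURIComponent(workspaceId)}/${encodeURIComponent(agentName)}`;
}
```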

devin-ai-integration

Devin Review found 1 new potential issue.

devin-ai-integration · 2026-03-25T11:34:00Z

packages/server/src/engine/dm.ts

+    await a2aEngine.sendToExternalAgent(a2aTarget.external_url, payload, {
+      scheme: a2aTarget.auth_scheme,
+      credential: a2aTarget.auth_credential,
+    });
+    await a2aEngine.incrementA2aMessagesSent(db, a2aTarget.id);


🔴 DM message is permanently lost when A2A external agent delivery fails

In sendDm, the A2A outbound call at line 274 (sendToExternalAgent) is awaited before the message is persisted to the database at line 281 (persistDmMessage). If the external agent returns a 4xx, exhausts 5xx retries, returns a JSON-RPC error, or is unreachable, sendToExternalAgent throws and the entire sendDm function throws — the DM is never written to the messages table. From the caller's perspective the message simply vanishes. This breaks the Relaycast invariant that sent DMs are always persisted in the conversation history, and it makes A2A agent reliability failures silently destructive to user data.

Prompt for agents

In packages/server/src/engine/dm.ts, inside the sendDm function, move the persistDmMessage call (currently at line 281) to BEFORE the sendToExternalAgent call (currently at line 274). The message must be written to the database regardless of whether the external A2A agent delivery succeeds or fails. After persisting, the A2A send can be attempted, and its failure can be logged or surfaced without losing the message. Alternatively, wrap the sendToExternalAgent call in a try/catch so that a delivery failure does not prevent persistDmMessage from executing, and attach an error status to the console log entry instead.
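The persist-first ordering suggested above can be sketched with injected dependencies. This is not the actual sendDm code: `Deps`, `sendDmPersistFirst`, and the callback signatures are hypothetical stand-ins for persistDmMessage and sendToExternalAgent, whose real signatures are not shown here.

```typescript
// Hypothetical dependency shapes standing in for the real persistence and delivery calls.
interface Deps {
  persist: (text: string) => Promise<string>; // stores the DM and returns its id
  deliver: (text: string) => Promise<void>; // may throw on 4xx, exhausted 5xx retries, or JSON-RPC errors
  logDeliveryError: (messageId: string, error: unknown) => void;
}

// Persists the message first, then attempts external delivery.
// A delivery failure is logged and reported, but the stored message is kept.
async function sendDmPersistFirst(
  deps: Deps,
  text: string,
): Promise<{ id: string; delivered: boolean }> {
  const id = await deps.persist(text);
  try {
    await deps.deliver(text);
    return { id, delivered: true };
  } catch (error) {
    deps.logDeliveryError(id, error);
    return { id, delivered: false };
  }
}
```

Returning a `delivered` flag instead of throwing lets the caller surface the failure while the conversation history stays complete.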

khaliqgant force-pushed the a2a-implementation branch from a1cde03 to 228c25e on March 25, 2026 07:59

khaliqgant and others added 6 commits March 25, 2026 09:51

fix: surface D1 error cause in directory search error response

58e1380

khaliqgant and others added 4 commits March 25, 2026 10:54

fix: include workspace_id in local daemon A2A webhook URL

77fd28e

khaliqgant merged commit fc375dc into main Mar 25, 2026
4 checks passed

khaliqgant deleted the a2a-implementation branch March 25, 2026 11:32

devin-ai-integration bot reviewed Mar 25, 2026

khaliqgant merged 17 commits into main from a2a-implementation
