RFC 001: A2A Platform — Gateway, Directory, Console, Certification, Smart Routing#87
khaliqgant wants to merge 7 commits into main
…ion, Smart Routing)
Comprehensive spec for positioning Relaycast as the managed A2A platform:
- A2A Gateway: bridge external A2A agents into Relay workspaces
- Agent Directory: public searchable marketplace for A2A agents
- Observability Console: message logs, flow viz, cost dashboard
- Compliance Certification: 3-level test suite with badges
- Smart Routing: skill-based routing with scoring algorithm
- Pricing model and implementation phases
Preview deployed!
This preview shares the staging database and will be cleaned up when the PR is merged or closed.
Run E2E tests: `npm run e2e -- https://pr87-api.relaycast.dev --ci`
16 findings across convention violations, protocol alignment, architecture gaps, and strategic observations. Key blockers: response envelope missing data wrapper, camelCase wire fields, wrong Agent Card path, outdated JSON-RPC method names. https://claude.ai/code/session_01E8sS9D4fwrggC2QcCJyv39
…g, SSE bridge, A2H
Fixes from Devin + Claude review:
- Response envelope: wrap in { ok, data: {...} }
- DirectoryEntry: camelCase → snake_case for wire format
- Agent Card path: agent.json → agent-card.json (A2A standard)
- JSON-RPC methods: message/send → SendMessage (A2A v0.3+)
- DB schema: PostgreSQL → Drizzle/D1 SQLite (matches codebase)
- Search: PostgreSQL FTS → D1 FTS5 (already in use)
- Remove GET webhook endpoint (POST only)
New sections:
- A2A Task state mapping (WORKING, INPUT_REQUIRED, etc)
- Context preservation (contextId ↔ Relaycast threads)
- SSE ↔ WebSocket streaming bridge
- A2H (Agent-to-Human) bridging via identity_type: human
- Auto-certification on registration (Level 1)
- Rust SDK in architecture diagram
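The first two fixes in the list above (the `{ ok, data }` envelope and snake_case wire fields) can be sketched as follows. This is a minimal illustration, not the Relaycast implementation: the helper names and the `agentName` / `skillTags` fields are hypothetical, and the regex does not handle acronyms such as `agentURL`.

```python
import re
from typing import Any


def to_snake_case(name: str) -> str:
    """Convert a camelCase identifier to snake_case (no acronym handling)."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def snake_case_keys(value: Any) -> Any:
    """Recursively rename dict keys to snake_case; values are left untouched."""
    if isinstance(value, dict):
        return {to_snake_case(k): snake_case_keys(v) for k, v in value.items()}
    if isinstance(value, list):
        return [snake_case_keys(item) for item in value]
    return value


def ok_envelope(payload: dict) -> dict:
    """Wrap a payload in the { ok, data } response envelope."""
    return {"ok": True, "data": snake_case_keys(payload)}


# Hypothetical directory entry, written with camelCase keys on the server side.
entry = {"agentName": "billing-expert", "skillTags": ["billing", "stripe"]}
response = ok_envelope(entry)
```

Converting keys at the envelope boundary keeps internal code free to use whatever casing it prefers while the wire format stays consistent.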
After registration:
- The external agent appears in `relay.list_agents()` with its A2A skills
🟡 Spec uses snake_case SDK method name relay.list_agents() violating AGENTS.md camelCase convention
Line 84 references relay.list_agents() and line 246 references list_agents() as generic SDK method calls. Per the mandatory AGENTS.md rule "JS/TS method and function names are camelCase," a snake_case method name is a convention violation. Both the TypeScript SDK (relay.agents.list() at packages/sdk-typescript/src/relay.ts:311) and the Python SDK (relay.agents.list() at packages/sdk-python/src/relay_sdk/relay.py:61) use the namespaced relay.agents.list() pattern. The spec correctly differentiates TS/Python naming elsewhere at line 573 (TS: relay.listA2aAgents() / Python: relay.list_a2a_agents()) but doesn't do so here. While list_agents exists in the Rust SDK, the spec text doesn't indicate it's Rust-specific, and the unqualified use of a snake_case method name violates the camelCase rule for JS/TS.
Strategic revision of PR #87 incorporating A2A ecosystem research:
- Unify registry + naming service + gateway as Relaycast features
- Fix 4 protocol blockers (agent-card path, method names, envelope, snake_case)
- Connect Smart Routing to A2A AgentCard.skills (first production ANS)
- Add competitive landscape (agentgateway, Solo.io, IBM, Twilio)
- Add A2H bridging via INPUT_REQUIRED → human agent routing
- Fix database schemas to D1/SQLite + Drizzle ORM
- 7 additional fixes (SSE streaming, circuit breaker, auto-certification)
https://claude.ai/code/session_01E8sS9D4fwrggC2QcCJyv39
…, agent discovery
- Reframe 5 capabilities with unified SDK principle (registry + naming + gateway as features)
- Rename Section 4: Agent Directory → Agent Registry (fills Discussion #741 gap)
- Rename Section 7: Smart Routing → Smart Routing & Agent Discovery (first production ANS)
- Add Section 10: full competitive landscape (agentgateway, Solo.io, IBM ContextForge, OWASP ANS, Twilio A2H)
- Add skills search API endpoints (GET /v1/skills, GET /v1/skills/search)
- Add unified developer experience example in Section 1
- Add cross-framework demo + Solo.io + OWASP refs
- Connect Smart Routing to A2A Agent Card skills indexing at registration time
AgentSkills (agentskills.io) is the open format for agent capabilities, adopted by Cursor, Claude Code, Codex, Gemini CLI, VS Code, and 25+ tools.
- Add AgentSkills as the skill definition layer (Section 7.2.1)
- Map AgentSkills fields → A2A Agent Card fields → Relaycast index
- Add import_skills() API for SKILL.md ingestion
- Add to competitive landscape as complementary standard
- Update Section 1 developer experience with AgentSkills integration
- Three standards unified: AgentSkills → A2A → Relaycast routing
    agent = relay.register_agent("billing-expert", skills=[
        {"id": "refunds", "name": "Process Refunds",
         "description": "Handle customer refund requests for Stripe payments",
         "tags": ["billing", "stripe", "refunds"],
         "examples": ["Process refund for order #1042", "Issue partial refund of $50"]},
        {"id": "invoices", "name": "Generate Invoices",
         "description": "Create and send PDF invoices from order data",
         "tags": ["billing", "pdf", "invoices"]}
    ])
🔴 Duplicated guidance: skills registration example repeated three times
AGENTS.md rule: "Keep docs concise and avoid duplicated guidance." The relay.register_agent("billing-expert", skills=[...]) pattern is repeated nearly identically across three code blocks: lines 31-38 (Section 1 Developer Experience), lines 554-562 (Section 7.2 Smart Routing), and lines 585-593 (Section 7.2.1 AgentSkills Integration). Additionally, the AgentSkills adoption boilerplate ("adopted by Cursor, Claude Code, Codex, Gemini CLI, VS Code Copilot, and 25+ tools") is restated at lines 29-30, 543, and 753. The "three standards unified" explanation also appears at line 54 and again at lines 545-548. Each subsequent mention should reference the first rather than re-explain.
Prompt for agents
In specs/001-a2a-platform.md, consolidate duplicated content:
1. Lines 554-562 (Section 7.2) repeat the same relay.register_agent("billing-expert", skills=[...]) code block from lines 31-38 (Section 1). Instead of a full code block, reference the earlier example: e.g. "Using the same registration API shown in Section 1, skills are indexed at registration time for fast matching" and only show the NEW relay.route() examples that Section 7.2 introduces.
2. Lines 577-593 (Section 7.2.1) again show relay.import_skills() (already at line 41) and another register_agent variant. Keep only the new agent_skills=[] keyword argument example here, and cross-reference the earlier import_skills example.
3. The AgentSkills adoption description ("adopted by Cursor, Claude Code, Codex...") appears at lines 29-30, 543, and 753. Define it once in Section 7.2 and reference it from the other locations.
4. The "three standards unified" explanation at line 54 and lines 545-548 should appear once (Section 7.2) with a forward/back reference from Section 1.
    1. Filter agents with matching skill
    2. Exclude: offline, suspended, over-capacity
    3. Score remaining:
       - availability_score = 1.0 if online, 0.0 if offline
🟡 Routing algorithm: availability_score is always 1.0 after step 2 filters offline agents
Step 2 of the routing algorithm at specs/001-a2a-platform.md:610 says "Exclude: offline, suspended, over-capacity." Step 3 at line 612 then defines availability_score = 1.0 if online, 0.0 if offline. Since all offline agents were already excluded in step 2, availability_score is always 1.0 for every scored agent. This makes the availability weight (default 0.3 — the highest weight at specs/001-a2a-platform.md:632) contribute a constant offset rather than differentiating agents. If implemented as-is, 30% of the scoring formula would be dead weight. The spec should either remove the availability_score from step 3 (and redistribute its weight), or redefine it to capture a graduated metric like recent uptime percentage so it actually varies across agents.
Suggested change:
-    - availability_score = 1.0 if online, 0.0 if offline
+    - availability_score = uptime_percent / 100.0 (0.0 to 1.0, based on recent uptime)
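To illustrate the point of this review comment, here is a rough sketch comparing the two definitions. Only the 0.3 availability weight comes from the spec as quoted above. The `skill_match` component, its 0.7 weight, and the candidate values are assumptions made up for the example, and the real formula may have more terms.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    skill_match: float     # hypothetical 0.0-1.0 score component
    uptime_percent: float  # recent uptime, 0-100


AVAILABILITY_WEIGHT = 0.3  # default weight quoted in the review
SKILL_WEIGHT = 0.7         # assumed remaining weight, for illustration only


def score_binary(c: Candidate) -> float:
    # Offline agents were excluded in step 2, so availability is always 1.0
    # here and the term is the same constant for every candidate.
    return AVAILABILITY_WEIGHT * 1.0 + SKILL_WEIGHT * c.skill_match


def score_graduated(c: Candidate) -> float:
    # Suggested alternative: availability varies with recent uptime.
    return AVAILABILITY_WEIGHT * (c.uptime_percent / 100.0) + SKILL_WEIGHT * c.skill_match


candidates = [
    Candidate("a", skill_match=0.80, uptime_percent=60.0),
    Candidate("b", skill_match=0.75, uptime_percent=99.0),
]

best_binary = max(candidates, key=score_binary).name
best_graduated = max(candidates, key=score_graduated).name
```

With the binary definition the ranking is decided by `skill_match` alone, so agent `a` wins. With the graduated definition, `b`'s higher uptime outweighs its slightly lower skill match.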
tsx (and other CJS-in-ESM resolvers) fails with ERR_PACKAGE_PATH_NOT_EXPORTED when only 'import' is specified in exports map. Adding 'default' and 'require' entries fixes interop with tsx, ts-node, and mixed CJS/ESM consumers.
This reverts commit 07706b9.
|---|---|---|
| `SUBMITTED` | Message received | Initial DM or channel post |
| `WORKING` | Agent processing | Agent has read the message |
| `INPUT_REQUIRED` | Human-in-the-loop | Route to `identity_type: human` agent in workspace |
🟡 Spec references non-existent identity_type field — actual codebase field is type
The spec refers to identity_type: human in three places (lines 220, 264, 269) when describing how the gateway looks up human agents. However, the actual agent schema uses the field type (not identity_type), as seen in packages/server/src/db/schema.ts:36 (type: text('type').notNull().default('agent')) and packages/types/src/agent.ts:12 (type: AgentTypeSchema). The valid values are 'agent' | 'human' | 'system'. If someone implements this spec, they'll look for identity_type and not find it in the existing data model. Should be type: 'human' (wire) or type: human throughout.
Suggested change:
- | `INPUT_REQUIRED` | Human-in-the-loop | Route to `identity_type: human` agent in workspace |
+ | `INPUT_REQUIRED` | Human-in-the-loop | Route to `type: human` agent in workspace |
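A minimal sketch of the lookup this comment describes, assuming agents are plain dicts carrying the `type` field with the values `'agent' | 'human' | 'system'` quoted above. The function names and sample agents are hypothetical and are not taken from the gateway code.

```python
from typing import Optional

# Hypothetical workspace roster; only the `type` field and its values
# come from the schema referenced in the review.
agents = [
    {"name": "billing-expert", "type": "agent"},
    {"name": "alice", "type": "human"},
    {"name": "audit-log", "type": "system"},
]


def find_human(roster: list) -> Optional[dict]:
    """Return the first agent whose `type` is 'human', or None if there is none."""
    return next((a for a in roster if a.get("type") == "human"), None)


def route_for_state(state: str, roster: list) -> Optional[dict]:
    """Pick a human agent only when the A2A task needs human input."""
    if state == "INPUT_REQUIRED":
        return find_human(roster)
    return None


target = route_for_state("INPUT_REQUIRED", agents)
```

A lookup keyed on `identity_type` would find nothing in this roster, which is the failure mode the comment warns about.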
A2A Platform spec. See specs/001-a2a-platform.md