fix(coordinator): persist agent results immediately after read_agent #684
tamirdresher wants to merge 1 commit into dev
…652)

read_agent data expires within ~2-3 minutes, causing silent data loss when the coordinator takes too long between collecting results and processing them.

Changes to After Agent Work flow:
- Add mandatory step 2: write results to orchestration-log BEFORE any other processing (presenting results, spawning Scribe, etc.)
- Update silent-success detection (now step 3) to check orchestration-log for files agents may have written directly
- Update Scribe task 3 to enrich coordinator-written logs rather than create from scratch
- Renumber steps 3-6 to 4-7
- Update LEAN instruction to acknowledge persistence write as the one permitted file I/O operation

Refs #652

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Updates the Squad coordinator “After Agent Work” instructions to mitigate read_agent result expiration by persisting agent outputs immediately and adjusting downstream Scribe/orchestration-log behavior.
Changes:
- Add a new mandatory step to persist `read_agent` results to `.squad/orchestration-log/` before any further processing.
- Extend “silent success detection” to also look for agent-written orchestration-log files.
- Update Scribe’s orchestration-log task from “write” to “enrich”, and renumber subsequent coordinator steps.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| `.github/agents/squad.agent.md` | Updates authoritative coordinator + Scribe post-agent-work procedure (new persistence step, renumbering). |
| `templates/squad.agent.md.template` | Mirrors the updated coordinator/Scribe instructions for template distribution. |
| `packages/squad-cli/templates/squad.agent.md.template` | Mirrors the updated template in the CLI package. |
| `packages/squad-sdk/templates/squad.agent.md.template` | Mirrors the updated template in the SDK package. |
**⚡ Keep the post-work turn LEAN.** Coordinator's job: (1) persist results, (2) present compact results, (3) spawn Scribe. That's ALL. No decision consolidation, no heavy file I/O beyond the mandatory persistence write in step 2.

**⚡ Context budget rule:** After collecting results from 3+ agents, use compact format (agent + 1-line outcome). Full details go in orchestration log via Scribe.

After each batch of agent work:

1. **Collect results** via `read_agent` (wait: true, timeout: 300).

2. **Silent success detection** — when `read_agent` returns empty/no response:
2. **Persist results immediately** — `read_agent` data expires within ~2-3 minutes. Write each agent's result to `.squad/orchestration-log/{timestamp}-{agent-name}.md` **BEFORE** any other processing (presenting results, spawning Scribe, etc.). Create the directory if it doesn't exist. Use ISO 8601 UTC timestamp (e.g. `20260401T1423Z`). Format:
Template parity: these edits were applied to mirror copies, but the canonical template `.squad-templates/squad.agent.md` still has the old “After Agent Work” steps. The repo’s template-sync tests enforce byte-for-byte equality from `.squad-templates/` to `templates/`, `packages/*/templates/`, and `.github/agents/` (see `test/template-sync.test.ts:4-6,118-141`), so this PR will fail CI and/or get reverted the next time `node scripts/sync-templates.mjs` runs. Please apply the change to `.squad-templates/squad.agent.md` and re-run the sync script so all mirrors match.
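To make the parity requirement concrete, here is a minimal sketch of a byte-for-byte comparison between a canonical template and its mirrors. It is not the repo's actual `template-sync` test or sync script; the names `MirrorFile`, `bytesEqual`, and `findDriftedMirrors` are hypothetical, and the file contents are assumed to have been read into memory already.

```typescript
// Hypothetical shape for a mirror copy whose bytes were already read from disk.
interface MirrorFile {
  path: string;
  bytes: Uint8Array;
}

// Strict byte-for-byte equality: same length and same value at every index.
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) {
    return false;
  }
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) {
      return false;
    }
  }
  return true;
}

// Returns the paths of all mirrors whose bytes differ from the canonical copy.
function findDriftedMirrors(canonical: Uint8Array, mirrors: MirrorFile[]): string[] {
  return mirrors
    .filter((mirror) => !bytesEqual(canonical, mirror.bytes))
    .map((mirror) => mirror.path);
}
```

An empty return value means every mirror matches the canonical template; any listed path would need a re-sync.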
2. **Persist results immediately** — `read_agent` data expires within ~2-3 minutes. Write each agent's result to `.squad/orchestration-log/{timestamp}-{agent-name}.md` **BEFORE** any other processing (presenting results, spawning Scribe, etc.). Create the directory if it doesn't exist. Use ISO 8601 UTC timestamp (e.g. `20260401T1423Z`). Format:
   ```markdown
   # {Agent Name} — {ISO 8601 UTC timestamp}
   ## Task
   {what was asked}
   ## Result
   {agent output summary}
   ## Files Modified
   {list of files the agent created or changed, or "None detected"}
   ```
   This is the ONE exception to "no file I/O" — it is mandatory because results are unrecoverable once expired.
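As an illustration only, the timestamp and entry format proposed in step 2 could be produced along these lines. The coordinator here is a prompt-driven agent, not runtime code, so `AgentResult`, `compactUtcTimestamp`, and `renderLogEntry` are hypothetical names introduced for this sketch; it also assumes the timestamp is truncated to the minute, as in the `20260401T1423Z` example.

```typescript
// Hypothetical shape; the fields mirror the headings of the proposed entry format.
interface AgentResult {
  agentName: string;
  task: string;
  summary: string;
  filesModified: string[];
}

// Compact ISO 8601 UTC timestamp to the minute, e.g. 20260401T1423Z.
function compactUtcTimestamp(date: Date): string {
  const pad2 = (n: number): string => ("0" + String(n)).slice(-2);
  return (
    String(date.getUTCFullYear()) +
    pad2(date.getUTCMonth() + 1) +
    pad2(date.getUTCDate()) +
    "T" +
    pad2(date.getUTCHours()) +
    pad2(date.getUTCMinutes()) +
    "Z"
  );
}

// Renders one entry using the heading layout from step 2.
function renderLogEntry(result: AgentResult, timestamp: string): string {
  const files =
    result.filesModified.length > 0
      ? result.filesModified.map((file) => "- " + file).join("\n")
      : "None detected";
  return [
    "# " + result.agentName + " — " + timestamp,
    "## Task",
    result.task,
    "## Result",
    result.summary,
    "## Files Modified",
    files,
  ].join("\n");
}
```

Keeping the rendering separate from the file write means the entry text can be checked without touching `.squad/orchestration-log/`.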
The new persistence step writes coordinator output into `.squad/orchestration-log/{timestamp}-{agent-name}.md` using a new header-based format, but this conflicts with existing orchestration-log conventions in this repo: `templates/orchestration-log.md` defines a table-based entry and says entries are created BEFORE spawning (and append-only), and this same squad.agent prompt elsewhere says “Scribe writes one entry per agent … format matches the existing orchestration log entry template.” Either adjust the persistence location/filename (e.g., a separate “raw results” log) or update the orchestration-log template + the earlier sections so the coordinator-written file format and timing are consistent.
3. **Silent success detection** — when `read_agent` returns empty/no response:
   - Check `.squad/orchestration-log/` for a file the agent may have written directly during its run.
Flow ambiguity: step 2 is written as mandatory “persist results immediately” for each agent, but step 3 is specifically for the case where `read_agent` returns empty/no response (i.e., there is no result content to persist). Please clarify the conditional behavior (e.g., only persist when `read_agent` returns content, or persist a stub that explicitly records the loss) so the numbered procedure is executable as written.
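One way to read the second option (persist a stub that records the loss) is sketched below. This is an assumption about how the conditional could be resolved, not behavior defined in the PR; `PersistDecision`, `decidePersistence`, and the stub wording are all hypothetical.

```typescript
// Hypothetical result of deciding what to write for one agent.
interface PersistDecision {
  kind: "result" | "stub";
  resultSection: string;
}

// Non-empty output is persisted as-is; empty or missing output yields an
// explicit stub so the log records that no result content was available.
function decidePersistence(output: string | null | undefined): PersistDecision {
  const text = output == null ? "" : output.trim();
  if (text.length > 0) {
    return { kind: "result", resultSection: text };
  }
  return {
    kind: "stub",
    resultSection: "No response returned by read_agent; result content unavailable.",
  };
}
```

With this reading, step 2 always writes something, and step 3 runs only for entries whose `kind` is `"stub"`.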
1. DECISIONS ARCHIVE [HARD GATE]: If decisions.md >= 20480 bytes, archive entries older than 30 days NOW. If >= 51200 bytes, archive entries older than 7 days. Do not skip this step.
2. DECISION INBOX: Merge .squad/decisions/inbox/ → decisions.md, delete inbox files. Deduplicate.
3. ORCHESTRATION LOG: Write .squad/orchestration-log/{timestamp}-{agent}.md per agent. Use ISO 8601 UTC timestamp.
3. ORCHESTRATION LOG: Enrich .squad/orchestration-log/{timestamp}-{agent}.md files written by coordinator in step 2. Add detail from agent history if available. Write new entries for any agents not yet logged. Use ISO 8601 UTC timestamp.
Placeholder/filename inconsistency: coordinator step 2 uses `{timestamp}-{agent-name}.md`, while the Scribe task refers to `{timestamp}-{agent}.md`. To avoid Scribe missing files or generating duplicates, please standardize on one placeholder and explicitly state it should use the spawned agent name (lowercase cast id) as the filename-safe identifier.
3. ORCHESTRATION LOG: Enrich .squad/orchestration-log/{timestamp}-{agent}.md files written by coordinator in step 2. Add detail from agent history if available. Write new entries for any agents not yet logged. Use ISO 8601 UTC timestamp.
3. ORCHESTRATION LOG: Enrich .squad/orchestration-log/{timestamp}-{agent-name}.md files written by coordinator in step 2, where `agent-name` is the spawned agent `name` (cast id), lowercased and made filename-safe. Add detail from agent history if available. Write new entries for any agents not yet logged. Use ISO 8601 UTC timestamp.
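The suggestion leaves "made filename-safe" undefined. A minimal sketch of one possible rule follows, assuming that any run of characters outside `a-z` and `0-9` collapses to a single hyphen; `toFileSafeAgentName` and `logFileName` are hypothetical helpers, not part of the repo.

```typescript
// Lowercases the agent name and replaces every run of characters outside
// a-z / 0-9 with a single hyphen, then trims leading and trailing hyphens.
function toFileSafeAgentName(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Builds the {timestamp}-{agent-name}.md filename shared by coordinator and Scribe.
function logFileName(timestamp: string, agentName: string): string {
  return timestamp + "-" + toFileSafeAgentName(agentName) + ".md";
}
```

If the coordinator and Scribe both derive the name through the same rule, Scribe can locate the coordinator-written file instead of creating a duplicate. A name with no letters or digits would produce an empty identifier under this rule, so a real implementation would need a fallback.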
Summary
`read_agent` results expire within ~2-3 minutes. When the coordinator takes too long between collecting and processing results (e.g., during fan-out with many agents), data is silently lost. This adds a mandatory persistence step to the "After Agent Work" coordinator flow.
Changes
New step 2 — Persist results immediately:
After `read_agent` returns, write each agent's result to `.squad/orchestration-log/{timestamp}-{agent-name}.md` BEFORE any other processing
Updated step 3 — Silent success detection:
Updated Scribe task 3:
Housekeeping:
Files Changed
Testing
Coordinator instruction change only — no runtime code. Verified all 4 files have identical changes.
Refs #652