forked from bradygaster/squad
Design testing strategy to isolate Squad failures from Copilot CLI failures #19
Open
Labels
go:needs-research (Needs investigation), squad (Squad triage inbox: Lead will assign to a member), squad:fido (Assigned to FIDO, Quality Owner)
Description
Problem
When Squad CI tests fail, it's often unclear whether the failure is:
- A real Squad bug — code we changed broke something
- A Copilot CLI infrastructure issue — the test harness, agent spawning, or CLI environment is the actual failure point
- A pre-existing failure on the base branch — unrelated to the current PR's changes
This ambiguity wastes significant debugging time. In the current StorageProvider PR (#18), we chased `epl-ux-fixes.test.ts:921` and `shell.test.ts` failures that turned out to be either pre-existing on `dev` or caused by Copilot CLI's async migration patterns, not by StorageProvider changes.
Goals
- Classify test failures: automatically determine whether a failure is SA-scoped (files this PR touched), pre-existing, or infrastructure
- Baseline comparison: compare PR test results against `dev` branch results to identify pre-existing failures
- Failure attribution: tag each test failure with the likely root cause category
- Reduce false alarms: stop agents from chasing failures unrelated to their work
Proposed Approach
1. Baseline CI snapshot
- Run the full test suite on `dev` nightly (or on demand) and store the results as a baseline
- PR CI compares its failures against the baseline; any failure that also exists on `dev` is flagged as pre-existing
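A minimal sketch of the baseline comparison, assuming each run's failures can be exported as a list of file/test-name pairs. The `TestFailure` shape and the `splitByBaseline` helper are hypothetical names for illustration, not part of Squad or of any test runner's API.

```typescript
// Hypothetical shape for one failing test (an assumption, not a Squad type).
interface TestFailure {
  file: string; // path of the test file, e.g. "test/shell.test.ts"
  name: string; // full test title
}

// Build a stable key so the same test can be matched across two runs.
function failureKey(f: TestFailure): string {
  return `${f.file}::${f.name}`;
}

// Split the PR's failures into those already failing on the baseline
// (pre-existing) and those that only fail on the PR branch.
function splitByBaseline(
  prFailures: TestFailure[],
  baselineFailures: TestFailure[],
): { preExisting: TestFailure[]; newOnPr: TestFailure[] } {
  const baselineKeys = new Set(baselineFailures.map(failureKey));
  const preExisting: TestFailure[] = [];
  const newOnPr: TestFailure[] = [];
  for (const f of prFailures) {
    if (baselineKeys.has(failureKey(f))) {
      preExisting.push(f);
    } else {
      newOnPr.push(f);
    }
  }
  return { preExisting, newOnPr };
}
```

Matching on file plus test title keeps the comparison independent of line numbers, which shift whenever a test file is edited.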
2. Scope-aware test filtering
- Given the set of files changed in a PR, determine which test files are relevant
- Flag failures in unrelated test files as `out-of-scope` (informational, not blocking)
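One possible scope check, sketched under a strong assumption: a test file is treated as relevant if it was changed itself or if its name matches a changed source file (`storage-provider.test.ts` for `storage-provider.ts`). The `stem` and `isInScope` helpers are hypothetical. Name matching alone would miss indirect dependencies, so a real implementation would likely need import-graph information as well.

```typescript
// Strip directories and extensions:
// "src/storage-provider.ts" and "test/storage-provider.test.ts"
// both reduce to "storage-provider".
function stem(path: string): string {
  const base = path.split("/").pop() ?? path;
  return base
    .replace(/\.(test|spec)\.[jt]sx?$/, "")
    .replace(/\.[jt]sx?$/, "");
}

// A test file is in scope if the PR changed it directly, or changed a
// file with the same stem. This is a naming heuristic only (assumption).
function isInScope(testFile: string, changedFiles: string[]): boolean {
  if (changedFiles.includes(testFile)) return true;
  const testStem = stem(testFile);
  return changedFiles.some((f) => stem(f) === testStem);
}
```

The list of changed files could come from something like `git diff --name-only` against the base branch.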
3. Failure categorization
- `🔴 PR-caused`: test passes on dev, fails on PR branch, test file is in scope
- `🟡 Pre-existing`: test also fails on dev (zero diff on the test file between branches)
- `⚪ Out-of-scope`: test fails but the test file has no relationship to changed files
- `🔵 Infrastructure`: timeout, OOM, orphan process cleanup, Node.js deprecation warnings
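The four categories above could be assigned by a small decision function like the sketch below. The `FailureFacts` shape, the message patterns, and the precedence order (infrastructure first, then pre-existing, then scope) are all assumptions for illustration; the issue does not specify how conflicts between categories should be resolved.

```typescript
type Category = "PR-caused" | "Pre-existing" | "Out-of-scope" | "Infrastructure";

// Hypothetical inputs gathered by earlier steps (assumption).
interface FailureFacts {
  failsOnDev: boolean; // same test also fails in the dev baseline
  inScope: boolean;    // test file is related to files changed in the PR
  message: string;     // error message or log excerpt for the failure
}

// Assumed text patterns hinting at infrastructure problems.
const INFRA_PATTERNS: RegExp[] = [
  /timed? ?out/i,
  /out of memory|heap limit/i,
  /orphan/i,
  /DeprecationWarning/,
];

function categorize(f: FailureFacts): Category {
  // Assumed precedence: infrastructure signals win over everything else.
  if (INFRA_PATTERNS.some((p) => p.test(f.message))) return "Infrastructure";
  if (f.failsOnDev) return "Pre-existing";
  if (!f.inScope) return "Out-of-scope";
  return "PR-caused";
}
```

Checking the baseline before scope means a failure that already exists on dev is never blamed on the PR, even when the PR touched related files.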
4. Agent workflow integration
- When Squad agents check CI, they should read the categorized results
- Agents should only fix `🔴 PR-caused` failures
- `🟡 Pre-existing` failures get logged but not chased
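As a rough illustration of the agent-facing output, the sketch below turns categorized failures into a short text summary that separates actionable failures from the rest. `CategorizedFailure` and `summarizeForAgent` are hypothetical names, and the wording of the summary is an assumption rather than an existing Squad prompt format.

```typescript
// Hypothetical record produced by the categorization step (assumption).
interface CategorizedFailure {
  test: string;
  category: "PR-caused" | "Pre-existing" | "Out-of-scope" | "Infrastructure";
}

// Build a plain-text summary: PR-caused failures are listed as work to do,
// everything else is listed for information only.
function summarizeForAgent(failures: CategorizedFailure[]): string {
  const actionable = failures.filter((f) => f.category === "PR-caused");
  const ignored = failures.filter((f) => f.category !== "PR-caused");
  const lines = [
    `Fix these (${actionable.length}):`,
    ...actionable.map((f) => `- ${f.test}`),
    `Do not chase these (${ignored.length}):`,
    ...ignored.map((f) => `- ${f.test} [${f.category}]`),
  ];
  return lines.join("\n");
}
```

The returned string could be appended to an agent spawn prompt or posted as a CI comment.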
Examples from StorageProvider PR #18
| Failure | Category | Why |
|---|---|---|
| `shell.test.ts`: async/await missing | 🔴 PR-caused | SA migration made functions async, tests needed updating |
| `storage-provider.test.ts:465`: EPERM/EACCES | 🔴 PR-caused | New test, cross-platform error code difference |
| `epl-ux-fixes.test.ts:921`: 'squad init' | 🟡 Pre-existing | Zero diff on this file between dev and SA branch |
Acceptance Criteria
- CI can identify pre-existing failures (baseline comparison)
- Test failures are categorized by scope relevance to PR changes
- Agent spawn prompts include failure categorization so agents don't chase unrelated failures
- Skill document created at `.squad/skills/ci-failure-triage/SKILL.md` encoding the triage patterns
Labels
squad, squad:flight