Design testing strategy to isolate Squad failures from Copilot CLI failures

## Problem

When Squad CI tests fail, it's often unclear whether the failure is:
1. **A real Squad bug** — code we changed broke something
2. **A Copilot CLI infrastructure issue** — the test harness, agent spawning, or CLI environment is the actual failure point
3. **A pre-existing failure on the base branch** — unrelated to the current PR's changes

This ambiguity wastes significant debugging time. In the current StorageProvider PR (#18), we chased `epl-ux-fixes.test.ts:921` and `shell.test.ts` failures that turned out to be either pre-existing on `dev` or caused by Copilot CLI's async migration patterns, not by StorageProvider changes.

## Goals

1. **Classify test failures**: automatically determine whether a failure is SA-scoped (files this PR touched), pre-existing, or infrastructure
2. **Baseline comparison**: compare PR test results against `dev` branch results to identify pre-existing failures
3. **Failure attribution**: tag each test failure with the likely root cause category
4. **Reduce false alarms**: stop agents from chasing failures unrelated to their work

## Proposed Approach

### 1. Baseline CI snapshot
- Run the full test suite on `dev` nightly (or on demand) and store the results as a baseline
- PR CI compares its failures against the baseline; any failure that also exists on `dev` is flagged as pre-existing
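
The comparison step could be sketched roughly as follows. This is a minimal illustration, not an existing Squad API: the `TestFailure` shape, `failureKey`, and `compareToBaseline` are hypothetical names, and it assumes both runs can be exported as a list of failing tests identified by file and test name.

```typescript
// Hypothetical result shape: one entry per failing test in a CI run.
interface TestFailure {
  file: string;    // test file, e.g. "shell.test.ts"
  name: string;    // full test title
  message: string; // first line of the failure output
}

// Build a stable key so the same test can be matched across two runs.
function failureKey(f: TestFailure): string {
  return `${f.file}::${f.name}`;
}

// Split PR failures into those that also fail on the baseline
// and those that only fail on the PR branch.
function compareToBaseline(
  baseline: TestFailure[],
  pr: TestFailure[],
): { preExisting: TestFailure[]; newInPr: TestFailure[] } {
  const baselineKeys = new Set(baseline.map(failureKey));
  const preExisting: TestFailure[] = [];
  const newInPr: TestFailure[] = [];
  for (const f of pr) {
    if (baselineKeys.has(failureKey(f))) {
      preExisting.push(f);
    } else {
      newInPr.push(f);
    }
  }
  return { preExisting, newInPr };
}
```

Matching on file plus test name (rather than on the failure message) is a deliberate choice in this sketch: messages often contain timings or paths that differ between runs, while the test identity stays stable.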

### 2. Scope-aware test filtering
- Given the set of files changed in a PR, determine which test files are relevant
- Flag failures in unrelated test files as `out-of-scope` (informational, not blocking)
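
One simple way to decide relevance is a file-name heuristic, sketched below. The helper names `stem` and `isInScope` are hypothetical, and the rule itself (a test is in scope if it was changed directly or shares a base name with a changed source file) is an assumption made for illustration; the issue does not specify how relevance should be computed.

```typescript
// Strip directories and the ".test.ts" / ".ts" suffix:
// "src/storage-provider.ts" and "test/storage-provider.test.ts" both become "storage-provider".
function stem(path: string): string {
  const base = path.split("/").pop() ?? path;
  return base.replace(/\.test\.ts$/, "").replace(/\.ts$/, "");
}

// A test file is treated as in scope when it was changed itself,
// or when a changed file has the same stem (assumed naming convention).
function isInScope(testFile: string, changedFiles: string[]): boolean {
  const testStem = stem(testFile);
  return changedFiles.some(
    (changed) => changed === testFile || stem(changed) === testStem,
  );
}
```

A name-only heuristic will miss tests that depend on a changed file indirectly, so a real implementation would probably need to follow imports as well. Treat this as a starting point for the `out-of-scope` flag, not as a complete relevance check.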

### 3. Failure categorization
- `🔴 PR-caused`: test passes on `dev`, fails on the PR branch, and the test file is in scope
- `🟡 Pre-existing`: test also fails on `dev` (zero diff on the test file between branches)
- `⚪ Out-of-scope`: test fails, but the test file has no relationship to the changed files
- `🔵 Infrastructure`: timeout, OOM, orphan process cleanup, Node.js deprecation warnings
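
The four categories can be combined into a single decision function, sketched below. The order of the checks (infrastructure first, then pre-existing, then out-of-scope) is an assumption, since the issue does not define precedence when several categories apply. The `FailureFacts` shape and the regular expressions in `INFRA_PATTERNS` are also illustrative and would need tuning against real CI output.

```typescript
type Category = "pr-caused" | "pre-existing" | "out-of-scope" | "infrastructure";

// Hypothetical inputs: facts gathered by the baseline and scope steps.
interface FailureFacts {
  message: string;     // failure output from the test runner
  failsOnDev: boolean; // the same test also fails on the dev baseline
  inScope: boolean;    // the test file relates to files changed in the PR
}

// Illustrative message patterns for infrastructure problems (assumed, not exhaustive).
const INFRA_PATTERNS: RegExp[] = [
  /timed out|timeout/i,
  /out of memory/i,
  /DeprecationWarning/,
];

function categorize(f: FailureFacts): Category {
  // Assumed precedence: environment problems win over everything else.
  if (INFRA_PATTERNS.some((p) => p.test(f.message))) return "infrastructure";
  if (f.failsOnDev) return "pre-existing";
  if (!f.inScope) return "out-of-scope";
  // Passes on dev, fails on the PR branch, and the test file is in scope.
  return "pr-caused";
}
```

For example, `categorize({ message: "expected a Promise", failsOnDev: false, inScope: true })` returns `"pr-caused"`, while the same failure with `failsOnDev: true` returns `"pre-existing"`.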

### 4. Agent workflow integration
- When Squad agents check CI, they should read the categorized results
- Agents should only fix `🔴 PR-caused` failures
- `🟡 Pre-existing` failures get logged but not chased
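
To make the categorized results easy for an agent to consume, they could be rendered into a short text section for the spawn prompt. The sketch below is one possible format, not an agreed one: `CategorizedFailure`, `renderTriageSection`, and the section wording are all hypothetical. It also assumes that every category other than PR-caused is informational, which the issue states explicitly only for pre-existing failures.

```typescript
// Hypothetical shape of one categorized failure.
interface CategorizedFailure {
  test: string; // e.g. "shell.test.ts"
  category: "pr-caused" | "pre-existing" | "out-of-scope" | "infrastructure";
}

// Render a plain-text section that separates actionable failures
// from failures that should only be logged.
function renderTriageSection(failures: CategorizedFailure[]): string {
  const actionable = failures.filter((f) => f.category === "pr-caused");
  const informational = failures.filter((f) => f.category !== "pr-caused");

  const lines: string[] = ["CI failure triage", "Fix these (PR-caused):"];
  for (const f of actionable) lines.push(`- ${f.test}`);
  if (actionable.length === 0) lines.push("- (none)");

  lines.push("Do not chase these (logged only):");
  for (const f of informational) lines.push(`- ${f.test} [${f.category}]`);
  if (informational.length === 0) lines.push("- (none)");

  return lines.join("\n");
}
```

Keeping the informational failures in the output, with their category attached, lets an agent record them without treating them as work items.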

## Examples from StorageProvider PR #18

| Failure | Category | Why |
|---------|----------|-----|
| `shell.test.ts`: async/await missing | 🔴 PR-caused | SA migration made functions async, tests needed updating |
| `storage-provider.test.ts:465`: EPERM/EACCES | 🔴 PR-caused | New test, cross-platform error code difference |
| `epl-ux-fixes.test.ts:921`: 'squad init' | 🟡 Pre-existing | Zero diff on this file between `dev` and SA branch |

## Acceptance Criteria

- [ ] CI can identify pre-existing failures (baseline comparison)
- [ ] Test failures are categorized by scope relevance to PR changes
- [ ] Agent spawn prompts include failure categorization so agents don't chase unrelated failures
- [ ] Skill document created at `.squad/skills/ci-failure-triage/SKILL.md` encoding the triage patterns

## Labels
squad, squad:flight
