feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) by garrytan · Pull Request #83 · garrytan/gstack

garrytan · 2026-03-16T02:17:42Z

Summary

QA-only skill (/qa-only) — report-only mode that blocks Edit tool entirely, so bugs are documented without fixes
QA fix loop — /qa now runs find-fix-verify cycles: discover bugs, fix them, commit, re-navigate to confirm
Plan-to-QA artifact flow — /plan-eng-review writes test-plan artifacts to ~/.gstack/projects/<slug>/ that /qa picks up for targeted testing
{{QA_METHODOLOGY}} DRY placeholder — shared methodology block injected into both /qa and /qa-only templates
Browser ref staleness detection — resolveRef() now checks element count to detect stale refs after SPA navigation
Eval efficiency metrics — turns, duration, cost displayed across all eval surfaces with natural-language Takeaway commentary interpreting deltas
3 new E2E tests — qa-only guardrail, qa fix loop with commit verification, plan-eng-review test-plan artifact

Pre-Landing Review

No issues found. All changes are developer tooling, test infrastructure, and skill templates — no SQL, auth, or trust boundary code.

Eval Results

16/16 PASS — two consecutive clean runs at $4.42 and $4.42

Test	Status	Cost	Turns	Duration
browse basic	PASS	$0.08	6t	23s
browse snapshot	PASS	$0.05	6t	21s
SKILL.md setup discovery	PASS	$0.04	4t	12s
SKILL.md setup (no binary)	PASS	$0.04	2t	6s
SKILL.md outside git	PASS	$0.04	2t	6s
/qa quick	PASS	$0.47	29t	156s
/review SQL injection	PASS	$0.15	11t	48s
/qa b6-static	PASS	$0.23	18t	109s
/qa b7-spa	PASS	$0.47	38t	203s
/qa b8-checkout	PASS	$0.62	37t	337s
/plan-ceo-review	PASS	$0.67	5t	524s
/plan-eng-review	PASS	$0.17	4t	130s
/retro	PASS	$0.35	26t	210s
/qa-only no-fix	PASS	$0.42	25t	170s
/qa fix loop	PASS	$0.41	24t	160s
/plan-eng-review artifact	PASS	$0.21	15t	110s

Takeaway: Stable run — no significant efficiency changes, no regressions.
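As a quick arithmetic check, the $4.42 run total reported above can be reproduced from the Cost column of the table. The sketch below is illustrative only: `costsInCents`, `totalCents`, and `formatUsd` are hypothetical names, not part of the eval tooling, and costs are held as integer cents to avoid floating-point drift.

```typescript
// Per-test costs from the Eval Results table, in table order, as integer cents.
const costsInCents: number[] = [
  8, 5, 4, 4, 4, 47, 15, 23, 47, 62, 67, 17, 35, 42, 41, 21,
];

// Sum in cents so the result is exact.
const totalCents: number = costsInCents.reduce((sum, c) => sum + c, 0);

// Hypothetical formatter: integer cents to a dollar string.
function formatUsd(cents: number): string {
  return "$" + (cents / 100).toFixed(2);
}

console.log(costsInCents.length + " tests, total " + formatUsd(totalCents));
// prints "16 tests, total $4.42"
```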

TODOS

No TODO items completed in this PR. 102 items remaining.

Test plan

All unit tests pass (145 tests, 0 failures)
All E2E evals pass (16/16, two consecutive runs)
b8-checkout and qa-only — previously flaky, now passing consistently

🤖 Generated with Claude Code

resolveRef() now checks element count to detect stale refs after page mutations (e.g. SPA navigation). RefEntry stores role+name metadata for better diagnostics. 3 new snapshot tests for staleness detection.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
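The staleness check described in this commit might look roughly like the sketch below. This is not the code from the PR. The `selector` field, the `CountFn` page abstraction, the `@e1` ref format, and the error wording are all assumptions made for illustration. Only the names `resolveRef` and `RefEntry`, and the idea of checking element count and keeping role+name for diagnostics, come from the commit message.

```typescript
// Assumed shape: the PR says RefEntry stores role+name; `selector` is hypothetical.
interface RefEntry {
  selector: string; // assumed: how the ref locates its element
  role: string;     // kept for diagnostics
  name: string;     // kept for diagnostics
}

// Assumed abstraction over the page: how many elements match a selector right now.
type CountFn = (selector: string) => number;

function resolveRef(
  refs: Record<string, RefEntry>,
  ref: string,
  count: CountFn,
): RefEntry {
  const entry = refs[ref];
  if (!entry) {
    throw new Error("Unknown ref " + ref);
  }
  // Assumption: a ref whose element count has dropped to zero is stale.
  if (count(entry.selector) === 0) {
    throw new Error(
      "Ref " + ref + " (" + entry.role + ' "' + entry.name + '") is stale; take a new snapshot',
    );
  }
  return entry;
}

// Usage with a fake page: the button exists before navigation, not after.
const refs: Record<string, RefEntry> = {
  "@e1": { selector: "#submit", role: "button", name: "Submit" },
};
const beforeNav: CountFn = (sel) => (sel === "#submit" ? 1 : 0);
const afterNav: CountFn = () => 0;
```

Carrying role and name in the entry lets the error message say which element went missing, rather than only reporting an opaque ref id.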

Add /qa-only (report-only, Edit tool blocked), restructure /qa with find-fix-verify cycle, add {{QA_METHODOLOGY}} DRY placeholder for shared methodology. /plan-eng-review now writes test-plan artifacts to ~/.gstack/projects/<slug>/ for QA consumption.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
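To illustrate the shared-placeholder idea, here is a minimal sketch of injecting one methodology block into two templates. The `renderTemplate` helper, the template text, and the block content are hypothetical. Only the `{{QA_METHODOLOGY}}` placeholder name and the /qa and /qa-only targets come from the PR, and the real template generator may work differently.

```typescript
// Hypothetical helper: replaces {{NAME}} placeholders with shared blocks.
function renderTemplate(
  template: string,
  blocks: Record<string, string>,
): string {
  // A replacer function avoids "$" sequences in the block being treated specially.
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (match: string, key: string) => {
    if (!Object.prototype.hasOwnProperty.call(blocks, key)) {
      throw new Error("No block defined for " + match);
    }
    return blocks[key];
  });
}

// Illustrative content only; the real methodology text is not shown in the PR.
const blocks: Record<string, string> = {
  QA_METHODOLOGY: "1. Explore the app\n2. Document each bug with evidence",
};

const qaSkill = renderTemplate(
  "# /qa\n{{QA_METHODOLOGY}}\nThen fix, commit, and re-verify.",
  blocks,
);
const qaOnlySkill = renderTemplate(
  "# /qa-only\n{{QA_METHODOLOGY}}\nReport only; do not edit files.",
  blocks,
);
```

Failing loudly on an unknown placeholder keeps a typo in a template from shipping as literal `{{...}}` text.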

…l surfaces
Add generateCommentary() for natural-language delta interpretation, per-test turns/duration in comparison and summary output, judgePassed unit tests, 3 new E2E tests (qa-only, qa fix loop, plan artifact).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
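One way a delta-to-commentary function could work is sketched below. It is not the PR's implementation. The `TestMetrics` shape, the 20% threshold, and the output wording are assumptions; only the function name `generateCommentary` and the turns, duration, and cost metrics come from the source.

```typescript
// Assumed per-test record; field names are hypothetical.
interface TestMetrics {
  name: string;
  passed: boolean;
  turns: number;
  durationSec: number;
  costUsd: number;
}

function pctChange(before: number, after: number): number {
  return before === 0 ? 0 : ((after - before) / before) * 100;
}

// Compares two runs and describes changes at or above an assumed threshold.
function generateCommentary(
  before: TestMetrics[],
  after: TestMetrics[],
  thresholdPct: number = 20,
): string {
  const notes: string[] = [];
  for (const cur of after) {
    const prev = before.filter((t) => t.name === cur.name)[0];
    if (!prev) continue; // new test: nothing to compare against
    if (prev.passed && !cur.passed) {
      notes.push(cur.name + " regressed from PASS to FAIL");
    }
    const deltas: [string, number][] = [
      ["turns", pctChange(prev.turns, cur.turns)],
      ["time", pctChange(prev.durationSec, cur.durationSec)],
      ["cost", pctChange(prev.costUsd, cur.costUsd)],
    ];
    for (const [label, pct] of deltas) {
      if (Math.abs(pct) >= thresholdPct) {
        const direction = pct > 0 ? "more" : "less";
        notes.push(cur.name + " used " + Math.round(Math.abs(pct)) + "% " + direction + " " + label);
      }
    }
  }
  return notes.length === 0
    ? "Stable run: no significant efficiency changes, no regressions."
    : notes.join("; ") + ".";
}

// Usage: baseline figures taken from the /qa quick row of the results table.
const baseline: TestMetrics[] = [
  { name: "/qa quick", passed: true, turns: 29, durationSec: 156, costUsd: 0.47 },
];
const slower: TestMetrics[] = [
  { name: "/qa quick", passed: true, turns: 40, durationSec: 160, costUsd: 0.5 },
];
console.log(generateCommentary(baseline, baseline));
console.log(generateCommentary(baseline, slower));
```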


- ARCHITECTURE: add ref staleness detection section, update RefEntry type
- BROWSER: add ref staleness paragraph to snapshot system docs
- CONTRIBUTING: update eval tool descriptions with commentary feature
- README: fix missing qa-only in project-local uninstall command
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


garrytan and others added 6 commits March 15, 2026 21:17

chore: bump version and changelog (v0.4.0)

210e1b1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: add user-facing benefit descriptions to v0.4.0 changelog

4bbf6c6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

garrytan merged commit f3ee0ee into main Mar 16, 2026

garrytan merged 6 commits into main from garrytan/qa-2.1
