diff --git a/.github/workflows/code-review.yml b/.github/workflows/code-review.yml index 5f8d7cf..51bd4e8 100644 --- a/.github/workflows/code-review.yml +++ b/.github/workflows/code-review.yml @@ -43,6 +43,8 @@ jobs: env: BASE_BRANCH: ${{ github.event.pull_request.base.ref }} run: | + set -euo pipefail + codex exec \ --sandbox read-only \ --json \ @@ -51,6 +53,56 @@ jobs: --base "${BASE_BRANCH}" \ > codex-review.jsonl + if [ -s codex-review.md ]; then + exit 0 + fi + + echo "codex-review.md is empty; attempting JSONL fallback." + node <<'NODE' + const fs = require("node:fs"); + + const outputPath = "codex-review.md"; + const jsonlPath = "codex-review.jsonl"; + + if (!fs.existsSync(jsonlPath)) { + process.exit(0); + } + + let lastAgentMessage = ""; + const lines = fs.readFileSync(jsonlPath, "utf8").split(/\r?\n/); + for (const line of lines) { + if (!line.trim()) { + continue; + } + + try { + const event = JSON.parse(line); + if (event?.type !== "item.completed") { + continue; + } + if (event?.item?.type !== "agent_message") { + continue; + } + + const text = `${event?.item?.text ?? ""}`.trim(); + if (text) { + lastAgentMessage = text; + } + } catch { + // Ignore malformed JSONL lines and keep searching for valid events. + } + } + + if (lastAgentMessage) { + fs.writeFileSync(outputPath, `${lastAgentMessage}\n`, "utf8"); + } + NODE + + if [ ! -s codex-review.md ]; then + echo "Codex review produced no final message in codex-review.md or codex-review.jsonl." >&2 + exit 1 + fi + - name: Post inline PR comments uses: actions/github-script@v7 env: @@ -63,6 +115,8 @@ jobs: const path = require("node:path"); const MARKER = ""; + const OVERALL_MARKER = ""; + const MAX_REVIEW_COMMENT_CHARS = 60000; const REVIEW_FILE = process.env.REVIEW_FILE || "codex-review.md"; const OUTPUT_JSONL = process.env.OUTPUT_JSONL || "codex-review.jsonl"; const workspace = (process.env.GITHUB_WORKSPACE || "").replace(/\\/g, "/"); @@ -142,11 +196,57 @@ jobs: return findings; } + async function upsertOverallReviewComment(text) { + const trimmedReview = text.trim(); + const reviewBody = trimmedReview + ? trimmedReview.length > MAX_REVIEW_COMMENT_CHARS + ? `${trimmedReview.slice(0, MAX_REVIEW_COMMENT_CHARS)}\n\n_(Review text truncated for GitHub comment limits.)_` + : trimmedReview + : "_Codex produced no review text._"; + + const overallBody = [ + OVERALL_MARKER, + "", + reviewBody, + "", + `Raw outputs: ${REVIEW_FILE}, ${OUTPUT_JSONL}`, + ].join("\n"); + + const issueComments = await github.paginate( + github.rest.issues.listComments, + { owner, repo, issue_number: pullNumber, per_page: 100 } + ); + const existingOverall = issueComments.find( + (comment) => + comment.user?.login === "github-actions[bot]" && + typeof comment.body === "string" && + comment.body.includes(OVERALL_MARKER) + ); + + if (existingOverall) { + await github.rest.issues.updateComment({ + owner, + repo, + comment_id: existingOverall.id, + body: overallBody, + }); + } else { + await github.rest.issues.createComment({ + owner, + repo, + issue_number: pullNumber, + body: overallBody, + }); + } + } + const reviewText = fs.readFileSync(REVIEW_FILE, "utf8"); + await upsertOverallReviewComment(reviewText); const findings = parseFindings(reviewText); if (!findings.length) { core.info("No line-addressable findings were parsed from Codex output."); + return; } diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index edadf6b..0fbb790 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -48,7 +48,7 @@ Groundwork ships skills from two maintenance locations, but inventory is unified Skills maintained in this repository are listed in `skills/skills.toml` with local paths under `skills/`. These skills define the pipeline's structure — what stages exist, what handoff contracts connect them, and what cognitive discipline the pipeline enforces. -Skills maintained upstream (from [obra/superpowers](https://github.com/obra/superpowers)) are listed in the same manifest with pinned commits and fetched at install time. They fill the execution phase — TDD, debugging, subagent orchestration, code review, verification — where high-quality implementations already exist. +Skills maintained upstream (from [obra/superpowers](https://github.com/obra/superpowers)) are listed in the same manifest with pinned commits and fetched at install time. They fill the execution phase — debugging and subagent orchestration — where high-quality implementations already exist. Curated skills are pinned to a specific commit. They are not forked, vendored, or modified. Integration happens through documentation: WORKFLOW.md defines handoff rules that connect curated skills to the pipeline's input/output contracts. diff --git a/CHANGELOG.md b/CHANGELOG.md index 5ed168d..4d01da0 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,7 +10,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). - Five-stage methodology pipeline: frame constraints, define behavior, decompose, execute and verify, land - Ten core skills: `ground`, `research`, `bdd`, `issue-craft`, `begin`, `plan`, `test-first`, `documentation`, `land`, and `using-groundwork` (methodology orientation) -- Five curated skills from [obra/superpowers](https://github.com/obra/superpowers): `subagent-driven-development`, `systematic-debugging`, `verification-before-completion`, `requesting-code-review`, `receiving-code-review` +- Two curated skills from [obra/superpowers](https://github.com/obra/superpowers): `subagent-driven-development`, `systematic-debugging` - Rust CLI (`groundwork init`, `update`, `list`, `doctor`) with curated manifest and `sk` integration - Automatic `gh-issue-sync` installation during `groundwork init` - Schema distribution in CLI: `init/update` now provision `.groundwork/schemas/`, create `.groundwork/artifacts/`, and `doctor` reports schema completeness/drift @@ -22,6 +22,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/). ### Changed +- Replaced curated `verification-before-completion` (obra/superpowers) with groundwork-native original (v1.0.0). Preserves core discipline (Iron Law, gate function, common-failures table, anti-rationalization patterns, red flags). Adds Lifecycle Role section establishing pipeline position, Corruption Modes section, and cross-references. Removes superpowers-specific vocabulary. - Documented upstream attribution convention in CONTRIBUTING.md: LICENSE-UPSTREAM requirements for adapted skills, origin metadata standard, curated-skills attribution reasoning, and attribution checklist at the skill-authoring boundary - Replaced curated `test-driven-development` (obra/superpowers) with groundwork-native `test-first` skill (v1.0.0). Preserves core discipline (Iron Law, red-green-refactor, delete-and-start-over, anti-rationalization patterns) while adding bidirectional composition with `bdd`, `verification-before-completion`, `systematic-debugging`, and `documentation`. Lifecycle Role section establishes pipeline position. Corruption Modes section added. Language-agnostic examples replace TypeScript-only. Companion `testing-anti-patterns.md` migrated to `references/`. - `ground` skill upgraded to v3.0.0: broadened from design-only to full first-principles cognitive discipline covering strategic analysis, cost decomposition, and problem reframing. Added Active Excavation section (Socratic Drilling, Recursive Why). Decompose step now supports three modes (requirements, constituent, process) with Orient determining which applies. Orient step broadened with mode-specific questions. New "Structure survey as analysis" corruption mode. All existing patterns and corruption modes retained. diff --git a/agents.toml b/agents.toml index 887e951..58f1efa 100644 --- a/agents.toml +++ b/agents.toml @@ -20,4 +20,4 @@ third_force = { gh = "pentaxis93/groundwork", path = "skills/third-force" } subagent_driven_development = { gh = "obra/superpowers", path = "skills/subagent-driven-development", rev = "e4a2375cb705ca5800f0833528ce36a3faf9017a" } test_first = { gh = "pentaxis93/groundwork", path = "skills/test-first" } systematic_debugging = { gh = "obra/superpowers", path = "skills/systematic-debugging", rev = "e4a2375cb705ca5800f0833528ce36a3faf9017a" } -verification_before_completion = { gh = "obra/superpowers", path = "skills/verification-before-completion", rev = "e4a2375cb705ca5800f0833528ce36a3faf9017a" } +verification_before_completion = { gh = "pentaxis93/groundwork", path = "skills/verification-before-completion" } diff --git a/crates/groundwork-cli/src/main.rs b/crates/groundwork-cli/src/main.rs index e946ed9..433463d 100644 --- a/crates/groundwork-cli/src/main.rs +++ b/crates/groundwork-cli/src/main.rs @@ -1951,6 +1951,7 @@ mod tests { "skills/test-first", "skills/third-force", "skills/documentation", + "skills/verification-before-completion", "skills/propose", "skills/land", ] diff --git a/docs/architecture/pipeline-contract.md b/docs/architecture/pipeline-contract.md index acc4a41..e228a57 100644 --- a/docs/architecture/pipeline-contract.md +++ b/docs/architecture/pipeline-contract.md @@ -8,7 +8,7 @@ Groundwork has one coherent path: 1. `ground` frames constraints. 2. `bdd` defines and maintains behavior contract. 3. `plan` converges from exploration to a decision-complete implementation design. `issue-craft` decomposes that design into agent-executable issues. `begin` initiates the work session: selects session-sized work, prepares the workspace, and declares direction. -4. Curated middle skills implement and verify the same behavior contract. +4. Middle skills implement and verify the same behavior contract. 5. `land` closes work with behavior coverage visibility. `bdd` and `test-first` are not alternatives. diff --git a/skills/skills.toml b/skills/skills.toml index c7889ae..9e6debb 100644 --- a/skills/skills.toml +++ b/skills/skills.toml @@ -90,8 +90,7 @@ use_when = "Use when documentation may drift or when a decision or behavior chan name = "verification-before-completion" path = "skills/verification-before-completion" provider = "gh" -repo = "obra/superpowers" -rev = "e4a2375cb705ca5800f0833528ce36a3faf9017a" +repo = "pentaxis93/groundwork" use_when = "Use before claiming work is complete, fixed, or passing." [[skills]] diff --git a/skills/verification-before-completion/LICENSE-UPSTREAM b/skills/verification-before-completion/LICENSE-UPSTREAM new file mode 100644 index 0000000..3665a3a --- /dev/null +++ b/skills/verification-before-completion/LICENSE-UPSTREAM @@ -0,0 +1,35 @@ +This skill is adapted from the verification-before-completion skill in +obra/superpowers (https://github.com/obra/superpowers), pinned at commit +e4a2375. The original skill is licensed under the MIT License reproduced +below. + +Substantial portions of the original are preserved in this adaptation: the +Iron Law, the gate function (identify → run → read → verify → claim), the +common-failures table, the rationalization prevention table, the red flags +list, and the key verification patterns. These have been adapted with +vocabulary normalization and structural additions (lifecycle role, +corruption modes, cross-references). + +--- + +MIT License + +Copyright (c) 2025 Jesse Vincent + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. diff --git a/skills/verification-before-completion/SKILL.md b/skills/verification-before-completion/SKILL.md new file mode 100644 index 0000000..e295ffd --- /dev/null +++ b/skills/verification-before-completion/SKILL.md @@ -0,0 +1,182 @@ +--- +name: verification-before-completion +description: >- + Use when about to claim work is complete, fixed, or passing. Use before + committing, creating PRs, or moving to the next task. Requires running + verification commands and confirming output before making any success + claims. Evidence before assertions, always. If you are about to say + something is done, working, fixed, or passing, this skill applies. +metadata: + version: "1.0.0" + updated: "2026-03-09" + origin: >- + Adapted from Jesse Vincent's verification-before-completion skill in + obra/superpowers (https://github.com/obra/superpowers, MIT license, + pinned at e4a2375). The original is a standalone verification discipline + forged through real use. This skill preserves substantial portions of the + original — the Iron Law, the gate function, the common-failures table, + the anti-rationalization patterns, and the red flags list — and adds + lifecycle positioning, corruption modes, and cross-references following + groundwork conventions. See LICENSE-UPSTREAM for the full copyright + notice and license terms. + replaces: "verification-before-completion (obra/superpowers)" +--- + +# Verification Before Completion + +Evidence before claims, always. + +## The Iron Law + +``` +NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE +``` + +If you have not run the verification command in this message, you cannot +claim it passes. There are no exceptions to this rule. + +## The Gate Function + +``` +BEFORE claiming any status or expressing satisfaction: + +1. IDENTIFY: What command proves this claim? +2. RUN: Execute the FULL command (fresh, complete) +3. READ: Full output, check exit code, count failures +4. VERIFY: Does output confirm the claim? + - If NO: State actual status with evidence + - If YES: State claim WITH evidence +5. ONLY THEN: Make the claim + +Skip any step = the claim has no basis +``` + +## Lifecycle Role + +This skill owns **aggregate completion claims** — the moment before you say +"done." It fires after execution, before packaging work for review. + +It does not own per-test cycle evidence (that belongs to the test-first +discipline — each test watched failing, then passing). It owns the final +gate: all tests pass, all requirements met, the build succeeds, the work is +actually complete. + +## Common Failures + +| Claim | Requires | Not Sufficient | +|-------|----------|----------------| +| Tests pass | Test command output: 0 failures | Previous run, "should pass" | +| Linter clean | Linter output: 0 errors | Partial check, extrapolation | +| Build succeeds | Build command: exit 0 | Linter passing, logs look good | +| Bug fixed | Test original symptom: passes | Code changed, assumed fixed | +| Regression test works | Red-green cycle verified | Test passes once | +| Agent completed | VCS diff shows changes | Agent reports "success" | +| Requirements met | Line-by-line checklist | Tests passing | + +## Red Flags — STOP + +These are signals that you are about to make an unverified claim. If you +notice any of these, stop and run the gate function. + +- Using "should", "probably", "seems to" +- Expressing satisfaction before verification ("Great!", "Perfect!", "Done!") +- About to commit, push, or create a PR without verification +- Trusting agent success reports without independent verification +- Relying on partial verification to represent full results +- Thinking "just this once" or "this is a small change" +- Any wording implying success without having run verification + +## Rationalization Prevention + +| Excuse | Reality | +|--------|---------| +| "Should work now" | Run the verification | +| "I'm confident" | Confidence is not evidence | +| "Just this once" | No exceptions | +| "Linter passed" | Linter is not compiler | +| "Agent said success" | Verify independently | +| "Partial check is enough" | Partial proves nothing about the whole | +| "Different words so rule doesn't apply" | Spirit over letter — any success claim requires evidence | + +## Key Patterns + +**Tests:** +``` +Verified: [Run test command] → [See: 34/34 pass] → "All tests pass" +Unverified: "Should pass now" / "Looks correct" +``` + +**Regression tests (red-green):** +``` +Verified: Write → Run (pass) → Revert fix → Run (MUST FAIL) → Restore → Run (pass) +Unverified: "I've written a regression test" (without red-green verification) +``` + +**Build:** +``` +Verified: [Run build] → [See: exit 0] → "Build passes" +Unverified: "Linter passed" (linter does not check compilation) +``` + +**Requirements:** +``` +Verified: Re-read plan → Create checklist → Verify each item → Report gaps or completion +Unverified: "Tests pass, phase complete" +``` + +**Agent delegation:** +``` +Verified: Agent reports success → Check VCS diff → Verify changes independently → Report actual state +Unverified: Trust agent report at face value +``` + +## When to Apply + +**Always before:** +- Any variation of success or completion claims +- Any expression of satisfaction about work state +- Any positive statement about whether something works +- Committing, PR creation, task completion +- Moving to the next task +- Accepting delegated agent results + +**The rule applies to:** +- Exact phrases ("Tests pass") +- Paraphrases and synonyms ("Everything looks good") +- Implications of success ("Ready for review") +- Any communication suggesting completion or correctness + +## Corruption Modes + +**Performative verification.** Going through the motions without actually +checking. Running the command but not reading the output. Seeing a wall of +text and skipping to the claim. If the output did not change your +understanding, you did not verify. + +**Partial verification as sufficient.** Running one test file instead of the +full suite. Checking the linter but not the build. Verifying the happy path +but not the error cases. Partial evidence supports partial claims — nothing +more. + +**Stale evidence.** Citing output from a previous run instead of a fresh +execution. Code changed since the last run? The evidence is stale. Only +output from the current message counts. + +**Claim-first, evidence-second.** Deciding the work is done, then looking +for evidence to confirm. This inverts the gate function. Evidence determines +the claim — the claim does not select the evidence. + +## Cross-References + +- `test-first` owns per-test cycle evidence (each test watched failing, then + passing). This skill owns aggregate completion claims. +- `documentation` review fires after code changes, before this skill's gate. + Documentation accuracy is completion evidence. +- `propose` consumes this skill's output — work must be verified before + packaging for review. + +## The Bottom Line + +Run the command. Read the output. Then claim the result. + +No shortcuts. No exceptions.