Expand evals to 20 and improve SKILL.md diagnostic coverage#50
Conversation
Add 18 new evals covering auto-merge setup, solo maintainer workflow, CodeQL conflicts, signed commit merge failures, enforce_admins audit, review thread resolution, OpenSSF Scorecard, merge queue issues, Copilot reviewer race conditions, and workflow file merge limitations. Improve SKILL.md with expanded "When to Use" triggers, new "Security & Compliance Quick Checks" section with inline gh commands, and "Merge Strategy Issues" section. Keeps SKILL.md under 500 words (487). A/B test shows version B scores 29/40 vs original 12/40 (+142%).
There was a problem hiding this comment.
Code Review
This pull request updates the GitHub Project skill documentation to include more detailed troubleshooting scenarios, such as security compliance checks, merge strategy issues, and OpenSSF Scorecard improvements. It also significantly expands the evaluation test suite to cover these new scenarios. Feedback highlights a contradiction in the advice regarding squash merges for signed commits and suggests improving the readability of a GraphQL query output using a JQ filter.
| Verify repository configuration against best practices: | ||
| ### Merge Strategy Issues | ||
|
|
||
| Rebase merge fails with signed commits: enable squash or auto-detect strategy. Workflow file PRs need manual merge (GITHUB_TOKEN lacks `workflows` scope). Copilot reviewer race conditions: re-run auto-approve workflow. See `references/auto-merge-guide.md`. |
There was a problem hiding this comment.
The recommendation to 'enable squash' for signed commit failures contradicts references/merge-strategy.md, which explicitly states that squash merges are incompatible with signed commits (line 115) and recommends using --merge instead (line 165). Conversely, references/auto-merge-guide.md (line 161) claims squash is preferred and compatible. Please reconcile these reference files to ensure the skill provides consistent and correct advice.
| gh api graphql -f query='query($owner:String!,$repo:String!,$pr:Int!){ | ||
| repository(owner:$owner,name:$repo){pullRequest(number:$pr){ | ||
| reviewThreads(first:50){nodes{id isResolved}} | ||
| }} | ||
| }' -f owner=OWNER -f repo=REPO -F pr=NUMBER |
There was a problem hiding this comment.
This GraphQL query is redundant as the 'PR Won't Merge' section (lines 35-40) already covers reviewThreads. If you choose to keep it here for the 'Security & Compliance' context, please add a --jq filter to make the output readable and consistent with other examples in this file.
| gh api graphql -f query='query($owner:String!,$repo:String!,$pr:Int!){ | |
| repository(owner:$owner,name:$repo){pullRequest(number:$pr){ | |
| reviewThreads(first:50){nodes{id isResolved}} | |
| }} | |
| }' -f owner=OWNER -f repo=REPO -F pr=NUMBER | |
| gh api graphql -f query='query($owner:String!,$repo:String!,$pr:Int!){ | |
| repository(owner:$owner,name:$repo){pullRequest(number:$pr){ | |
| reviewThreads(first:50){nodes{id isResolved}} | |
| }} | |
| }' -f owner=OWNER -f repo=REPO -F pr=NUMBER --jq '.data.repository.pullRequest.reviewThreads.nodes' |
There was a problem hiding this comment.
Pull request overview
Expands the GitHub Project skill eval suite and updates SKILL.md to add more diagnostic guidance across branch protection, auto-merge, security/compliance, and merge strategy troubleshooting.
Changes:
- Expanded
evals.jsonfrom a small set to 20 evals covering major GitHub repo management/troubleshooting scenarios. - Updated SKILL.md “When to Use” triggers and added new diagnostic sections for security/compliance and merge strategy issues.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| skills/github-project/evals/evals.json | Adds many new eval prompts/assertions and broadens existing assertion patterns. |
| skills/github-project/SKILL.md | Refreshes usage triggers and adds new diagnostic/troubleshooting sections. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| }, | ||
| { | ||
| "type": "content", | ||
| "pattern": "(contents: read|pull-requests: write|least.privilege)" |
There was a problem hiding this comment.
In a regex, . matches any character. If you intended to assert the literal phrase least privilege (or least-privilege), this pattern will also match unintended strings like leastXprivilege. Consider escaping the dot (least\\.privilege) or rewriting the alternation to match the literal wording you expect.
| "pattern": "(contents: read|pull-requests: write|least.privilege)" | |
| "pattern": "(contents: read|pull-requests: write|least[ -]privilege)" |
| }, | ||
| { | ||
| "type": "content", | ||
| "pattern": "(force.push|re-queue|resolveReviewThread|auto-approve)" |
There was a problem hiding this comment.
The force.push token uses . which is a wildcard in regex. If the intent is to match force-push/force push wording, this will mis-match (and also over-match). Escape the dot (force\\.push) if you truly want a literal force.push, or update the pattern to explicitly match the expected phrase (e.g., force[- ]push).
| "pattern": "(force.push|re-queue|resolveReviewThread|auto-approve)" | |
| "pattern": "(force[- ]push|re-queue|resolveReviewThread|auto-approve)" |
| gh api repos/OWNER/REPO/branches/main/protection --jq '.enforce_admins.enabled' | ||
| gh api repos/OWNER/REPO/code-scanning/default-setup --jq '.state' |
There was a problem hiding this comment.
This hard-codes main for the branch protection check. Since one of the evals explicitly covers migrating master→main, and repos can have non-main default branches, it’s easy for readers to run this command against the wrong branch. Suggest using a placeholder like DEFAULT_BRANCH (or documenting how to obtain it via gh repo view) instead of main.
| @@ -12,26 +12,26 @@ allowed-tools: Bash(gh:*) Bash(git:*) Bash(grep:*) Read Write | |||
|
|
|||
| # GitHub Project Skill | |||
|
|
|||
There was a problem hiding this comment.
The previous ## Overview section heading was removed, leaving an unheaded overview sentence. If other skills/docs rely on consistent section headings for navigation or automated extraction, consider restoring ## Overview and placing the sentence under it for consistency.
| ## Overview |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Snapshot WarningsEnsure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice. Scanned FilesNone |
Summary
A/B Test Results
Score: A = 12/40, B = 29/40 (+142%)
Remaining FAILs (8, 9) are low-priority setup tasks adequately covered by reference file links.
Test plan