Expand evals to 20 and improve SKILL.md diagnostic coverage by CybotTM · Pull Request #50 · netresearch/github-project-skill

CybotTM · 2026-04-01T08:30:44Z

Summary

Expanded evals from 2 to 20, covering all major skill domains: auto-merge, branch protection, security compliance, merge strategy, CodeQL, review threads, merge queues, Copilot race conditions, and workflow file limitations
Improved SKILL.md with new "Security & Compliance Quick Checks" and "Merge Strategy Issues" sections plus expanded "When to Use" triggers
SKILL.md stays under 500 words (487 total)

A/B Test Results

Eval	Name	A (Original)	B (Improved)
1	setup_branch_protection	PARTIAL	PASS
2	fix_blocked_pr_merge	PASS	PASS
3	setup_auto_merge_workflow	PASS	PASS
4	diagnose_auto_merge_failure	PASS	PASS
5	solo_maintainer_pr_stuck	PASS	PASS
6	setup_codeowners	PARTIAL	PARTIAL
7	fix_github_actions_failure	PASS	PASS
8	migrate_master_to_main	FAIL	FAIL
9	setup_dependabot	FAIL	FAIL
10	codeql_default_setup_conflict	FAIL	PASS
11	signed_commits_merge_failure	FAIL	PASS
12	pr_too_many_commits	FAIL	PARTIAL
13	enforce_admins_audit	FAIL	PASS
14	resolve_review_threads	FAIL	PASS
15	openssf_scorecard_improvement	FAIL	PARTIAL
16	workflow_permissions_least_privilege	FAIL	PARTIAL
17	setup_release_labeling	FAIL	PARTIAL
18	merge_queue_troubleshooting	FAIL	PARTIAL
19	copilot_reviewer_race_condition	FAIL	PASS
20	workflow_file_pr_cannot_merge	FAIL	PASS

Score: A = 12/40, B = 29/40 (+142%)
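The PR does not state how per-eval outcomes map onto the 40-point scale. As a rough cross-check, the sketch below assumes PASS = 2, PARTIAL = 1, and FAIL = 0 (an assumption, not a rubric documented by the author) and tallies both columns of the table above. The `score` helper is hypothetical.

```shell
#!/bin/sh
# Sum outcome weights under the ASSUMED rubric PASS=2, PARTIAL=1, FAIL=0.
score() {
  total=0
  for outcome in "$@"; do
    case "$outcome" in
      PASS)    total=$((total + 2)) ;;
      PARTIAL) total=$((total + 1)) ;;
      FAIL)    ;;
      *) echo "unknown outcome: $outcome" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

# Outcome columns copied from the A/B table, in eval order 1..20.
A="PARTIAL PASS PASS PASS PASS PARTIAL PASS FAIL FAIL FAIL FAIL FAIL FAIL FAIL FAIL FAIL FAIL FAIL FAIL FAIL"
B="PASS PASS PASS PASS PASS PARTIAL PASS FAIL FAIL PASS PASS PARTIAL PASS PASS PARTIAL PARTIAL PARTIAL PARTIAL PASS PASS"

# Unquoted on purpose so the shell splits each column into words.
echo "A = $(score $A)/40"
echo "B = $(score $B)/40"
```

Under these assumed weights the A column sums to 12, matching the reported score, while the B column sums to 30 rather than the reported 29, so the rubric actually used may differ from this assumption.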

Remaining FAILs (8, 9) are low-priority setup tasks adequately covered by reference file links.

Test plan

Verify evals.json is valid JSON
Verify SKILL.md word count is under 500
Spot-check that new eval assertions match SKILL.md and reference content
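The first two checks in the test plan can be scripted. This is a minimal sketch, assuming `jq` or `python3` is on the PATH and that a plain `wc -w` count approximates however the 487-word figure was measured (the PR does not say whether front matter is counted). The helper names are hypothetical.

```shell
#!/bin/sh
# Succeeds if the file parses as JSON; prefers jq, falls back to python3.
is_valid_json() {
  if command -v jq >/dev/null 2>&1; then
    jq empty "$1" >/dev/null 2>&1
  else
    python3 -m json.tool "$1" >/dev/null 2>&1
  fi
}

# Succeeds if the file has fewer than LIMIT whitespace-separated words.
under_word_limit() {
  # Arithmetic expansion strips any padding that wc adds on some platforms.
  words=$(($(wc -w < "$1")))
  [ "$words" -lt "$2" ]
}

# Example invocation from the repository root (paths taken from the PR):
#   is_valid_json skills/github-project/evals/evals.json &&
#     under_word_limit skills/github-project/SKILL.md 500 &&
#     echo "test plan checks passed"
```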

Add 18 new evals covering auto-merge setup, solo maintainer workflow, CodeQL conflicts, signed commit merge failures, enforce_admins audit, review thread resolution, OpenSSF Scorecard, merge queue issues, Copilot reviewer race conditions, and workflow file merge limitations. Improve SKILL.md with expanded "When to Use" triggers, new "Security & Compliance Quick Checks" section with inline gh commands, and "Merge Strategy Issues" section. Keeps SKILL.md under 500 words (487). A/B test shows version B scores 29/40 vs original 12/40 (+142%).

gemini-code-assist

Code Review

This pull request updates the GitHub Project skill documentation to include more detailed troubleshooting scenarios, such as security compliance checks, merge strategy issues, and OpenSSF Scorecard improvements. It also significantly expands the evaluation test suite to cover these new scenarios. Feedback highlights a contradiction in the advice regarding squash merges for signed commits and suggests improving the readability of a GraphQL query output using a JQ filter.

gemini-code-assist · 2026-04-01T08:32:47Z

skills/github-project/SKILL.md

-Verify repository configuration against best practices:
+### Merge Strategy Issues
+
+Rebase merge fails with signed commits: enable squash or auto-detect strategy. Workflow file PRs need manual merge (GITHUB_TOKEN lacks `workflows` scope). Copilot reviewer race conditions: re-run auto-approve workflow. See `references/auto-merge-guide.md`.


The recommendation to 'enable squash' for signed commit failures contradicts references/merge-strategy.md, which explicitly states that squash merges are incompatible with signed commits (line 115) and recommends using --merge instead (line 165). Conversely, references/auto-merge-guide.md (line 161) claims squash is preferred and compatible. Please reconcile these reference files to ensure the skill provides consistent and correct advice.

gemini-code-assist · 2026-04-01T08:32:47Z

skills/github-project/SKILL.md

+gh api graphql -f query='query($owner:String!,$repo:String!,$pr:Int!){
+  repository(owner:$owner,name:$repo){pullRequest(number:$pr){
+    reviewThreads(first:50){nodes{id isResolved}}
+  }}
+}' -f owner=OWNER -f repo=REPO -F pr=NUMBER


This GraphQL query is redundant as the 'PR Won't Merge' section (lines 35-40) already covers reviewThreads. If you choose to keep it here for the 'Security & Compliance' context, please add a --jq filter to make the output readable and consistent with other examples in this file.

Suggested change

-gh api graphql -f query='query($owner:String!,$repo:String!,$pr:Int!){
-  repository(owner:$owner,name:$repo){pullRequest(number:$pr){
-    reviewThreads(first:50){nodes{id isResolved}}
-  }}
-}' -f owner=OWNER -f repo=REPO -F pr=NUMBER
+gh api graphql -f query='query($owner:String!,$repo:String!,$pr:Int!){
+  repository(owner:$owner,name:$repo){pullRequest(number:$pr){
+    reviewThreads(first:50){nodes{id isResolved}}
+  }}
+}' -f owner=OWNER -f repo=REPO -F pr=NUMBER --jq '.data.repository.pullRequest.reviewThreads.nodes'

Copilot

Pull request overview

Expands the GitHub Project skill eval suite and updates SKILL.md to add more diagnostic guidance across branch protection, auto-merge, security/compliance, and merge strategy troubleshooting.

Changes:

Expanded evals.json from a small set to 20 evals covering major GitHub repo management/troubleshooting scenarios.
Updated SKILL.md “When to Use” triggers and added new diagnostic sections for security/compliance and merge strategy issues.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
skills/github-project/evals/evals.json	Adds many new eval prompts/assertions and broadens existing assertion patterns.
skills/github-project/SKILL.md	Refreshes usage triggers and adds new diagnostic/troubleshooting sections.

Copilot · 2026-04-01T08:37:31Z

skills/github-project/evals/evals.json

+      },
+      {
+        "type": "content",
+        "pattern": "(contents: read|pull-requests: write|least.privilege)"


In a regex, an unescaped . matches any single character. If the intent is to assert the phrase least privilege (or least-privilege), this pattern also matches unintended strings such as leastXprivilege. Escape the dot (least\\.privilege) only if a literal least.privilege is wanted; otherwise rewrite the alternative to match the wording you expect, e.g. least[ -]privilege.
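To make the over-matching concrete, here is a small sketch that runs both the original alternative and the suggested one against a few sample strings. It uses `grep -E` (POSIX extended regex) as a stand-in; which regex engine the eval harness actually uses is not stated in the PR, so treat that as an assumption. The `matches` helper is hypothetical.

```shell
#!/bin/sh
# Succeeds if SAMPLE ($2) contains a match for the extended regex PATTERN ($1).
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

loose='least.privilege'      # original: . matches any character
strict='least[ -]privilege'  # suggested: only a space or a hyphen

for sample in 'least privilege' 'least-privilege' 'leastXprivilege'; do
  for pattern in "$loose" "$strict"; do
    if matches "$pattern" "$sample"; then verdict='match'; else verdict='no match'; fi
    printf '%-20s %-18s %s\n' "$pattern" "$sample" "$verdict"
  done
done
```

Both patterns accept the two intended spellings; only the original also accepts `leastXprivilege`.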

Suggested change

-        "pattern": "(contents: read|pull-requests: write|least.privilege)"
+        "pattern": "(contents: read|pull-requests: write|least[ -]privilege)"

Copilot · 2026-04-01T08:37:32Z

skills/github-project/evals/evals.json

+      },
+      {
+        "type": "content",
+        "pattern": "(force.push|re-queue|resolveReviewThread|auto-approve)"


The force.push token uses ., which is a wildcard in regex. If the intent is to match the wording force-push or force push, the pattern does so but also over-matches (for example, forceXpush). Escape the dot (force\\.push) only if you truly want a literal force.push; otherwise update the pattern to match the expected phrase explicitly (e.g., force[- ]push).

Suggested change

-        "pattern": "(force.push|re-queue|resolveReviewThread|auto-approve)"
+        "pattern": "(force[- ]push|re-queue|resolveReviewThread|auto-approve)"

Copilot · 2026-04-01T08:37:32Z

skills/github-project/SKILL.md

+gh api repos/OWNER/REPO/branches/main/protection --jq '.enforce_admins.enabled'
+gh api repos/OWNER/REPO/code-scanning/default-setup --jq '.state'


This hard-codes main for the branch protection check. Since one of the evals explicitly covers migrating master→main, and repos can have non-main default branches, it’s easy for readers to run this command against the wrong branch. Suggest using a placeholder like DEFAULT_BRANCH (or documenting how to obtain it via gh repo view) instead of main.
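One way to follow this suggestion is to look up the default branch first and build the protection endpoint from it. The sketch below assumes an authenticated `gh` CLI; the function names are hypothetical and not part of SKILL.md. Branch names containing special characters may need URL-encoding, which this sketch does not attempt.

```shell
#!/bin/sh
# Print the default branch name of a repository given as OWNER/REPO.
default_branch() {
  gh repo view "$1" --json defaultBranchRef --jq '.defaultBranchRef.name'
}

# Build the REST path for a branch's protection settings.
protection_endpoint() {
  printf 'repos/%s/branches/%s/protection\n' "$1" "$2"
}

# Report whether enforce_admins is enabled on the actual default branch.
check_enforce_admins() {
  branch=$(default_branch "$1") || return 1
  gh api "$(protection_endpoint "$1" "$branch")" --jq '.enforce_admins.enabled'
}

# Usage: check_enforce_admins OWNER/REPO
```

Resolving the branch at run time keeps the check correct for repositories that still use master or any other default branch name.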

Copilot · 2026-04-01T08:37:32Z

skills/github-project/SKILL.md

@@ -12,26 +12,26 @@ allowed-tools: Bash(gh:*) Bash(git:*) Bash(grep:*) Read Write

 # GitHub Project Skill



The previous ## Overview section heading was removed, leaving an unheaded overview sentence. If other skills/docs rely on consistent section headings for navigation or automated extraction, consider restoring ## Overview and placing the sentence under it for consistency.

Suggested change

## Overview

github-actions · 2026-04-01T08:45:00Z

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA c7d8ff0.

Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copilot AI review requested due to automatic review settings April 1, 2026 08:30

gemini-code-assist bot reviewed Apr 1, 2026

Copilot AI reviewed Apr 1, 2026

Copilot started reviewing on behalf of CybotTM April 1, 2026 08:46

CybotTM merged commit 8911bc3 into main Apr 1, 2026
11 checks passed

CybotTM deleted the feature/evals-and-improvements branch April 1, 2026 09:06

CybotTM merged 1 commit into main from feature/evals-and-improvements
