docs(arxiv): survey paper on skill-based agentic coding for reductions#617

Open
GiggleLiu wants to merge 43 commits into main from worktree-survey-agentic-reductions

Conversation

@GiggleLiu
Contributor

Summary

  • Adds a complete arXiv paper draft (IEEEtran format, 15 pages) on skill-based agentic coding for NP-hard problem reductions
  • Includes 5 figures (Typst+CeTZ compiled to PDF), references, and supporting survey materials
  • Integrates real development metrics mined from Claude Code session history (~/.claude)

Key findings from Claude history data:

  • 15:1 automation ratio (9,429 assistant messages vs 630 user messages across 283 sessions)
  • 1,510 co-authored commits, 300 MB of conversation transcripts
  • 75% issue rejection rate by automated quality gate on 322 batch-submitted issues
  • Codebase grew from 17 models/0 rules to 27 models/50 rules in 9 weeks
  • Prompt evolution: from imperative step-by-step commands (Phase 1) to single-command orchestration (Phase 3)
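The 15:1 automation ratio above was mined from session transcripts under ~/.claude. A minimal sketch of how such a ratio could be computed, assuming one JSON object per line with a top-level "type" field of "user" or "assistant" (a hypothetical schema, not necessarily the actual transcript format):

```python
import json
from pathlib import Path

def automation_ratio(sessions_dir: str) -> float:
    """Estimate assistant-to-user message ratio across session transcripts.

    Assumes each *.jsonl file holds one session, one JSON object per line,
    with a top-level "type" of "user" or "assistant" (hypothetical layout;
    adjust field names to the real transcript schema).
    """
    counts = {"user": 0, "assistant": 0}
    for path in Path(sessions_dir).glob("**/*.jsonl"):
        for line in path.read_text().splitlines():
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            kind = msg.get("type")
            if kind in counts:
                counts[kind] += 1
    # avoid division by zero on empty or assistant-only histories
    return counts["assistant"] / max(counts["user"], 1)
```

Applied over all 283 session files, a ratio of roughly 9,429/630 ≈ 15 would reproduce the headline number.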

Paper structure:

  1. Introduction — skill-based decomposition thesis
  2. Why Reductions — Goldilocks domain argument
  3. System Architecture — type-driven verification by construction
  4. Skill-Based Task Decomposition — 12 skills, 3 roles, card-based pipeline
  5. Multi-Layered Verification — 7-layer stack
  6. Evaluation — development metrics, issue quality gate, case studies
  7. Related Work — AI coding agents, AI-discovered reductions, formal verification
  8. Discussion & Conclusion — generalizability, limitations, future directions

Test plan

  • Paper compiles cleanly with cd docs/paper/arxiv && pdflatex paper.tex
  • All figures render correctly in compiled PDF
  • No undefined references or broken cross-references
  • Numbers in paper match actual codebase state

🤖 Generated with Claude Code

GiggleLiu and others added 23 commits March 12, 2026 15:13
…ductions

Design spec for a full research paper (ICSE/ASE-class) on using skill-based
AI agent pipelines to build verified NP-hard problem reduction libraries.

Key decisions from brainstorming:
- Methodology-first framing (Goldilocks domain + practical artifact)
- Three roles: contributors (issues), maintainer (board curation), agents (manage + execute)
- Multi-layered verification stack (7 layers from type system to documentation)
- Evaluation: ablation (skill vs no-skill) + git mining + 3 case studies
- Hardware solver motivation (Rydberg atoms, D-Wave)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the two new card-based orchestration skills from origin/main:
- project-pipeline: picks Ready cards, runs issue-to-pr in worktrees
- review-pipeline: fixes Copilot comments, runs agentic tests, moves to In Review

Updated S4.3 with the two-stage pipeline and explicit human touch points
(Backlog→Ready and In Review→Done). Skills count updated to 13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
16 tasks in 5 parallelizable chunks: scaffolding, figures, sections S1-S4,
sections S5-S6 with git mining, sections S7-S8 + final assembly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch figure generation from TikZ to Typst+CeTZ compiled to PDF,
included in LaTeX via \includegraphics. Paper body remains LaTeX
(IEEEtran class). Removed TikZ packages from preamble. Updated all
figure tasks (3-6), conventions block, compile commands, and Task 17
assembly step.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Set up paper.tex with IEEEtran conference class, 8 section stubs, and
a ~150-word abstract. Combined survey bibliography (22 entries) with 6
foundational references (Karp, Cook, Garey-Johnson, Glover, Lucas,
Barahona). Removed old paper.typ placeholder.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ~800 words covering four paragraphs: self-contained/verifiable
reductions (LOC stats from graph-metrics.json), homogeneous task
structure vs SWE-Bench, hardware solver compilation layer (Rydberg
atoms for MIS, D-Wave for QUBO), and real-world applications. Fix
figure caption to use accurate counts (40 impl + 12 inferred edges).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three subsections: S4.1 Three Roles (table + prose describing
Contributor/Maintainer/Agent responsibilities), S4.2 Skills as Agent
Functions (Table 1 with all 13 skills across 5 categories, detailed
paragraphs per category), and S4.3 Card-Based Orchestration (two-stage
pipeline with human touch points). Success rate column uses TBD
placeholder pending Task 11 git mining results.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three evaluation subsections: S6.1 ablation study design (skill-based
vs raw agent, with TBD results), S6.2 git history mining (58 PRs across
3 phases, error taxonomy table with TBD counts), and S6.3 case studies
of MVC->MIS (96 LOC, simple complement), SAT->MIS (171 LOC, quadratic
gadget), and Factoring->CircuitSAT->ILP (272+225 LOC, composition).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix critical issues identified by simulated peer review:
- Fix timeline contradiction: "six months" -> "seven weeks" (abstract + discussion)
- Fix author count: "two primary contributors" -> "three contributors"
- Soften unsubstantiated "60% of errors" claim to qualitative language
- Add agent platform identification (Claude Code, model versions)
- Reframe unexecuted ablation as experimental design, not pending results
- Add skills vs. prompt engineering differentiation paragraph
- Fix malformed BibTeX entries (dual booktitle/journal fields)
- Add Pichler 2018 citation for Rydberg atom MIS connection
- Note vendor report status on Anthropic 2026 citation
- Soften Table 2/3 captions to acknowledge pending data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix overfull hboxes in skills and error taxonomy tables by removing
outer padding (@{}) and abbreviating long skill names. Add .gitignore
for LaTeX build artifacts. All figures compile, cross-references
verified, no undefined citations.

Note: paper is 15 pages (over the 10-12 target).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mine ~/.claude session data: 283 sessions, 300MB transcripts, 15:1
  automation ratio, 1510 co-authored commits
- Add development metrics paragraph with codebase growth timeline
- Add issue quality gate data: 75% rejection rate on 322 checked issues
- Add interaction evolution paragraph (imperative → declarative prompts)
- Update counts: 24→27 models, 40→50 rules, 58→59 PRs, 7→9 weeks
- Remove meta-power skill references (13→12 skills)
- Replace Figure 1 with three-layer problemtree (from NSFC proposal)
- Add future directions: reduction compiler with Pareto cost models
- Save raw Claude history data to survey/claude-history-data.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
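The 1,510 co-authored commits cited in this commit could be reproduced from git history with a small script. A sketch, assuming the trailer text "Co-Authored-By: Claude" identifies agent-assisted commits (the trailer string and repo path are assumptions):

```python
import subprocess

def coauthored_commit_count(repo: str, trailer: str = "Co-Authored-By: Claude") -> int:
    """Count commits whose message contains the given trailer.

    Uses `git log --grep`, which matches the pattern anywhere in the
    commit message body, so trailers are picked up too.
    """
    out = subprocess.run(
        ["git", "-C", repo, "log", "--oneline", f"--grep={trailer}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return len(out.splitlines())
```

Running it at the repository root should recover the co-authored total for whatever trailer convention the project actually uses.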
@codecov

codecov bot commented Mar 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.86%. Comparing base (e56b61f) to head (5118ac3).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #617   +/-   ##
=======================================
  Coverage   96.86%   96.86%           
=======================================
  Files         264      264           
  Lines       35196    35196           
=======================================
  Hits        34091    34091           
  Misses       1105     1105           

☔ View full report in Codecov by Sentry.

GiggleLiu and others added 6 commits March 13, 2026 20:51
Restructures the paper around the "bridge problem" concept — software
too large for humans, made possible by agents constrained through
systematic verification. Three barriers (convention drift, effort
exhaustion, knowledge discontinuity) become the central thesis.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
GiggleLiu and others added 14 commits March 14, 2026 21:28
- Rewrite abstract and introduction with bridge problem concept
- Add new Section 2 (Bridge Problems): definition, three barriers,
  verification constrains agent output, other candidate domains
- Rename Section 3 to "Case Study: The Reduction Graph"
- Rewrite Discussion: remove content now in Sec 2, tighten
  "Why Human Experts Remain Essential", rewrite conclusion
- Move topology figure to appendix
- Renumber sections throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…line

Uncomment figure placeholders and add timeline figure in Evidence section.
Figures will be compiled from .typ sources in a following commit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scaling-wall: hero figure showing 3 barriers human teams hit
- verification-funnel: how verification constrains agent output
- timeline: cumulative growth over 9 weeks with phase bands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The comparison against our own prior project weakens the argument.
The bridge problem thesis stands on its own through the three
structural barriers, not through a self-comparison.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The reduction graph is the most informative visual — real data, not
a conceptual sketch. Lead with what was built.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three agent roles with skill mappings:
- Mentor (4): propose, fix-issue, final-review, dev-setup
- Orchestrator (5): project-pipeline, review-pipeline, issue-to-pr,
  check-issue, topology-sanity-check
- Runner (7): add-model, add-rule, fix-pr, review-implementation,
  write-model-in-paper, write-rule-in-paper, release

Replaces the old "two roles" (guides + runners) text with the
three-role taxonomy and per-role TikZ diagrams.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reframe the agent taxonomy around knowledge asymmetry:
Mentors guide humans with superior project knowledge;
Workers execute routine heavy-lifting with less domain knowledge.
Merges Orchestrator+Runner into Worker with lightweight subcategories.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>