docs(arxiv): survey paper on skill-based agentic coding for reductions#617
Open
Conversation
…ductions Design spec for a full research paper (ICSE/ASE-class) on using skill-based AI agent pipelines to build verified NP-hard problem reduction libraries.

Key decisions from brainstorming:
- Methodology-first framing (Goldilocks domain + practical artifact)
- Three roles: contributors (issues), maintainer (board curation), agents (manage + execute)
- Multi-layered verification stack (7 layers from type system to documentation)
- Evaluation: ablation (skill vs no-skill) + git mining + 3 case studies
- Hardware solver motivation (Rydberg atoms, D-Wave)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add the two new card-based orchestration skills from origin/main:
- project-pipeline: picks Ready cards, runs issue-to-pr in worktrees
- review-pipeline: fixes Copilot comments, runs agentic tests, moves to In Review

Updated S4.3 with the two-stage pipeline and explicit human touch points (Backlog→Ready and In Review→Done). Skills count updated to 13.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
16 tasks in 5 parallelizable chunks: scaffolding, figures, sections S1-S4, sections S5-S6 with git mining, sections S7-S8 + final assembly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch figure generation from TikZ to Typst+CeTZ compiled to PDF, included in LaTeX via \includegraphics. Paper body remains LaTeX (IEEEtran class). Removed TikZ packages from preamble. Updated all figure tasks (3-6), conventions block, compile commands, and Task 17 assembly step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
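As a minimal sketch of the inclusion pattern this commit describes (the figure filename, caption, and label below are hypothetical, not taken from the repository): each .typ source would first be compiled with the Typst CLI, e.g. `typst compile figures/pipeline.typ figures/pipeline.pdf`, and the resulting PDF pulled into the LaTeX body like any other graphic.

```latex
% Hypothetical figure name: the PDF is produced from a Typst+CeTZ source,
% then included in the IEEEtran paper body via graphicx.
\begin{figure}[t]
  \centering
  \includegraphics[width=\columnwidth]{figures/pipeline.pdf}
  \caption{Placeholder caption for a Typst-generated figure.}
  \label{fig:pipeline}
\end{figure}
```

This keeps the paper body pure LaTeX while delegating figure drawing to Typst, which is why the TikZ packages could be dropped from the preamble.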
Set up paper.tex with IEEEtran conference class, 8 section stubs, and a ~150-word abstract. Combined survey bibliography (22 entries) with 6 foundational references (Karp, Cook, Garey-Johnson, Glover, Lucas, Barahona). Removed old paper.typ placeholder. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
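A sketch of the scaffold this commit describes (the title and the bibliography filename are assumptions for illustration; only the IEEEtran conference class, eight section stubs, ~150-word abstract, and 22+6 bibliography entries are stated in the commit):

```latex
\documentclass[conference]{IEEEtran}
\begin{document}

\title{Placeholder Title}  % hypothetical; actual title not given in the commit
\maketitle

\begin{abstract}
% ~150-word abstract.
\end{abstract}

\section{Introduction}  % first of eight section stubs
% ...

\bibliographystyle{IEEEtran}
\bibliography{references}  % hypothetical filename: 22 survey + 6 foundational entries
\end{document}
```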
Add ~800 words covering four paragraphs: self-contained/verifiable reductions (LOC stats from graph-metrics.json), homogeneous task structure vs SWE-Bench, hardware solver compilation layer (Rydberg atoms for MIS, D-Wave for QUBO), and real-world applications. Fix figure caption to use accurate counts (40 impl + 12 inferred edges). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three subsections: S4.1 Three Roles (table + prose describing Contributor/Maintainer/Agent responsibilities), S4.2 Skills as Agent Functions (Table 1 with all 13 skills across 5 categories, detailed paragraphs per category), and S4.3 Card-Based Orchestration (two-stage pipeline with human touch points). Success rate column uses TBD placeholder pending Task 11 git mining results. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add three evaluation subsections: S6.1 ablation study design (skill-based vs raw agent, with TBD results), S6.2 git history mining (58 PRs across 3 phases, error taxonomy table with TBD counts), and S6.3 case studies of MVC->MIS (96 LOC, simple complement), SAT->MIS (171 LOC, quadratic gadget), and Factoring->CircuitSAT->ILP (272+225 LOC, composition). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix critical issues identified by simulated peer review:
- Fix timeline contradiction: "six months" -> "seven weeks" (abstract + discussion)
- Fix author count: "two primary contributors" -> "three contributors"
- Soften unsubstantiated "60% of errors" claim to qualitative language
- Add agent platform identification (Claude Code, model versions)
- Reframe unexecuted ablation as experimental design, not pending results
- Add skills vs. prompt engineering differentiation paragraph
- Fix malformed BibTeX entries (dual booktitle/journal fields)
- Add Pichler 2018 citation for Rydberg atom MIS connection
- Note vendor report status on Anthropic 2026 citation
- Soften Table 2/3 captions to acknowledge pending data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix overfull hboxes in skills and error taxonomy tables by removing
outer padding (@{}) and abbreviating long skill names. Add .gitignore
for LaTeX build artifacts. All figures compile, cross-references
verified, no undefined citations.
Note: paper is 15 pages (over the 10-12 target).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Mine ~/.claude session data: 283 sessions, 300MB transcripts, 15:1 automation ratio, 1510 co-authored commits
- Add development metrics paragraph with codebase growth timeline
- Add issue quality gate data: 75% rejection rate on 322 checked issues
- Add interaction evolution paragraph (imperative → declarative prompts)
- Update counts: 24→27 models, 40→50 rules, 58→59 PRs, 7→9 weeks
- Remove meta-power skill references (13→12 skills)
- Replace Figure 1 with three-layer problemtree (from NSFC proposal)
- Add future directions: reduction compiler with Pareto cost models
- Save raw Claude history data to survey/claude-history-data.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##            main    #617   +/- ##
=====================================
 Coverage   96.86%  96.86%
=====================================
 Files         264     264
 Lines       35196   35196
=====================================
 Hits        34091   34091
 Misses       1105    1105

View full report in Codecov by Sentry.
Restructures the paper around the "bridge problem" concept — software too large for humans, made possible by agents constrained through systematic verification. Three barriers (convention drift, effort exhaustion, knowledge discontinuity) become the central thesis. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Rewrite abstract and introduction with bridge problem concept
- Add new Section 2 (Bridge Problems): definition, three barriers, verification constrains agent output, other candidate domains
- Rename Section 3 to "Case Study: The Reduction Graph"
- Rewrite Discussion: remove content now in Sec 2, tighten "Why Human Experts Remain Essential", rewrite conclusion
- Move topology figure to appendix
- Renumber sections throughout

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…line Uncomment figure placeholders and add timeline figure in Evidence section. Figures will be compiled from .typ sources in a following commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- scaling-wall: hero figure showing 3 barriers human teams hit
- verification-funnel: how verification constrains agent output
- timeline: cumulative growth over 9 weeks with phase bands

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The comparison against our own prior project weakens the argument. The bridge problem thesis stands on its own through the three structural barriers, not through a self-comparison. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The reduction graph is the most informative visual — real data, not a conceptual sketch. Lead with what was built. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three agent roles with skill mappings:
- Mentor (4): propose, fix-issue, final-review, dev-setup
- Orchestrator (5): project-pipeline, review-pipeline, issue-to-pr, check-issue, topology-sanity-check
- Runner (7): add-model, add-rule, fix-pr, review-implementation, write-model-in-paper, write-rule-in-paper, release

Replaces the old "two roles" (guides + runners) text with the three-role taxonomy and per-role TikZ diagrams.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reframe the agent taxonomy around knowledge asymmetry: Mentors guide humans with superior project knowledge; Workers execute routine heavy-lifting with less domain knowledge. Merges Orchestrator+Runner into Worker with lightweight subcategories. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Key findings from Claude history data (mined from ~/.claude):
Paper structure:
Test plan
cd docs/paper/arxiv && pdflatex paper.tex

🤖 Generated with Claude Code