Skip to content

feat: guard pattern, stuck recovery, scenario dimensions, constraints, noise handling#87

Open
azalio wants to merge 4 commits intomainfrom
autoresearch
Open

feat: guard pattern, stuck recovery, scenario dimensions, constraints, noise handling#87
azalio wants to merge 4 commits intomainfrom
autoresearch

Conversation

@azalio
Copy link
Owner

@azalio azalio commented Mar 19, 2026

Summary

Implements Phase 1 (Cheap Wins) and Phase 2 (Core Execution Quality) from the autoresearch analysis.

Phase 1: Cheap Wins

  • Scenario dimensions in task-decomposer + map-tdd: happy_path, error, edge_case, security
  • Constraint schema in map-plan: max_files, max_subtasks, time_budget, scope_glob with validation
  • Iteration summary derivation in ralph-iteration-logger (iteration_summary.json)
  • Git-as-memory conditional context in actor ({{git_history}} for debug/retry/resume)

Phase 2: Core Execution Quality

  • Guard pattern decision table in map-efficient: monitor pass + guard fail → retry Actor (max 2) → escalate
  • Stuck recovery at monitor retry 3: research-agent → predictor → retry 4-5
  • Noise handling in final-verifier: 3x re-run, 2/3 majority, flaky_detected, confidence -= 0.1

Workflow gate rewrite

  • Source of truth changed from workflow_state.json (completed_steps) to step_state.json (phase)
  • Fixes chicken-and-egg: Actor agents blocked from Edit because gate required "actor" in completed_steps
  • Fixes parallel wave mode: checks subtask_phases dict per-subtask instead of disabling gate entirely
  • Adds constraint enforcement (scope_glob, time_budget) as hook-level deny
  • 379 → 261 lines

Test plan

  • make check passes (ruff clean, mypy clean, 676 passed / 1 skipped)
  • make sync-templates — all .claude/src/mapify_cli/templates/ in sync
  • Workflow gate tests rewritten for phase-based logic (24 tests)
  • monitor.md unchanged (2500 lines, within limit)
  • Manual: run /map-efficient on a test branch to verify guard pattern + stuck recovery in practice

…ints, and noise handling

Phase 1 (Cheap Wins):
- Add scenario_dimensions to task-decomposer test_strategy schema (happy_path, error, edge_case, security)
- Add scenario dimensions requirement to map-tdd TEST_WRITER
- Add constraint schema (max_files, max_subtasks, time_budget, scope_glob) to map-plan
- Add scope_glob validation (reject .., absolute paths, brace expansion)
- Add iteration_summary.json derivation to ralph-iteration-logger
- Add conditional git history context ({{git_history}}) to actor for debug/retry/resume

Phase 2 (Core Execution Quality):
- Add guard pattern decision table to map-efficient: monitor pass + guard fail → retry Actor (max 2 rework) → escalate
- Add exit code capture (TESTS_EXIT, LINT_EXIT) to TESTS_GATE/LINTER_GATE
- Add stuck recovery at monitor retry 3: research-agent → predictor → retry 4-5
- Add intermediate recovery paths to escalation-matrix
- Add flaky-aware noise handling to final-verifier (3x re-run, 2/3 majority, confidence -= 0.1)

Workflow gate rewrite:
- Rewrite workflow-gate.py from completed_steps (workflow_state.json) to phase-based enforcement (step_state.json)
- Fix chicken-and-egg: Actor agents can now Edit during ACTOR/APPLY/TEST_WRITER phases
- Fix parallel wave mode: check subtask_phases dict instead of disabling gate
- Add constraint enforcement (scope_glob, time_budget) as hook-level deny
- Remove max_files evidence scanning (too expensive per hook call)
- Rewrite tests for new phase-based logic (24 passed)

All checks pass: ruff clean, mypy clean, 676 passed / 1 skipped.
Copilot AI review requested due to automatic review settings March 19, 2026 21:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the MAP Framework templates to improve workflow execution quality and reliability, including phase-based workflow gating (step_state.json), constraint enforcement, and several orchestration/agent protocol enhancements (guard pattern, stuck recovery, flaky/noise handling, scenario dimensions, and iteration summaries).

Changes:

  • Rewrite workflow gate enforcement to be phase-based (step_state.json) with added constraint checks (scope_glob, time_budget), plus rewritten tests.
  • Add iteration_summary.json derivation to the Ralph iteration logger.
  • Update MAP command/agent templates to incorporate scenario dimensions, guard/stuck recovery guidance, flaky-test handling, and conditional git history context.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
tests/test_workflow_gate.py Rewritten tests for phase-based gate behavior, branch scoping, and constraint enforcement.
src/mapify_cli/templates/settings.json Expands Claude Code permissions allowlist for editing .claude/* paths.
src/mapify_cli/templates/references/escalation-matrix.md Documents stuck recovery at retry 3 and guard-pattern escalation policy.
src/mapify_cli/templates/map/scripts/map_step_runner.py Adds/adjusts type annotations for issue JSON defaults.
src/mapify_cli/templates/hooks/workflow-gate.py Replaces step-completion gating with phase-based gating + constraints.
src/mapify_cli/templates/hooks/ralph-iteration-logger.py Adds best-effort iteration_summary.json derivation.
src/mapify_cli/templates/commands/map-tdd.md Adds scenario-dimension coverage requirements to TDD phase instructions.
src/mapify_cli/templates/commands/map-plan.md Adds constraints schema + validation guidance for scope_glob before state init.
src/mapify_cli/templates/commands/map-efficient.md Adds stuck recovery pseudocode, guard pattern decision table, and captures exit codes for gates.
src/mapify_cli/templates/agents/task-decomposer.md Extends required JSON schema with test_strategy.scenario_dimensions.
src/mapify_cli/templates/agents/final-verifier.md Adds flaky/noise handling protocol and confidence adjustment rules.
src/mapify_cli/templates/agents/actor.md Adds conditional {{git_history}} context consumption guidance.
.claude/settings.json Mirrors template settings allowlist updates for .claude/* edits/writes.
.claude/references/escalation-matrix.md Mirrors escalation matrix updates.
.claude/hooks/workflow-gate.py Mirrors phase-based gating + constraints logic.
.claude/hooks/ralph-iteration-logger.py Mirrors iteration_summary.json derivation.
.claude/commands/map-tdd.md Mirrors scenario-dimension rules addition.
.claude/commands/map-plan.md Mirrors constraints schema + validation steps addition.
.claude/commands/map-efficient.md Mirrors stuck recovery + guard decision documentation updates.
.claude/agents/task-decomposer.md Mirrors decomposer schema extension.
.claude/agents/final-verifier.md Mirrors flaky/noise handling protocol updates.
.claude/agents/actor.md Mirrors conditional git history guidance.
Comments suppressed due to low confidence (2)

.claude/hooks/workflow-gate.py:45

  • extract_target_file_paths only considers file_path (and edits[].file_path). Other hooks in this repo also accept tool_input['path']; if Claude Code sends path here, exemption and scope_glob checks won’t run for that tool call. Consider supporting both keys (and the same inside edits) to avoid enforcement bypass.
def extract_target_file_paths(tool_call: dict) -> list[str]:
    """Extract file paths from tool call payload."""
    tool_input = tool_call.get("tool_input") or {}
    if not isinstance(tool_input, dict):
        return []

    paths: list[str] = []

    direct = tool_input.get("file_path")
    if isinstance(direct, str) and direct.strip():
        paths.append(direct)

src/mapify_cli/templates/hooks/workflow-gate.py:45

  • extract_target_file_paths only considers file_path (and edits[].file_path). Other hooks in this repo also accept tool_input['path']; if Claude Code sends path here, exemption and scope_glob checks won’t run for that tool call. Consider supporting both keys (and the same inside edits) to avoid enforcement bypass.
def extract_target_file_paths(tool_call: dict) -> list[str]:
    """Extract file paths from tool call payload."""
    tool_input = tool_call.get("tool_input") or {}
    if not isinstance(tool_input, dict):
        return []

    paths: list[str] = []

    direct = tool_input.get("file_path")
    if isinstance(direct, str) and direct.strip():
        paths.append(direct)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +172 to +176
for tp in target_paths:
try:
rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
except ValueError:
rel = tp
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scope_glob enforcement can be bypassed via symlinks / paths that resolve outside the repo root. In the ValueError case you fall back to rel = tp, which can still match the glob even when Path(tp).resolve() is outside Path.cwd(), allowing writes outside the intended scope. Consider denying when the resolved path is not within the repo root (or computing rel from the resolved path only) so scope_glob cannot be circumvented.

Suggested change
for tp in target_paths:
try:
rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
except ValueError:
rel = tp
repo_root = Path.cwd().resolve()
for tp in target_paths:
resolved = Path(tp).resolve()
try:
rel_path = resolved.relative_to(repo_root)
except ValueError:
# Path resolves outside the repository root: treat as out of scope
return (
f"Constraint: scope_glob='{scope_glob}'\n"
f"File '{tp}' resolves outside repository root."
)
rel = str(rel_path)

Copilot uses AI. Check for mistakes.
8. Cover scenario dimensions from test_strategy: write tests for at minimum
happy_path, error, edge_case, and security dimensions (use "N/A" if not applicable).
Each dimension should have at least one dedicated test or test case.
8. Test files MUST be lint-clean. Use proper imports at the top of the file
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The STRICT RULES list now has two items numbered “8.” (scenario dimensions + lint-clean). Renumber the subsequent items to keep the instructions unambiguous.

Suggested change
8. Test files MUST be lint-clean. Use proper imports at the top of the file
9. Test files MUST be lint-clean. Use proper imports at the top of the file

Copilot uses AI. Check for mistakes.
Comment on lines +172 to +176
for tp in target_paths:
try:
rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
except ValueError:
rel = tp
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scope_glob enforcement can be bypassed via symlinks / paths that resolve outside the repo root. In the ValueError case you fall back to rel = tp, which can still match the glob even when Path(tp).resolve() is outside Path.cwd(), allowing writes outside the intended scope. Consider denying when the resolved path is not within the repo root (or computing rel from the resolved path only) so scope_glob cannot be circumvented.

Suggested change
for tp in target_paths:
try:
rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
except ValueError:
rel = tp
repo_root = Path.cwd().resolve()
for tp in target_paths:
resolved = Path(tp).resolve()
try:
rel_path = resolved.relative_to(repo_root)
except ValueError:
# Resolved path is outside the repository root; treat as out of scope.
return (
f"Constraint: scope_glob='{scope_glob}'\n"
f"File '{resolved}' is outside allowed scope."
)
rel = str(rel_path)

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +25
"Edit(.claude/agents/*)",
"Edit(.claude/commands/*)",
"Edit(.claude/hooks/*)",
"Edit(.claude/references/*)",
"Write(.claude/agents/*)",
"Write(.claude/commands/*)",
"Write(.claude/hooks/*)",
"Write(.claude/references/*)",
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing Edit/Write on .claude/hooks/* means the agent can modify safety-critical hooks (e.g., safety-guardrails.py / workflow-gate.py) and potentially disable guardrails. Consider adding explicit deny entries for at least .claude/hooks/safety-guardrails.py (and any other security hooks), or avoid allowing .claude/hooks/* broadly. Also, the hook description below still refers to Actor+Monitor steps; it should be updated to reflect the new phase/step_state.json gating.

Copilot uses AI. Check for mistakes.
8. Cover scenario dimensions from test_strategy: write tests for at minimum
happy_path, error, edge_case, and security dimensions (use "N/A" if not applicable).
Each dimension should have at least one dedicated test or test case.
8. Test files MUST be lint-clean. Use proper imports at the top of the file
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The STRICT RULES list now has two items numbered “8.” (scenario dimensions + lint-clean). Renumber the subsequent items to keep the instructions unambiguous.

Suggested change
8. Test files MUST be lint-clean. Use proper imports at the top of the file
9. Test files MUST be lint-clean. Use proper imports at the top of the file

Copilot uses AI. Check for mistakes.
Comment on lines +662 to +671
# Run tests if available (do NOT install dependencies). Capture exit code for guard pattern.
TESTS_EXIT=0
if [ -f "pytest.ini" ] || [ -f "setup.py" ]; then
pytest
pytest; TESTS_EXIT=$?
elif [ -f "package.json" ]; then
npm test
npm test; TESTS_EXIT=$?
elif [ -f "go.mod" ]; then
go test ./...
go test ./...; TESTS_EXIT=$?
elif [ -f "Cargo.toml" ]; then
cargo test
cargo test; TESTS_EXIT=$?
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The guard decision table says to pass “test/lint stderr output” to the Actor on guard failures, but the TESTS_GATE/LINTER_GATE snippets only capture exit codes. Consider also capturing the command output (e.g., TEST_OUTPUT=$(pytest ... 2>&1); TESTS_EXIT=$? and similarly for lint) so the orchestrator actually has failure context to pass into the retry.

Copilot uses AI. Check for mistakes.
Comment on lines +326 to +344
lines = log_file.read_text(encoding="utf-8").strip().split("\n")
entries = []
for line in lines:
if line.strip():
try:
entries.append(json.loads(line))
except json.JSONDecodeError:
continue

total_entries_seen = len(entries)
if total_entries_seen == 0:
return

# Truncate to last 50 if > 100 entries
dropped_count = 0
if total_entries_seen > 100:
dropped_count = total_entries_seen - 50
entries = entries[-50:]

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

derive_summary reads and parses the entire iteration_log.jsonl on every hook invocation. As the log grows, this becomes O(N) per tool call (and effectively O(N²) over a session), which can violate the “fast hook” expectation. Consider streaming and only keeping the last ~100 lines (and separately counting total lines) rather than read_text().split().

Copilot uses AI. Check for mistakes.
Comment on lines +326 to +344
lines = log_file.read_text(encoding="utf-8").strip().split("\n")
entries = []
for line in lines:
if line.strip():
try:
entries.append(json.loads(line))
except json.JSONDecodeError:
continue

total_entries_seen = len(entries)
if total_entries_seen == 0:
return

# Truncate to last 50 if > 100 entries
dropped_count = 0
if total_entries_seen > 100:
dropped_count = total_entries_seen - 50
entries = entries[-50:]

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

derive_summary reads and parses the entire iteration_log.jsonl on every hook invocation. As the log grows, this becomes O(N) per tool call (and effectively O(N²) over a session), which can violate the “fast hook” expectation. Consider streaming and only keeping the last ~100 lines (and separately counting total lines) rather than read_text().split().

Copilot uses AI. Check for mistakes.
file_data: dict[str, list[float]] = {}
file_thrashing: Counter[str] = Counter()
for entry in entries:
f = entry.get("file", "unknown")
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

derive_summary aggregates per-file stats but includes entries with file == "" (all non-Edit/Write tools log an empty file path). This will skew file_stats and thrashing counters toward an empty filename. Consider skipping entries without a non-empty file (and/or only including Edit/Write tools) when building per-file aggregates.

Suggested change
f = entry.get("file", "unknown")
f = entry.get("file")
# Skip entries without a concrete file path (e.g., non-Edit/Write tools)
if not f:
continue

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +46
KNOWN_ISSUES_DEFAULT: dict[str, list[str]] = {"issues": []}
ACTIVE_ISSUES_DEFAULT: dict[str, str | list[str]] = {"updated_at": "", "issues": []}
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new type annotations don’t match the actual JSON structure: KNOWN_ISSUES_DEFAULT/ACTIVE_ISSUES_DEFAULT issues is later treated as a list of dicts, not list[str]. Please adjust the annotations (e.g., list[dict[str, str]] / list[dict[str, object]]) or widen them to avoid misleading types.

Suggested change
KNOWN_ISSUES_DEFAULT: dict[str, list[str]] = {"issues": []}
ACTIVE_ISSUES_DEFAULT: dict[str, str | list[str]] = {"updated_at": "", "issues": []}
KNOWN_ISSUES_DEFAULT: dict[str, list[dict[str, object]]] = {"issues": []}
ACTIVE_ISSUES_DEFAULT: dict[str, object] = {"updated_at": "", "issues": []}

Copilot uses AI. Check for mistakes.
azalio added 3 commits March 20, 2026 00:42
- workflow-gate: deny paths resolving outside repo root (symlink bypass)
- workflow-gate: remove broad .claude/hooks/* Edit/Write permission
- map-efficient: capture test/lint output ($TEST_OUTPUT, $LINT_OUTPUT) not just exit codes
- map-tdd: fix duplicate rule numbering (8 → 8, 9)
- ralph-iteration-logger: stream last 100 lines via deque (O(1) memory), skip empty file entries
- map_step_runner: fix type annotations to match actual JSON structure
research.md duplicated information already present in findings_<branch>.md
and spec_<branch>.md. Removed from: map-plan command (preflight, discovery,
checkpoint, distillation checklist), __init__.py (artifact list + template),
README.md, and test assertion.
Design doc for major map-efficient pipeline rewrite: 11 phases → 2-3 per subtask.

Key decisions:
- Remove: XML_PACKET, CONTEXT_SEARCH, evidence files, VERIFY_ADHERENCE,
  workflow_state.json, session-log.md, devlog-XXX.md
- Per-wave gates instead of per-subtask (with isolation strategy for
  multi-subtask interaction failures)
- Predictor only in stuck recovery (retry 3), not a regular phase
- State lifecycle: MONITOR_PASSED != done; wave advances only after gates pass
- New step_state.json schema: active_phase, workflow_status,
  subtask_files_changed, wave_states
- Full migration surface: ~50 files

Reviewed: 3 rounds, all P1/P2 resolved (guard isolation, state lifecycle,
hook contracts, active_phase enum, workflow_status field).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants