feat: guard pattern, stuck recovery, scenario dimensions, constraints, noise handling by azalio · Pull Request #87 · azalio/map-framework

azalio · 2026-03-19T21:21:18Z

Summary

Implements Phase 1 (Cheap Wins) and Phase 2 (Core Execution Quality) from the autoresearch analysis.

Phase 1: Cheap Wins

Scenario dimensions in task-decomposer + map-tdd: happy_path, error, edge_case, security
Constraint schema in map-plan: max_files, max_subtasks, time_budget, scope_glob with validation
Iteration summary derivation in ralph-iteration-logger (iteration_summary.json)
Git-as-memory conditional context in actor ({{git_history}} for debug/retry/resume)

Phase 2: Core Execution Quality

Guard pattern decision table in map-efficient: monitor pass + guard fail → retry Actor (max 2) → escalate
Stuck recovery at monitor retry 3: research-agent → predictor → retry 4-5
Noise handling in final-verifier: 3x re-run, 2/3 majority, flaky_detected, confidence -= 0.1

Workflow gate rewrite

Source of truth changed from workflow_state.json (completed_steps) to step_state.json (phase)
Fixes chicken-and-egg: Actor agents blocked from Edit because gate required "actor" in completed_steps
Fixes parallel wave mode: checks subtask_phases dict per-subtask instead of disabling gate entirely
Adds constraint enforcement (scope_glob, time_budget) as hook-level deny
379 → 261 lines

Test plan

make check passes (ruff clean, mypy clean, 676 passed / 1 skipped)
make sync-templates — all .claude/ ↔ src/mapify_cli/templates/ in sync
Workflow gate tests rewritten for phase-based logic (24 tests)
monitor.md unchanged (2500 lines, within limit)
Manual: run /map-efficient on a test branch to verify guard pattern + stuck recovery in practice

…ints, and noise handling Phase 1 (Cheap Wins): - Add scenario_dimensions to task-decomposer test_strategy schema (happy_path, error, edge_case, security) - Add scenario dimensions requirement to map-tdd TEST_WRITER - Add constraint schema (max_files, max_subtasks, time_budget, scope_glob) to map-plan - Add scope_glob validation (reject .., absolute paths, brace expansion) - Add iteration_summary.json derivation to ralph-iteration-logger - Add conditional git history context ({{git_history}}) to actor for debug/retry/resume Phase 2 (Core Execution Quality): - Add guard pattern decision table to map-efficient: monitor pass + guard fail → retry Actor (max 2 rework) → escalate - Add exit code capture (TESTS_EXIT, LINT_EXIT) to TESTS_GATE/LINTER_GATE - Add stuck recovery at monitor retry 3: research-agent → predictor → retry 4-5 - Add intermediate recovery paths to escalation-matrix - Add flaky-aware noise handling to final-verifier (3x re-run, 2/3 majority, confidence -= 0.1) Workflow gate rewrite: - Rewrite workflow-gate.py from completed_steps (workflow_state.json) to phase-based enforcement (step_state.json) - Fix chicken-and-egg: Actor agents can now Edit during ACTOR/APPLY/TEST_WRITER phases - Fix parallel wave mode: check subtask_phases dict instead of disabling gate - Add constraint enforcement (scope_glob, time_budget) as hook-level deny - Remove max_files evidence scanning (too expensive per hook call) - Rewrite tests for new phase-based logic (24 passed) All checks pass: ruff clean, mypy clean, 676 passed / 1 skipped.

Copilot

Pull request overview

This PR updates the MAP Framework templates to improve workflow execution quality and reliability, including phase-based workflow gating (step_state.json), constraint enforcement, and several orchestration/agent protocol enhancements (guard pattern, stuck recovery, flaky/noise handling, scenario dimensions, and iteration summaries).

Changes:

Rewrite workflow gate enforcement to be phase-based (step_state.json) with added constraint checks (scope_glob, time_budget), plus rewritten tests.
Add iteration_summary.json derivation to the Ralph iteration logger.
Update MAP command/agent templates to incorporate scenario dimensions, guard/stuck recovery guidance, flaky-test handling, and conditional git history context.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 13 comments.

Show a summary per file

File	Description
tests/test_workflow_gate.py	Rewritten tests for phase-based gate behavior, branch scoping, and constraint enforcement.
src/mapify_cli/templates/settings.json	Expands Claude Code permissions allowlist for editing `.claude/*` paths.
src/mapify_cli/templates/references/escalation-matrix.md	Documents stuck recovery at retry 3 and guard-pattern escalation policy.
src/mapify_cli/templates/map/scripts/map_step_runner.py	Adds/adjusts type annotations for issue JSON defaults.
src/mapify_cli/templates/hooks/workflow-gate.py	Replaces step-completion gating with phase-based gating + constraints.
src/mapify_cli/templates/hooks/ralph-iteration-logger.py	Adds best-effort iteration_summary.json derivation.
src/mapify_cli/templates/commands/map-tdd.md	Adds scenario-dimension coverage requirements to TDD phase instructions.
src/mapify_cli/templates/commands/map-plan.md	Adds constraints schema + validation guidance for scope_glob before state init.
src/mapify_cli/templates/commands/map-efficient.md	Adds stuck recovery pseudocode, guard pattern decision table, and captures exit codes for gates.
src/mapify_cli/templates/agents/task-decomposer.md	Extends required JSON schema with `test_strategy.scenario_dimensions`.
src/mapify_cli/templates/agents/final-verifier.md	Adds flaky/noise handling protocol and confidence adjustment rules.
src/mapify_cli/templates/agents/actor.md	Adds conditional `{{git_history}}` context consumption guidance.
.claude/settings.json	Mirrors template settings allowlist updates for `.claude/*` edits/writes.
.claude/references/escalation-matrix.md	Mirrors escalation matrix updates.
.claude/hooks/workflow-gate.py	Mirrors phase-based gating + constraints logic.
.claude/hooks/ralph-iteration-logger.py	Mirrors iteration_summary.json derivation.
.claude/commands/map-tdd.md	Mirrors scenario-dimension rules addition.
.claude/commands/map-plan.md	Mirrors constraints schema + validation steps addition.
.claude/commands/map-efficient.md	Mirrors stuck recovery + guard decision documentation updates.
.claude/agents/task-decomposer.md	Mirrors decomposer schema extension.
.claude/agents/final-verifier.md	Mirrors flaky/noise handling protocol updates.
.claude/agents/actor.md	Mirrors conditional git history guidance.

Comments suppressed due to low confidence (2)

.claude/hooks/workflow-gate.py:45

extract_target_file_paths only considers file_path (and edits[].file_path). Other hooks in this repo also accept tool_input['path']; if Claude Code sends path here, exemption and scope_glob checks won’t run for that tool call. Consider supporting both keys (and the same inside edits) to avoid enforcement bypass.

def extract_target_file_paths(tool_call: dict) -> list[str]:
    """Extract file paths from tool call payload."""
    tool_input = tool_call.get("tool_input") or {}
    if not isinstance(tool_input, dict):
        return []

    paths: list[str] = []

    direct = tool_input.get("file_path")
    if isinstance(direct, str) and direct.strip():
        paths.append(direct)

src/mapify_cli/templates/hooks/workflow-gate.py:45

extract_target_file_paths only considers file_path (and edits[].file_path). Other hooks in this repo also accept tool_input['path']; if Claude Code sends path here, exemption and scope_glob checks won’t run for that tool call. Consider supporting both keys (and the same inside edits) to avoid enforcement bypass.

def extract_target_file_paths(tool_call: dict) -> list[str]:
    """Extract file paths from tool call payload."""
    tool_input = tool_call.get("tool_input") or {}
    if not isinstance(tool_input, dict):
        return []

    paths: list[str] = []

    direct = tool_input.get("file_path")
    if isinstance(direct, str) and direct.strip():
        paths.append(direct)

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-19T21:26:41Z

src/mapify_cli/templates/hooks/workflow-gate.py

+        for tp in target_paths:
+            try:
+                rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
+            except ValueError:
+                rel = tp


scope_glob enforcement can be bypassed via symlinks / paths that resolve outside the repo root. In the ValueError case you fall back to rel = tp, which can still match the glob even when Path(tp).resolve() is outside Path.cwd(), allowing writes outside the intended scope. Consider denying when the resolved path is not within the repo root (or computing rel from the resolved path only) so scope_glob cannot be circumvented.

Suggested change

for tp in target_paths:

try:

rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))

except ValueError:

rel = tp

repo_root = Path.cwd().resolve()

for tp in target_paths:

resolved = Path(tp).resolve()

try:

rel_path = resolved.relative_to(repo_root)

except ValueError:

# Path resolves outside the repository root: treat as out of scope

return (

f"Constraint: scope_glob='{scope_glob}'\n"

f"File '{tp}' resolves outside repository root."

)

rel = str(rel_path)

Copilot · 2026-03-19T21:26:41Z

.claude/commands/map-tdd.md

+8. Cover scenario dimensions from test_strategy: write tests for at minimum
+   happy_path, error, edge_case, and security dimensions (use "N/A" if not applicable).
+   Each dimension should have at least one dedicated test or test case.
 8. Test files MUST be lint-clean. Use proper imports at the top of the file


The STRICT RULES list now has two items numbered “8.” (scenario dimensions + lint-clean). Renumber the subsequent items to keep the instructions unambiguous.

Suggested change

8. Test files MUST be lint-clean. Use proper imports at the top of the file

9. Test files MUST be lint-clean. Use proper imports at the top of the file

Copilot · 2026-03-19T21:26:42Z

.claude/hooks/workflow-gate.py

+        for tp in target_paths:
+            try:
+                rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
+            except ValueError:
+                rel = tp


scope_glob enforcement can be bypassed via symlinks / paths that resolve outside the repo root. In the ValueError case you fall back to rel = tp, which can still match the glob even when Path(tp).resolve() is outside Path.cwd(), allowing writes outside the intended scope. Consider denying when the resolved path is not within the repo root (or computing rel from the resolved path only) so scope_glob cannot be circumvented.

Suggested change

for tp in target_paths:

try:

rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))

except ValueError:

rel = tp

repo_root = Path.cwd().resolve()

for tp in target_paths:

resolved = Path(tp).resolve()

try:

rel_path = resolved.relative_to(repo_root)

except ValueError:

# Resolved path is outside the repository root; treat as out of scope.

return (

f"Constraint: scope_glob='{scope_glob}'\n"

f"File '{resolved}' is outside allowed scope."

)

rel = str(rel_path)

Copilot · 2026-03-19T21:26:42Z

src/mapify_cli/templates/settings.json

+      "Edit(.claude/agents/*)",
+      "Edit(.claude/commands/*)",
+      "Edit(.claude/hooks/*)",
+      "Edit(.claude/references/*)",
+      "Write(.claude/agents/*)",
+      "Write(.claude/commands/*)",
+      "Write(.claude/hooks/*)",
+      "Write(.claude/references/*)",


Allowing Edit/Write on .claude/hooks/* means the agent can modify safety-critical hooks (e.g., safety-guardrails.py / workflow-gate.py) and potentially disable guardrails. Consider adding explicit deny entries for at least .claude/hooks/safety-guardrails.py (and any other security hooks), or avoid allowing .claude/hooks/* broadly. Also, the hook description below still refers to Actor+Monitor steps; it should be updated to reflect the new phase/step_state.json gating.

Copilot · 2026-03-19T21:26:42Z

src/mapify_cli/templates/commands/map-tdd.md

+8. Cover scenario dimensions from test_strategy: write tests for at minimum
+   happy_path, error, edge_case, and security dimensions (use "N/A" if not applicable).
+   Each dimension should have at least one dedicated test or test case.
 8. Test files MUST be lint-clean. Use proper imports at the top of the file


The STRICT RULES list now has two items numbered “8.” (scenario dimensions + lint-clean). Renumber the subsequent items to keep the instructions unambiguous.

Suggested change

8. Test files MUST be lint-clean. Use proper imports at the top of the file

9. Test files MUST be lint-clean. Use proper imports at the top of the file

Copilot · 2026-03-19T21:26:44Z

src/mapify_cli/templates/commands/map-efficient.md

+# Run tests if available (do NOT install dependencies). Capture exit code for guard pattern.
+TESTS_EXIT=0
 if [ -f "pytest.ini" ] || [ -f "setup.py" ]; then
-  pytest
+  pytest; TESTS_EXIT=$?
 elif [ -f "package.json" ]; then
-  npm test
+  npm test; TESTS_EXIT=$?
 elif [ -f "go.mod" ]; then
-  go test ./...
+  go test ./...; TESTS_EXIT=$?
 elif [ -f "Cargo.toml" ]; then
-  cargo test
+  cargo test; TESTS_EXIT=$?


The guard decision table says to pass “test/lint stderr output” to the Actor on guard failures, but the TESTS_GATE/LINTER_GATE snippets only capture exit codes. Consider also capturing the command output (e.g., TEST_OUTPUT=$(pytest ... 2>&1); TESTS_EXIT=$? and similarly for lint) so the orchestrator actually has failure context to pass into the retry.

Copilot · 2026-03-19T21:26:44Z

src/mapify_cli/templates/hooks/ralph-iteration-logger.py

+    lines = log_file.read_text(encoding="utf-8").strip().split("\n")
+    entries = []
+    for line in lines:
+        if line.strip():
+            try:
+                entries.append(json.loads(line))
+            except json.JSONDecodeError:
+                continue
+
+    total_entries_seen = len(entries)
+    if total_entries_seen == 0:
+        return
+
+    # Truncate to last 50 if > 100 entries
+    dropped_count = 0
+    if total_entries_seen > 100:
+        dropped_count = total_entries_seen - 50
+        entries = entries[-50:]
+


derive_summary reads and parses the entire iteration_log.jsonl on every hook invocation. As the log grows, this becomes O(N) per tool call (and effectively O(N²) over a session), which can violate the “fast hook” expectation. Consider streaming and only keeping the last ~100 lines (and separately counting total lines) rather than read_text().split().

Copilot · 2026-03-19T21:26:44Z

.claude/hooks/ralph-iteration-logger.py

+    lines = log_file.read_text(encoding="utf-8").strip().split("\n")
+    entries = []
+    for line in lines:
+        if line.strip():
+            try:
+                entries.append(json.loads(line))
+            except json.JSONDecodeError:
+                continue
+
+    total_entries_seen = len(entries)
+    if total_entries_seen == 0:
+        return
+
+    # Truncate to last 50 if > 100 entries
+    dropped_count = 0
+    if total_entries_seen > 100:
+        dropped_count = total_entries_seen - 50
+        entries = entries[-50:]
+


derive_summary reads and parses the entire iteration_log.jsonl on every hook invocation. As the log grows, this becomes O(N) per tool call (and effectively O(N²) over a session), which can violate the “fast hook” expectation. Consider streaming and only keeping the last ~100 lines (and separately counting total lines) rather than read_text().split().

Copilot · 2026-03-19T21:26:44Z

src/mapify_cli/templates/hooks/ralph-iteration-logger.py

+    file_data: dict[str, list[float]] = {}
+    file_thrashing: Counter[str] = Counter()
+    for entry in entries:
+        f = entry.get("file", "unknown")


derive_summary aggregates per-file stats but includes entries with file == "" (all non-Edit/Write tools log an empty file path). This will skew file_stats and thrashing counters toward an empty filename. Consider skipping entries without a non-empty file (and/or only including Edit/Write tools) when building per-file aggregates.

Suggested change

f = entry.get("file", "unknown")

f = entry.get("file")

# Skip entries without a concrete file path (e.g., non-Edit/Write tools)

if not f:

continue

Copilot · 2026-03-19T21:26:45Z

src/mapify_cli/templates/map/scripts/map_step_runner.py

+KNOWN_ISSUES_DEFAULT: dict[str, list[str]] = {"issues": []}
+ACTIVE_ISSUES_DEFAULT: dict[str, str | list[str]] = {"updated_at": "", "issues": []}


The new type annotations don’t match the actual JSON structure: KNOWN_ISSUES_DEFAULT/ACTIVE_ISSUES_DEFAULT issues is later treated as a list of dicts, not list[str]. Please adjust the annotations (e.g., list[dict[str, str]] / list[dict[str, object]]) or widen them to avoid misleading types.

Suggested change

KNOWN_ISSUES_DEFAULT: dict[str, list[str]] = {"issues": []}

ACTIVE_ISSUES_DEFAULT: dict[str, str | list[str]] = {"updated_at": "", "issues": []}

KNOWN_ISSUES_DEFAULT: dict[str, list[dict[str, object]]] = {"issues": []}

ACTIVE_ISSUES_DEFAULT: dict[str, object] = {"updated_at": "", "issues": []}

- workflow-gate: deny paths resolving outside repo root (symlink bypass) - workflow-gate: remove broad .claude/hooks/* Edit/Write permission - map-efficient: capture test/lint output ($TEST_OUTPUT, $LINT_OUTPUT) not just exit codes - map-tdd: fix duplicate rule numbering (8 → 8, 9) - ralph-iteration-logger: stream last 100 lines via deque (O(1) memory), skip empty file entries - map_step_runner: fix type annotations to match actual JSON structure

research.md duplicated information already present in findings_<branch>.md and spec_<branch>.md. Removed from: map-plan command (preflight, discovery, checkpoint, distillation checklist), __init__.py (artifact list + template), README.md, and test assertion.

Design doc for major map-efficient pipeline rewrite: 11 phases → 2-3 per subtask. Key decisions: - Remove: XML_PACKET, CONTEXT_SEARCH, evidence files, VERIFY_ADHERENCE, workflow_state.json, session-log.md, devlog-XXX.md - Per-wave gates instead of per-subtask (with isolation strategy for multi-subtask interaction failures) - Predictor only in stuck recovery (retry 3), not a regular phase - State lifecycle: MONITOR_PASSED != done; wave advances only after gates pass - New step_state.json schema: active_phase, workflow_status, subtask_files_changed, wave_states - Full migration surface: ~50 files Reviewed: 3 rounds, all P1/P2 resolved (guard isolation, state lifecycle, hook contracts, active_phase enum, workflow_status field).

Copilot AI review requested due to automatic review settings March 19, 2026 21:21

Copilot started reviewing on behalf of azalio March 19, 2026 21:21 View session

Copilot AI reviewed Mar 19, 2026

View reviewed changes

azalio added 3 commits March 20, 2026 00:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: guard pattern, stuck recovery, scenario dimensions, constraints, noise handling#87

feat: guard pattern, stuck recovery, scenario dimensions, constraints, noise handling#87
azalio wants to merge 4 commits intomainfrom
autoresearch

azalio commented Mar 19, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Copilot AI Mar 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        for tp in target_paths:
-            try:
-                rel = str(Path(tp).resolve().relative_to(Path.cwd().resolve()))
-            except ValueError:
-                rel = tp
+        repo_root = Path.cwd().resolve()
+        for tp in target_paths:
+            resolved = Path(tp).resolve()
+            try:
+                rel_path = resolved.relative_to(repo_root)
+            except ValueError:
+                # Path resolves outside the repository root: treat as out of scope
+                return (
+                    f"Constraint: scope_glob='{scope_glob}'\n"
+                    f"File '{tp}' resolves outside repository root."
+                )
+            rel = str(rel_path)

	8. Test files MUST be lint-clean. Use proper imports at the top of the file
	9. Test files MUST be lint-clean. Use proper imports at the top of the file

-        f = entry.get("file", "unknown")
+        f = entry.get("file")
+        # Skip entries without a concrete file path (e.g., non-Edit/Write tools)
+        if not f:
+            continue

		KNOWN_ISSUES_DEFAULT: dict[str, list[str]] = {"issues": []}
		ACTIVE_ISSUES_DEFAULT: dict[str, str \| list[str]] = {"updated_at": "", "issues": []}

Conversation

azalio commented Mar 19, 2026

Summary

Phase 1: Cheap Wins

Phase 2: Core Execution Quality

Workflow gate rewrite

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants