feat(daemon): integration tests and bug fixes for recovery paths by zhubert · Pull Request #435 · zhubert/erg

zhubert · 2026-03-17T16:56:55Z

Summary

Add daemon integration test suite exercising the full tick() loop with controlled mocks (16 tests covering happy path, failure modes, and recovery)
Add reusable FakeProvider test double implementing all 9 provider interfaces
Fix 3 bugs found during integration testing

Bugs Fixed

Terminal state step name lost — ProcessStep returned empty NewStep for succeed/fail states, causing CurrentStep to be cleared to "". The dashboard and erg status now correctly show "done"/"failed" instead of blank.

Docker recovery spawns duplicate workers — processIdleSyncItems didn't filter async actions (ai.code, ai.fix_ci, etc.), so after Docker recovery set phase=idle on a coding item, it re-executed the async action via executeSyncChain, creating a second session and worker for the same issue.

docker_pending items stuck forever — resumeDockerPendingItems only ran on the dockerDown→healthy transition. Transient Docker blips (worker fails but next health check passes) left items permanently stuck in docker_pending.

Test plan

go test -p=1 -count=1 ./... — all packages pass
goimports -w . / go vet ./... / staticcheck ./... — clean
16 integration tests cover: happy path (issue→merge), worker failure, max concurrency, dedup, external close, Docker error/recovery/down/transient blip, config save pause/recovery, retry delay elapsed/pending, idle sync recovery, feedback worker failure, duplicate worker prevention

🤖 Generated with Claude Code

…ing (#435) Add integration tests exercising the daemon's full tick() loop (poll → code → open_pr → await_ci → await_review → merge → done) with controlled mocks, catching integration bugs before they hit production. Fix a bug where terminal states (done/failed) had their CurrentStep cleared to "" because ProcessStep returned an empty NewStep for succeed/fail state types. The dashboard and status commands now correctly show the terminal step name. New files: - internal/issues/fake_provider.go — reusable test double implementing all 9 provider interfaces with call recording - internal/daemon/integration_test.go — 5 integration tests + 4 FakeProvider unit tests Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add 9 integration tests covering daemon recovery and resilience: - Docker error → docker_pending phase (transient, not terminal) - Docker recovery → resumes docker_pending items - Docker down → skips all processing (no polling, no starting) - Config save paused → blocks new work - Config save recovery → auto-resumes via retryConfigSave - Retry pending with elapsed delay → re-executes sync chain - Retry pending with future delay → respects delay, stays pending - Idle sync recovery → executes stale sync task after restart - Feedback worker failure → returns to idle (not terminal) Found and fixed: AddWorkItem forces Phase="idle" regardless of struct field, requiring UpdateWorkItem to set non-idle phases for test setup. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ck docker_pending items Two bugs found via integration testing: 1. processIdleSyncItems re-executed async actions (ai.code, ai.fix_ci, etc.) after Docker recovery. When resumeDockerPendingItems set phase="idle" on an item at step="coding", processIdleSyncItems found it as an idle task state and called executeSyncChain, which re-ran ai.code — spawning a duplicate worker and session for the same issue. Fix: skip actions prefixed with "ai." in processIdleSyncItems since they are async and require worker restarts, not sync chain execution. 2. docker_pending items stuck forever on transient Docker blips. If a worker failed with a Docker error but the health check passed on the next tick (Docker recovered between the failure and the check), d.dockerDown was never set to true, so resumeDockerPendingItems never ran. Fix: call resumeDockerPendingItems on every tick where Docker is healthy, not just on the dockerDown→healthy transition. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds a daemon integration test suite that exercises the full tick() loop end-to-end with mocked external dependencies, introduces a reusable FakeProvider test double for issue providers, and includes fixes to recovery-path bugs uncovered by the new tests.

Changes:

Add integration tests covering happy path, failure modes, and recovery behaviors across the daemon tick loop.
Introduce internal/issues/FakeProvider implementing provider interfaces for deterministic integration testing.
Fix workflow/daemon recovery behavior: preserve terminal step name, avoid duplicate async work after Docker recovery, and resume docker_pending items when Docker is healthy.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
internal/workflow/engine.go	Preserve `NewStep` for terminal states so daemon/dashboard don’t lose the terminal step name.
internal/workflow/engine_test.go	Add regression test ensuring terminal states keep their step name.
internal/issues/fake_provider.go	Add a configurable fake provider implementing multiple provider interfaces for tests.
internal/daemon/processing.go	Harden recovery paths: skip re-running async actions in idle sync recovery; resume `docker_pending` items whenever Docker is healthy.
internal/daemon/integration_test.go	Add a full `tick()` integration test suite with mocked git/gh behavior and worker completion simulation.

Comments suppressed due to low confidence (1)

internal/workflow/engine.go:99

ProcessStep() now preserves NewStep for succeed/fail states, but NewPhase is still left empty. Daemon.executeSyncChain/handleAsyncComplete call DaemonState.AdvanceWorkItem(result.NewStep, result.NewPhase), so an empty NewPhase will overwrite the persisted WorkItem.Phase with "" when a terminal state is processed. Consider setting NewPhase to item.Phase (or "idle") for terminal states so phase doesn’t get cleared as a side effect of reaching done/failed.

	switch state.Type {
	case StateTypeSucceed:
		return &StepResult{
			NewStep:    item.CurrentStep,
			Terminal:   true,
			TerminalOK: true,
			Hooks:      state.After,
		}, nil

	case StateTypeFail:
		return &StepResult{
			NewStep:    item.CurrentStep,
			Terminal:   true,
			TerminalOK: false,
			Hooks:      state.After,
		}, nil

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

… erasure Terminal states (succeed/fail) left NewPhase empty, causing AdvanceWorkItem to overwrite the persisted WorkItem.Phase with "" on terminal transitions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

zhubert and others added 3 commits March 17, 2026 09:09

Copilot AI review requested due to automatic review settings March 17, 2026 16:56

Copilot started reviewing on behalf of zhubert March 17, 2026 16:57 View session

Copilot AI reviewed Mar 17, 2026

View reviewed changes

zhubert merged commit 4651be5 into main Mar 17, 2026
1 check passed

zhubert deleted the feat/daemon-integration-tests branch March 17, 2026 21:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(daemon): integration tests and bug fixes for recovery paths#435

feat(daemon): integration tests and bug fixes for recovery paths#435
zhubert merged 4 commits intomainfrom
feat/daemon-integration-tests

zhubert commented Mar 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

zhubert commented Mar 17, 2026

Summary

Bugs Fixed

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants