Summary
When a Codex task process crashes or is killed externally, its job record remains in status: "running" permanently. All subsequent task calls in the same Claude session are rejected with:
Task {job-id} is still running. Use /codex:status before continuing it.
This effectively blocks all Codex usage in that session until the user manually fixes the state file.
Steps to Reproduce
- Start a Codex task via /codex:rescue (foreground or background)
- The task process crashes or is killed (e.g., OOM, signal, Codex app-server timeout)
- Job record in state.json remains status: "running" with a dead PID
- All subsequent task calls in the same session fail immediately with "Task is still running"
Root Cause
In codex-companion.mjs, resolveLatestTrackedTaskThread() checks for active tasks:
const activeTask = visibleJobs.find(
  (job) => job.jobClass === "task" &&
    (job.status === "queued" || job.status === "running")
);
if (activeTask) {
  throw new Error(`Task ${activeTask.id} is still running.`);
}
This only checks the status field in the job record. It does not verify whether the process (stored in job.pid) is actually alive.
Evidence from Production
Observed in plugin version 1.0.3. A session had 6 consecutive failed task attempts, all blocked by the same zombie job:
task-mntm7f2j status=running pid=82311 (process DEAD)
task-mntmbrox status=failed error="Task task-mntm7f2j is still running"
task-mntmbvx4 status=failed error="Task task-mntm7f2j is still running"
task-mntmclwl status=failed error="Task task-mntm7f2j is still running"
task-mntme0qz status=failed error="Task task-mntm7f2j is still running"
task-mntmg2an status=failed error="Task task-mntm7f2j is still running"
task-mntnft5z status=failed error="Task task-mntm7f2j is still running"
The PID was confirmed dead via kill(82311, 0) → ProcessLookupError.
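The same liveness probe can be run from Node.js, which is what the plugin would need. The following is a minimal sketch; isProcessAlive is a hypothetical helper name and does not exist in codex-companion.mjs. It relies only on process.kill(pid, 0), which checks for the process without delivering a signal.

```javascript
// Hypothetical helper (not part of the plugin): reports whether a PID
// currently refers to an existing process.
function isProcessAlive(pid) {
  // Missing, zero, or negative PIDs are treated as "not alive".
  // Zero and negative values address process groups, not a single process.
  if (!Number.isInteger(pid) || pid <= 0) {
    return false;
  }
  try {
    // Signal 0 only tests for existence and permission; nothing is delivered.
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we may not signal it.
    // ESRCH (or anything else): treat the process as gone.
    return err.code === "EPERM";
  }
}

// Usage: the current process is always alive.
console.log(isProcessAlive(process.pid));
```

One caveat: the operating system can reuse PIDs, so a positive result only shows that some process holds that PID, not necessarily the original Codex task.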
Suggested Fix
In resolveLatestTrackedTaskThread(), before throwing, verify the PID is alive. If the process is dead, mark the job as failed and continue:
const activeTask = visibleJobs.find(
  (job) => job.jobClass === "task" &&
    (job.status === "queued" || job.status === "running")
);
if (activeTask) {
  // Check if the process is actually alive
  const pid = activeTask.pid;
  let processAlive = false;
  if (pid) {
    try {
      process.kill(pid, 0); // signal 0: existence check only
      processAlive = true;
    } catch (e) {
      processAlive = (e.code === 'EPERM'); // exists but no permission
    }
  }
  if (processAlive) {
    throw new Error(`Task ${activeTask.id} is still running.`);
  }
  // Process is dead: mark as failed (zombie cleanup)
  upsertJob(workspaceRoot, {
    id: activeTask.id,
    status: "failed",
    phase: "failed",
    pid: null,
    errorMessage: "Process exited without updating job status.",
    completedAt: new Date().toISOString()
  });
}
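Because find() returns only the first match, the snippet above cleans up one zombie job per call. A possible variation, sketched below under stated assumptions, separates the decision from the side effects so it can be tested without the plugin. partitionActiveTasks and toFailedRecord are hypothetical names, the liveness check is injected as a parameter, and the record fields mirror the ones used in the suggested fix.

```javascript
// Hypothetical pure helper: splits active task jobs into those whose
// process is still alive and those whose process is gone.
function partitionActiveTasks(jobs, isAlive) {
  const live = [];
  const stale = [];
  for (const job of jobs) {
    const active =
      job.jobClass === "task" &&
      (job.status === "queued" || job.status === "running");
    if (!active) continue; // finished jobs and other job classes are ignored
    if (isAlive(job.pid)) {
      live.push(job);
    } else {
      stale.push(job);
    }
  }
  return { live, stale };
}

// Hypothetical helper: builds the update that marks a stale job as failed.
function toFailedRecord(job, now = new Date()) {
  return {
    id: job.id,
    status: "failed",
    phase: "failed",
    pid: null,
    errorMessage: "Process exited without updating job status.",
    completedAt: now.toISOString()
  };
}

// Usage with a stubbed liveness check (only PID 100 is "alive").
const sampleJobs = [
  { id: "task-a", jobClass: "task", status: "running", pid: 100 },
  { id: "task-b", jobClass: "task", status: "running", pid: 200 },
  { id: "task-c", jobClass: "task", status: "failed", pid: null }
];
const result = partitionActiveTasks(sampleJobs, (pid) => pid === 100);
console.log(result.live.map((j) => j.id), result.stale.map((j) => j.id));
```

In the plugin, each entry in stale would be passed through toFailedRecord and written with upsertJob, and the "still running" error would be thrown only if live is non-empty. Like the snippet above, this treats a queued job that has no PID yet as stale, which may or may not be the desired behavior.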
Environment
- Plugin version: 1.0.3
- Codex CLI: @openai/codex (latest)
- Claude Code: 2.1.89
- Platform: macOS (Darwin 24.6.0, arm64)
- Node.js: v22.12.0