Zombie running job blocks all subsequent task calls when process crashes

## Summary

When a Codex task process crashes or is killed externally, its job record stays at `status: "running"` indefinitely. Every subsequent `task` call in the same Claude session is then rejected with:

> Task {job-id} is still running. Use /codex:status before continuing it.

This blocks all Codex usage in that session until the user manually edits the state file.

## Steps to Reproduce

1. Start a Codex task via `/codex:rescue` (foreground or background)
2. The task process crashes or is killed (e.g., OOM, signal, Codex app-server timeout)
3. The job record in `state.json` keeps `status: "running"` even though its PID is dead
4. All subsequent `task` calls in the same session fail immediately with "Task is still running"

## Root Cause

In `codex-companion.mjs`, `resolveLatestTrackedTaskThread()` checks for active tasks:

```javascript
const activeTask = visibleJobs.find(
  (job) => job.jobClass === "task" && 
  (job.status === "queued" || job.status === "running")
);
if (activeTask) {
  throw new Error(`Task ${activeTask.id} is still running.`);
}
```

This only checks the `status` field in the job record. It does not verify whether the process (stored in `job.pid`) is actually alive.

## Evidence from Production

Observed in plugin version 1.0.3. A session had 6 consecutive failed task attempts, all blocked by the same zombie job:

```
task-mntm7f2j  status=running  pid=82311  (process DEAD)
task-mntmbrox  status=failed   error="Task task-mntm7f2j is still running"
task-mntmbvx4  status=failed   error="Task task-mntm7f2j is still running"
task-mntmclwl  status=failed   error="Task task-mntm7f2j is still running"
task-mntme0qz  status=failed   error="Task task-mntm7f2j is still running"
task-mntmg2an  status=failed   error="Task task-mntm7f2j is still running"
task-mntnft5z  status=failed   error="Task task-mntm7f2j is still running"
```

The PID was confirmed dead with a signal-0 probe: `kill(82311, 0)` raised `ProcessLookupError` (no such process).

## Suggested Fix

In `resolveLatestTrackedTaskThread()`, before throwing, verify the PID is alive. If the process is dead, mark the job as failed and continue:

```javascript
const activeTask = visibleJobs.find(
  (job) => job.jobClass === "task" && 
  (job.status === "queued" || job.status === "running")
);

if (activeTask) {
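  // Probe the recorded PID with signal 0: no signal is delivered, but the
  // call throws (ESRCH) if no such process exists.
  const pid = activeTask.pid;
  let processAlive = false;
  // Only probe a positive integer PID; 0 and negative values would target a
  // process group instead of a single process.
  if (Number.isInteger(pid) && pid > 0) {
    try {
      process.kill(pid, 0);
      processAlive = true;
    } catch (e) {
      // EPERM means the process exists but we lack permission to signal it.
      processAlive = e.code === "EPERM";
    }
  }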

  if (processAlive) {
    throw new Error(`Task ${activeTask.id} is still running.`);
  }

  // Process is dead — mark as failed (zombie cleanup)
  upsertJob(workspaceRoot, {
    id: activeTask.id,
    status: "failed",
    phase: "failed",
    pid: null,
    errorMessage: "Process exited without updating job status.",
    completedAt: new Date().toISOString()
  });
}
```
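To make the liveness check easier to unit test, it could be pulled out of `resolveLatestTrackedTaskThread()` into small helpers. The following is a minimal sketch, not the plugin's actual code: `isProcessAlive` and `findZombieJobs` are hypothetical names, and the job shape (`jobClass`, `status`, `pid`) is assumed from the snippets above. It only inspects `running` jobs, on the assumption that a `queued` job may not have a PID yet.

```javascript
// Sketch only: `isProcessAlive` and `findZombieJobs` are hypothetical helper
// names, not functions from codex-companion.mjs.

// Returns true if a process with this PID currently exists.
function isProcessAlive(pid) {
  // Reject missing, zero, or negative PIDs: on POSIX, 0 and negative values
  // address a process group rather than a single process.
  if (!Number.isInteger(pid) || pid <= 0) {
    return false;
  }
  try {
    process.kill(pid, 0); // signal 0 tests for existence and delivers nothing
    return true;
  } catch (err) {
    // EPERM: the process exists but we are not allowed to signal it.
    // ESRCH or anything else: treat the process as gone.
    return err.code === "EPERM";
  }
}

// Returns the task jobs marked "running" whose recorded process is gone.
// `isAlive` is injectable so the filter can be tested without real processes.
function findZombieJobs(jobs, isAlive = isProcessAlive) {
  return jobs.filter(
    (job) =>
      job.jobClass === "task" &&
      job.status === "running" &&
      !isAlive(job.pid)
  );
}
```

With helpers like these, the caller would run `findZombieJobs(visibleJobs)` and pass each result to `upsertJob` with `status: "failed"`, as in the fix above. One caveat: a PID probe cannot tell the original process from an unrelated one that later received the same PID, so a "live" result may still be a stale job.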

## Environment

- Plugin version: 1.0.3
- Codex CLI: `@openai/codex` (latest)
- Claude Code: 2.1.89
- Platform: macOS (Darwin 24.6.0, arm64)
- Node.js: v22.12.0
