
feat: add task orchestration and usage insights#86

Open
awsl233777 wants to merge 31 commits into main from feat/task-orchestration-tab

Conversation

Collaborator

@awsl233777 commented Apr 12, 2026

Summary

  • add a task orchestration tab with plan preview, queue/workbench views, run details, and tightened in-panel copy
  • tighten usage insights so the model list only reflects stored model names and estimated cost temporarily excludes Claude sessions
  • document task orchestration in README/package metadata and cover the new usage/orchestration rules with unit tests

Validation

  • node tests/unit/run.mjs

Summary by CodeRabbit

Release Notes

  • New Features

    • Added task orchestration system for planning, executing, queuing, and monitoring workflows with dependency management
    • Integrated extended Claude model catalog with BigModel compatibility
    • Added model coverage tracking and cost estimation in session usage analytics
  • Improvements

    • Increased session usage browsing limit to 2,000 records
    • Enhanced session summaries with model and duration metrics
    • Improved session usage visibility with cost estimates and model coverage analysis
  • Documentation

    • Updated README with task orchestration overview and CLI command reference

Contributor

coderabbitai bot commented Apr 12, 2026

Warning

Rate limit exceeded

@awsl233777 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 1 minute and 52 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 1 minute and 52 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3b9e2af2-c805-42e8-9396-619dda4b96d2

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc97ff and 25e2469.

📒 Files selected for processing (38)
  • README.md
  • cli.js
  • lib/cli-models-utils.js
  • lib/task-orchestrator.js
  • package.json
  • tests/e2e/run.js
  • tests/e2e/test-sessions.js
  • tests/e2e/test-setup.js
  • tests/e2e/test-task-orchestration.js
  • tests/unit/claude-settings-sync.test.mjs
  • tests/unit/cli-models-utils.test.mjs
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/provider-share-command.test.mjs
  • tests/unit/run.mjs
  • tests/unit/session-usage-backend.test.mjs
  • tests/unit/session-usage.test.mjs
  • tests/unit/task-orchestrator.test.mjs
  • tests/unit/web-ui-behavior-parity.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/app.js
  • web-ui/index.html
  • web-ui/logic.claude.mjs
  • web-ui/logic.sessions.mjs
  • web-ui/modules/app.computed.dashboard.mjs
  • web-ui/modules/app.computed.index.mjs
  • web-ui/modules/app.computed.main-tabs.mjs
  • web-ui/modules/app.computed.session.mjs
  • web-ui/modules/app.methods.index.mjs
  • web-ui/modules/app.methods.session-browser.mjs
  • web-ui/modules/app.methods.startup-claude.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • web-ui/partials/index/layout-header.html
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/partials/index/panel-usage.html
  • web-ui/session-helpers.mjs
  • web-ui/styles.css
  • web-ui/styles/sessions-usage.css
  • web-ui/styles/task-orchestration.css
📝 Walkthrough

Walkthrough

Adds a task-orchestration subsystem: new orchestrator library, CLI/API task commands and persistence, Web UI orchestration panel with state/methods/polling, JSONL session-tail parsing and enhanced usage/costing, model-catalog augmentation, and extensive unit and E2E tests.

Changes

  • Orchestrator Core (lib/task-orchestrator.js): new orchestration library: helpers (truncate/split), deterministic wave computation, cycle detection, plan builder (buildTaskPlan), validator (validateTaskPlan), and executor (executeTaskPlan) with concurrency, dependency gating, retries, write-locking, attempt logs and run summaries.
  • CLI, API & Persistence (cli.js, lib/cli-models-utils.js): integrates the codexmate task CLI and task-* Web API actions; supplemental model catalogs and mergeModelCatalog; JSONL tail/head readers and session summary parsing; local task queue/run persistence (JSON/JSONL); in-memory cancellation controllers and a single queue-processor guard.
  • Web UI — State, Methods, Computed (web-ui/app.js, web-ui/modules/app.methods.task-orchestration.mjs, web-ui/modules/app.methods.index.mjs, web-ui/modules/app.computed.main-tabs.mjs, web-ui/modules/app.computed.index.mjs, web-ui/modules/app.computed.dashboard.mjs, web-ui/session-helpers.mjs): adds taskOrchestration state and a poll timer; UI methods for overview/plan/run/queue/detail/retry/cancel/polling; main-tab computed helpers for draft readiness/checklist, queue/run selectors and stats; tab-switch polling integration.
  • Web UI — Templates & Styles (web-ui/index.html, web-ui/partials/index/panel-orchestration.html, web-ui/partials/index/layout-header.html, web-ui/styles/task-orchestration.css, web-ui/styles.css, web-ui/styles/sessions-usage.css, web-ui/partials/index/panel-usage.html): adds the orchestration panel and nav tab wiring, status strip, plan/workbench/queue/runs/detail UI and bindings; new orchestration and usage CSS; the usage panel adds model coverage and an "All" (全部) time range.
  • Session Usage, Parsing & Costing (cli.js, web-ui/logic.sessions.mjs, web-ui/modules/app.computed.session.mjs, web-ui/modules/app.methods.session-browser.mjs, web-ui/logic.claude.mjs, lib/cli-models-utils.js): introduces JSONL tail readers and Codex/Claude summary parsers; captures provider/model/models and token subtotals; increases the usage listing limit to 2000; backfills models from summaries or full-file scans; computes active/total durations, usedModels/modelCoverage, and USD cost estimates with provider/model pricing and catalog fallbacks.
  • Model Catalog Helpers (lib/cli-models-utils.js, web-ui/logic.claude.mjs): adds fixed supplemental model lists (Anthropic/BigModel), base-URL detectors, getSupplementalModelsForBaseUrl, mergeModelCatalog, and getClaudeModelCatalogForBaseUrl for catalog augmentation.
  • Tests — E2E (tests/e2e/test-task-orchestration.js, tests/e2e/run.js, tests/e2e/test-sessions.js, tests/e2e/test-setup.js): new E2E coverage exercising the plan/run/queue lifecycle across CLI and API, logs, persistence, cancellation, and updated session fixtures; integrates the orchestration test into the E2E runner.
  • Tests — Unit & Parity (tests/unit/task-orchestrator.test.mjs, tests/unit/cli-models-utils.test.mjs, tests/unit/*): unit tests for orchestrator functions and execution; tests covering JSONL tail parsing, session summary parsers, session-usage aggregation/costing, Claude catalog behavior, UI computed/method parity, and test harness wiring.
  • Session Usage UI & Charts (web-ui/logic.sessions.mjs, web-ui/modules/app.computed.session.mjs, web-ui/partials/index/panel-usage.html): collects model-coverage and missing-model diagnostics; expands usage chart grouping, adds range: 'all' support, duration aggregates, and UI cards for estimated cost, active and total durations.
  • Misc UI wiring & Docs (web-ui/modules/*, web-ui/partials/index/*, README.md, package.json): merges main-tabs computed into app computed; dashboard busy status includes orchestration; README and package metadata updated to document task orchestration and the new CLI commands.
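The deterministic wave computation and cycle detection described for lib/task-orchestrator.js can be sketched as follows. `computeWaves` is a hypothetical helper illustrating the technique, not the library's actual API:

```javascript
// Illustrative sketch: group task nodes into execution "waves" where each
// wave contains only nodes whose dependencies completed in earlier waves.
// Throws if the dependency graph contains a cycle.
function computeWaves(deps) {
    // deps: { nodeId: [dependencyIds...] }
    const remaining = new Map(Object.entries(deps).map(([id, d]) => [id, new Set(d)]));
    const waves = [];
    const done = new Set();
    while (remaining.size > 0) {
        // Sort ready nodes so the same input graph always yields the same waves.
        const ready = [...remaining.keys()]
            .filter((id) => [...remaining.get(id)].every((d) => done.has(d)))
            .sort();
        if (ready.length === 0) {
            throw new Error('dependency cycle detected: ' + [...remaining.keys()].join(', '));
        }
        for (const id of ready) {
            remaining.delete(id);
            done.add(id);
        }
        waves.push(ready);
    }
    return waves;
}
```

An executor built on such waves can then run each wave's nodes concurrently while still honoring dependency gating, which matches the behavior the summary describes.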

Sequence Diagram(s)

sequenceDiagram
    actor User as CLI/User
    participant CLI as CLI Handler
    participant Orch as Task Orchestrator
    participant Exec as Executor
    participant Storage as Task Storage
    participant Queue as Queue Processor

    User->>CLI: codexmate task plan/run/queue...
    CLI->>Orch: buildTaskPlan / validateTaskPlan / executeTaskPlan
    Orch->>Orch: normalize target, build nodes, compute waves
    Orch->>Storage: persist run record / write run detail
    Orch->>Exec: schedule ready nodes (respecting write-lock & concurrency)
    Exec->>Exec: run node (workflow or codex), emit logs
    Exec->>Orch: report node attempt results
    Orch->>Storage: update run detail and final status
    CLI->>Storage: read run detail / list runs / read logs
    User->>CLI: task queue start
    CLI->>Queue: startTaskQueueProcessing (singleton)
    Queue->>Orch: invoke executeTaskPlan for queued item
    Orch->>Storage: update queue item & run records
sequenceDiagram
    actor User as Web UI User
    participant UI as Orchestration Panel
    participant App as App Methods
    participant API as Web API Server
    participant Orch as Task Orchestrator
    participant Storage as Task Storage

    User->>UI: Open Orchestration tab
    UI->>App: sync polling / load overview
    App->>API: task-overview
    API->>Storage: read queue + recent runs
    API-->>App: overview (workflows, queue, runs)
    App-->>UI: render overview

    User->>UI: Preview Plan
    UI->>App: previewTaskPlan -> API task-plan
    API->>Orch: buildTaskPlan
    Orch-->>API: plan + waves
    API-->>UI: render plan

    User->>UI: Run / Queue task
    UI->>App: runTaskOrchestration / addTaskOrchestrationToQueue
    App->>API: task-run / task-queue-add
    API->>Orch: executeTaskPlan / enqueue
    Orch->>Storage: persist runs, update statuses

    rect rgba(200,150,255,0.5)
    App->>API: poll (task-overview / task-run-detail) while active
    API->>Storage: fetch updated run/queue state
    API-->>App: updated state
    App-->>UI: update UI state
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • ymkiux
  • SurviveM

Poem

🐰 I hopped through nodes and waves with cheer,

I split the target, stitched each plan with care,
I queued and ran, then watched the logs unfurl,
retried a step and nudged a stuck swirl,
a tiny rabbit cheering every task made fair.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name: Docstring Coverage
Status: ⚠️ Warning
Explanation: Docstring coverage is 1.43%, which is below the required threshold of 80.00%.
Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name: Description Check
Status: ✅ Passed
Explanation: Check skipped: CodeRabbit’s high-level summary is enabled.

Check name: Title check
Status: ✅ Passed
Explanation: The PR title 'feat: add task orchestration and usage insights' directly reflects the two main features added: task orchestration (new tab, planning, queue, run management) and usage insights (model coverage, estimated costs, duration metrics).

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Contributor

@coderabbitai bot left a comment


Actionable comments posted: 10

🧹 Nitpick comments (7)
lib/task-orchestrator.js (1)

712-750: Consider adding a small delay between retry attempts.

The retry loop (lines 715-744) immediately retries failed nodes without any backoff. For transient failures (network issues, rate limits), immediate retries may not help.

💡 Optional: Add exponential backoff between retries
                     for (let attempt = nodeRun.attemptCount + 1; attempt <= maxAttempts; attempt += 1) {
+                        if (attempt > 1) {
+                            const backoffMs = Math.min(1000 * Math.pow(2, attempt - 2), 10000);
+                            await new Promise((resolve) => setTimeout(resolve, backoffMs));
+                        }
                         result = await executeOneNode(nodeRun, nodeDef, {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/task-orchestrator.js` around lines 712 - 750, The retry loop inside the
promise wrapper performing executeOneNode for nodeRun currently retries
immediately; introduce a small delay (preferably exponential backoff) before
continuing when attempt < maxAttempts and the signal isn't aborted.
Specifically, after pushing the "node retry scheduled" log and before the
continue, await a sleep implementation (e.g., Promise + setTimeout) that uses a
base delay multiplied by an exponential factor based on attempt (and capped) so
executeOneNode, nodeRun, maxAttempts and attempt remain unchanged while spacing
out retries to mitigate transient failures.
web-ui/partials/index/panel-orchestration.html (1)

96-105: Consider using a more unique key for v-for loops.

The key issue.code + issue.message could potentially collide if two issues have the same code and message. Similarly for warnings using just warning as the key.

💡 Suggested: Use index-based keys for uniqueness
-                            <div v-if="taskOrchestration.planIssues.length" class="task-issues-list">
-                                <div v-for="issue in taskOrchestration.planIssues" :key="issue.code + issue.message" class="task-issue-item">
+                            <div v-if="taskOrchestration.planIssues.length" class="task-issues-list">
+                                <div v-for="(issue, idx) in taskOrchestration.planIssues" :key="'issue-' + idx" class="task-issue-item">
                                     {{ issue.message }}
                                 </div>
                             </div>
                             <div v-if="taskOrchestration.planWarnings.length" class="task-warning-list">
-                                <div v-for="warning in taskOrchestration.planWarnings" :key="warning" class="task-warning-item">
+                                <div v-for="(warning, idx) in taskOrchestration.planWarnings" :key="'warning-' + idx" class="task-warning-item">
                                     {{ warning }}
                                 </div>
                             </div>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/partials/index/panel-orchestration.html` around lines 96 - 105, The
v-for key values are not guaranteed unique: update the :key bindings in the
loops over taskOrchestration.planIssues and taskOrchestration.planWarnings to
include the loop index or another unique identifier rather than relying solely
on issue.code+issue.message or warning; for example, change the v-for signatures
(v-for="(issue, idx) in taskOrchestration.planIssues" and v-for="(warning, idx)
in taskOrchestration.planWarnings") and set :key to a composite like issue.code
+ '-' + idx or warning + '-' + idx so keys are unique and stable.
web-ui/app.js (1)

350-379: Avoid a second source of truth for orchestration defaults.

Lines 350-379 inline the orchestration state shape, while the orchestration method layer already rebuilds that shape from its own default-state factory. Keeping both copies in sync is brittle and is how new fields go missing from the initial render later. Please move this to one shared factory and reuse it here and in the methods module.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/app.js` around lines 350 - 379, The inline taskOrchestration object is
a duplicate of the defaults produced by the orchestration methods' default-state
factory; remove this second source of truth and reuse the shared factory
instead. Replace the inline taskOrchestration initialization with a call to the
shared factory (e.g., import and call defaultTaskOrchestrationState() or
createTaskOrchestrationDefaults() from the methods module), making sure to clone
the returned object (so state instances aren’t shared) and keep
_taskOrchestrationPollTimer initialized as before. Update any imports
accordingly and ensure selectedRunDetail/other nullable fields match the
factory’s types.
web-ui/modules/app.methods.task-orchestration.mjs (1)

123-125: Use activeRunIds to drive polling instead of inferred status alone.

task-overview already exposes the backend’s live controller set via activeRunIds (cli.js:12113-12126), but this code drops that field and decides whether to keep polling from the last persisted queue/run/detail statuses only. That makes polling easier to stop prematurely when a detached run is still active but not yet reflected in the current lists.

Suggested fix
                 state.workflows = Array.isArray(res && res.workflows) ? res.workflows : [];
                 state.queue = Array.isArray(res && res.queue) ? res.queue : [];
                 state.runs = Array.isArray(res && res.runs) ? res.runs : [];
+                state.activeRunIds = Array.isArray(res && res.activeRunIds) ? res.activeRunIds : [];
@@
         taskOrchestrationHasLiveActivity() {
             const state = this.ensureTaskOrchestrationState();
+            const activeRunIds = Array.isArray(state.activeRunIds) ? state.activeRunIds : [];
+            if (activeRunIds.length > 0) {
+                return true;
+            }
             const queue = Array.isArray(state.queue) ? state.queue : [];
             if (queue.some((item) => isActiveStatus(item && (item.status || item.runStatus)))) {
                 return true;
             }

Also applies to: 358-370

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.methods.task-orchestration.mjs` around lines 123 - 125,
The code currently overwrites state.workflows, state.queue and state.runs from
the response but drops the backend's live controller set; instead preserve and
set state.activeRunIds (e.g., state.activeRunIds = Array.isArray(res &&
res.activeRunIds) ? res.activeRunIds : []) and change any polling/stop logic to
consult state.activeRunIds rather than inferring liveness only from
state.queue/state.runs; apply the same change to the second similar block (the
code around lines 358-370) and ensure task-overview/cli.js interactions use
state.activeRunIds to drive polling.
web-ui/styles/task-orchestration.css (1)

178-196: Scope the pill variants to the orchestration panel.

This file is task-specific, but these selectors are global. Importing this stylesheet into the shared bundle will restyle any existing .pill.success|warn|error|neutral outside this tab as well. Prefix them with a task-specific class or scope them under the orchestration panel root.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/styles/task-orchestration.css` around lines 178 - 196, The
.pill.success/.pill.warn/.pill.error/.pill.neutral selectors are global; scope
them under the orchestration panel root (e.g. prefix each selector with the
panel's root class such as .task-orchestration or .orchestration-panel) so these
rules only apply inside the task orchestration UI; update the selectors that
reference .pill.success, .pill.warn, .pill.error, and .pill.neutral to be
.task-orchestration .pill.success, .task-orchestration .pill.warn,
.task-orchestration .pill.error, and .task-orchestration .pill.neutral (or use
the actual orchestration root class used in the markup) to confine styles to
that component.
tests/e2e/test-task-orchestration.js (2)

81-89: Request JSON explicitly for task queue show.

This assertion currently depends on the current cmdTask special-case that prints JSON for queue show even without --json. Making the format explicit keeps the test aligned with the rest of the task CLI and avoids breakage from harmless output-format changes.

Diff
     const queueShowResult = runSync(node, [
         cliPath,
         'task',
         'queue',
         'show',
+        '--json',
         queueAddPayload.task.taskId
     ], { env });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/test-task-orchestration.js` around lines 81 - 89, The test calls
runSync to execute the CLI 'task queue show' but relies on implicit JSON output;
update the invocation to explicitly request JSON by adding the '--json' flag to
the argument array passed to runSync (where 'runSync' is called with 'cliPath',
'task', 'queue', 'show', queueAddPayload.task.taskId), so
parseJsonOutput(queueShowResult.stdout) receives deterministic JSON output; keep
the assertion on queueShowResult.status and error message unchanged.

3-18: Make the JSON extractor tolerate braces in surrounding output.

The fallback on Lines 11-15 grabs everything from the first { to the last }, so any extra log line containing braces can make this parse the wrong slice or fail entirely. Scanning for the first parseable JSON block would make these E2E assertions much less brittle.
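The suggested scan-for-first-parseable-block approach might look like this. `extractFirstJson` is an illustrative sketch, not the test suite's actual parseJsonOutput:

```javascript
// Sketch of the suggested fix: instead of slicing from the first '{' to the
// last '}', try each '{' as a start and each subsequent '}' as an end until
// JSON.parse succeeds. Stray braces in surrounding log lines no longer break it.
function extractFirstJson(text) {
    const trimmed = String(text).trim();
    try {
        return JSON.parse(trimmed); // fast path: the output is pure JSON
    } catch (_) { /* fall through to scanning */ }
    for (let start = trimmed.indexOf('{'); start !== -1; start = trimmed.indexOf('{', start + 1)) {
        for (let end = trimmed.indexOf('}', start); end !== -1; end = trimmed.indexOf('}', end + 1)) {
            try {
                return JSON.parse(trimmed.slice(start, end + 1));
            } catch (_) { /* keep scanning */ }
        }
    }
    throw new Error('no parseable JSON block found');
}
```

The nested scan is quadratic in the worst case, which is acceptable for short CLI output in an E2E test but would want brace-matching for large inputs.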

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/test-task-orchestration.js` around lines 3 - 18, The fallback
currently uses the first '{' to the last '}', which breaks if surrounding logs
contain braces; update parseJsonOutput to scan for the first parseable JSON
block by iterating over each '{' position in the trimmed text and for each
trying progressively later '}' positions (or matching braces) and calling
JSON.parse until one succeeds, then return that parsed object; retain the
original try/catch for the direct parse and throw the same error message if no
parseable block is found. Reference: parseJsonOutput, start, end variables used
in the existing fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli.js`:
- Around line 11918-11921: coerceTaskPlanPayload currently returns the cloned
plan object directly and thus ignores top-level CLI/API overrides; change it to
clone the plan (as now) then merge any explicit override fields from params
(e.g., allowWrite, dryRun, concurrency, autoFixRounds, and any other top-level
flags parsed by buildTaskCliPayload) into the cloned payload only when those
params are not undefined so CLI/API-provided values take precedence; return the
merged object.
- Around line 12517-12528: The queue runner is currently picking the first
element from listTaskQueueItems which is sorted by updatedAt descending, causing
LIFO behavior; change startTaskQueueProcessing() to pick the oldest queued item
instead—either request listTaskQueueItems with an explicit ascending sort (e.g.,
sort: 'updatedAt:asc') or reverse the returned queue before selecting, or pick
queue[queue.length - 1] as nextItem; update the selection logic where nextItem
is computed (referencing listTaskQueueItems, queue, nextItem, taskId, and
runTaskPlanInternal) so tasks are processed FIFO.
- Around line 12602-12613: The cancel path only looks up g_taskRunControllers by
key (runId), so detached runs started with taskId only won’t be found; after the
existing lookup for controllerByRun and before falling back to
getTaskQueueItem(target), iterate g_taskRunControllers.values() to find a
controller whose taskId === target (or whose runId === target if target may be a
runId string), then call controller.abort() and return the same shape ({ok:true,
cancelled:true, runId:..., taskId:..., mode:'running'}); update the code around
controllerByRun, g_taskRunControllers, and getTaskQueueItem to perform this
secondary lookup so task cancel <taskId> works for detached non-queue runs.
- Around line 12060-12072: The runId is currently used directly in
getTaskRunDetailPath and writeTaskRunDetail which allows path traversal;
validate and sanitize runId before joining with TASK_RUN_DETAILS_DIR (in
getTaskRunDetailPath). For example, trim the input, reject or normalize values
containing path separators or traversal segments (../ or ..\) and only allow a
safe whitelist (e.g. /^[A-Za-z0-9._-]+$/) or map to a safe filename (e.g. using
a stable hash) and return ''/error for invalid runIds so writeTaskRunDetail
never writes outside TASK_RUN_DETAILS_DIR; apply the same validation wherever
runId is accepted.

In `@tests/e2e/test-task-orchestration.js`:
- Around line 165-172: After calling api('task-cancel') for the queued item
(apiCancelQueued.task.taskId), follow up with a query such as
api('task-queue-show', { taskId: apiCancelQueued.task.taskId }) or
api('task-overview') for that task and assert the returned task state is
"canceled" (or the equivalent canceled flag) rather than relying solely on
apiCancel.ok; update the test to fetch the task after cancellation and assert on
its status to ensure the queue item truly transitioned to canceled.

In `@tests/unit/config-tabs-ui.test.mjs`:
- Around line 70-76: The current assertions only check orchestrationPanel (the
source partial) which can miss bundling drift; update the test to also assert
the same handler/state expectations against the bundled HTML output (variable
html) in addition to orchestrationPanel, i.e. run the same assert.match checks
for `@click` handlers, taskOrchestrationSelectedRunNodes, and the selector title
against html (or assert both surfaces) so the test fails if the partial is not
included in the final bundled html; reference the orchestrationPanel and html
variables and reuse the same regexes for both.

In `@web-ui/modules/app.methods.task-orchestration.mjs`:
- Around line 25-27: loadTaskRunDetail can be called concurrently by polling and
manual selection causing stale responses to overwrite selectedRunDetail; modify
loadTaskRunDetail to generate a per-request token (or capture the current
selectedRunId at call time) and only commit selectedRunDetail/selectedRunLoading
when the token/selectedRunId still matches the latest selection, and ensure that
on error you clear selectedRunDetail (or set it to null) instead of leaving the
previous detail; apply the same guard to both polling and manual-selection
callers so responses from older requests are ignored.
- Around line 126-128: On successful task-overview refresh, always
clear/overwrite previous warnings instead of only assigning when res.warnings
has items; update the logic around state.planWarnings in
app.methods.task-orchestration.mjs so that after a successful response you set
state.planWarnings to the response warnings or an empty array (i.e., assign
either res.warnings or [] unconditionally) rather than leaving the old array
when res.warnings is []. This change touches the code that currently checks
Array.isArray(res && res.warnings) && res.warnings.length > 0 and the
state.planWarnings assignment.
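The stale-response guard suggested for loadTaskRunDetail can be sketched with a per-request token. `fetchDetail` stands in for the real API call, and the factory shape is illustrative rather than the module's actual structure:

```javascript
// Sketch of the race guard: tag each detail request with a monotonically
// increasing token and only commit the response if no newer request started
// in the meantime.
function createRunDetailLoader(fetchDetail) {
    let latestToken = 0;
    return {
        async load(state, runId) {
            const token = ++latestToken;
            state.selectedRunLoading = true;
            try {
                const detail = await fetchDetail(runId);
                if (token !== latestToken) return; // a newer request superseded us
                state.selectedRunDetail = detail;
            } catch (_) {
                if (token === latestToken) state.selectedRunDetail = null;
            } finally {
                if (token === latestToken) state.selectedRunLoading = false;
            }
        }
    };
}
```

Both the polling path and the manual-selection path would call the same loader, so whichever response belongs to the newest request wins regardless of arrival order.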

In `@web-ui/partials/index/panel-orchestration.html`:
- Line 253: The inline template expression inside the <pre
class="task-log-block"> uses node.logs.map(...) and contains an unescaped '>'
which static analysis flagged; fix by moving the formatting logic out of the
template into a component method or computed property (e.g., create a computed
property named formattedNodeLogs or a method formatNodeLogs that returns the
joined string for node.logs) and replace the inline expression with {{
formattedNodeLogs(node) }} (or call the method) or alternately escape any
literal '>' in the returned string as &gt;; reference the node.logs array
handling and the task-log-block template to locate and replace the expression.
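The formatNodeLogs method suggested above might be sketched as follows; the method name comes from the review comment, while the log-entry shapes handled here are assumptions:

```javascript
// Sketch of the suggested `formatNodeLogs` helper: join a node's log
// entries into one display string so the template avoids inline '>' logic.
// Non-string entries are serialized so the template never renders [object Object].
function formatNodeLogs(node) {
    const logs = Array.isArray(node && node.logs) ? node.logs : [];
    return logs
        .map((entry) => (typeof entry === 'string' ? entry : JSON.stringify(entry)))
        .join('\n');
}
```

The template would then bind `{{ formatNodeLogs(node) }}` inside the pre block, keeping markup-sensitive characters out of the inline expression.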

---

Nitpick comments:
In `@lib/task-orchestrator.js`:
- Around line 712-750: The retry loop inside the promise wrapper performing
executeOneNode for nodeRun currently retries immediately; introduce a small
delay (preferably exponential backoff) before continuing when attempt <
maxAttempts and the signal isn't aborted. Specifically, after pushing the "node
retry scheduled" log and before the continue, await a sleep implementation
(e.g., Promise + setTimeout) that uses a base delay multiplied by an exponential
factor based on attempt (and capped) so executeOneNode, nodeRun, maxAttempts and
attempt remain unchanged while spacing out retries to mitigate transient
failures.

In `@tests/e2e/test-task-orchestration.js`:
- Around line 81-89: The test calls runSync to execute the CLI 'task queue show'
but relies on implicit JSON output; update the invocation to explicitly request
JSON by adding the '--json' flag to the argument array passed to runSync (where
'runSync' is called with 'cliPath', 'task', 'queue', 'show',
queueAddPayload.task.taskId), so parseJsonOutput(queueShowResult.stdout)
receives deterministic JSON output; keep the assertion on queueShowResult.status
and error message unchanged.
- Around line 3-18: The fallback currently uses the first '{' to the last '}',
which breaks if surrounding logs contain braces; update parseJsonOutput to scan
for the first parseable JSON block by iterating over each '{' position in the
trimmed text and for each trying progressively later '}' positions (or matching
braces) and calling JSON.parse until one succeeds, then return that parsed
object; retain the original try/catch for the direct parse and throw the same
error message if no parseable block is found. Reference: parseJsonOutput, start,
end variables used in the existing fallback.

In `@web-ui/app.js`:
- Around line 350-379: The inline taskOrchestration object is a duplicate of the
defaults produced by the orchestration methods' default-state factory; remove
this second source of truth and reuse the shared factory instead. Replace the
inline taskOrchestration initialization with a call to the shared factory (e.g.,
import and call defaultTaskOrchestrationState() or
createTaskOrchestrationDefaults() from the methods module), making sure to clone
the returned object (so state instances aren’t shared) and keep
_taskOrchestrationPollTimer initialized as before. Update any imports
accordingly and ensure selectedRunDetail/other nullable fields match the
factory’s types.

In `@web-ui/modules/app.methods.task-orchestration.mjs`:
- Around line 123-125: The code currently overwrites state.workflows,
state.queue and state.runs from the response but drops the backend's live
controller set; instead preserve and set state.activeRunIds (e.g.,
state.activeRunIds = Array.isArray(res && res.activeRunIds) ? res.activeRunIds :
[]) and change any polling/stop logic to consult state.activeRunIds rather than
inferring liveness only from state.queue/state.runs; apply the same change to
the second similar block (the code around lines 358-370) and ensure
task-overview/cli.js interactions use state.activeRunIds to drive polling.

In `@web-ui/partials/index/panel-orchestration.html`:
- Around line 96-105: The v-for key values are not guaranteed unique: update the
:key bindings in the loops over taskOrchestration.planIssues and
taskOrchestration.planWarnings to include the loop index or another unique
identifier rather than relying solely on issue.code+issue.message or warning;
for example, change the v-for signatures (v-for="(issue, idx) in
taskOrchestration.planIssues" and v-for="(warning, idx) in
taskOrchestration.planWarnings") and set :key to a composite like issue.code +
'-' + idx or warning + '-' + idx so keys are unique and stable.
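The composite-key suggestion can be illustrated with a small sketch; `issueKeys` is a hypothetical helper showing why the index suffix matters, not code from the template.

```javascript
// Appending the loop index keeps keys unique even when two issues share
// the same code and message, which code+message alone cannot guarantee.
function issueKeys(issues) {
    return issues.map((issue, idx) => `${issue.code}-${idx}`);
}
```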

In `@web-ui/styles/task-orchestration.css`:
- Around line 178-196: The .pill.success/.pill.warn/.pill.error/.pill.neutral
selectors are global; scope them under the orchestration panel root (e.g. prefix
each selector with the panel's root class such as .task-orchestration or
.orchestration-panel) so these rules only apply inside the task orchestration
UI; update the selectors that reference .pill.success, .pill.warn, .pill.error,
and .pill.neutral to be .task-orchestration .pill.success, .task-orchestration
.pill.warn, .task-orchestration .pill.error, and .task-orchestration
.pill.neutral (or use the actual orchestration root class used in the markup) to
confine styles to that component.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f2ddeea5-94ac-44f3-b4c6-57d313621386

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc97ff and ea433d6.

📒 Files selected for processing (20)
  • cli.js
  • lib/task-orchestrator.js
  • tests/e2e/run.js
  • tests/e2e/test-task-orchestration.js
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/run.mjs
  • tests/unit/task-orchestrator.test.mjs
  • tests/unit/web-ui-behavior-parity.test.mjs
  • web-ui/app.js
  • web-ui/index.html
  • web-ui/modules/app.computed.dashboard.mjs
  • web-ui/modules/app.computed.index.mjs
  • web-ui/modules/app.computed.main-tabs.mjs
  • web-ui/modules/app.methods.index.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • web-ui/partials/index/layout-header.html
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/session-helpers.mjs
  • web-ui/styles.css
  • web-ui/styles/task-orchestration.css
📜 Review details
🧰 Additional context used
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 253-253: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (19)
lib/task-orchestrator.js (7)

1-11: LGTM!

The utility functions isPlainObject and cloneJson are well-implemented. The cloneJson function safely handles circular references by catching JSON errors and returning the fallback.


13-47: LGTM!

The text normalization and truncation utilities are solid. The normalizeId function correctly sanitizes IDs for safe usage, and truncateText properly uses the ellipsis character for truncation indication.


103-149: Wave computation handles cycles gracefully but silently.

When a cycle is detected (line 129: ready.length === 0 but remaining.size > 0), the algorithm dumps all remaining nodes into the final wave without any indication that a cycle was found. This is acceptable for robustness, but consumers should call detectDependencyCycle separately if they need to know about cycles (which validateTaskPlan does).
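The wave layering described above can be sketched as a Kahn-style loop with the graceful cycle fallback; names here are illustrative, not the module's actual exports.

```javascript
// Compute dependency waves: each wave holds nodes whose dependencies are all
// satisfied by earlier waves. On a cycle (no node is ready but nodes remain),
// dump the remainder into one final wave instead of looping forever.
function computeWaves(graph) { // graph: { nodeId: [dependencyIds] }
    const remaining = new Set(Object.keys(graph));
    const waves = [];
    while (remaining.size > 0) {
        const ready = [...remaining].filter((id) =>
            (graph[id] || []).every((dep) => !remaining.has(dep)));
        if (ready.length === 0) {
            waves.push([...remaining]); // cycle: silent fallback, as noted above
            break;
        }
        waves.push(ready);
        for (const id of ready) remaining.delete(id);
    }
    return waves;
}
```

This is why a separate cycle check (like `detectDependencyCycle`) is needed when callers must distinguish a real final wave from a cycle dump.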


151-188: LGTM!

The cycle detection uses proper DFS with visiting/visited sets to detect back-edges. The algorithm correctly handles disconnected components by iterating over all graph keys.


505-537: LGTM on execution setup.

The execution initialization properly sets up the run state, node map, and configures concurrency bounds.


667-794: Main execution loop is well-structured with proper concurrency and write-lock handling.

The loop correctly:

  • Checks for abort signal and cancels pending/running nodes
  • Marks blocked nodes when dependencies fail
  • Enforces write-lock exclusivity (write nodes run alone)
  • Uses Promise.race for concurrent node execution
  • Handles the "stalled" edge case when no nodes can be scheduled

832-839: LGTM on module exports.

The module exports the appropriate public API functions while keeping internal helpers private.

tests/unit/run.mjs (1)

17-17: LGTM!

The new test file import follows the established pattern and is placed appropriately in the sequence.

web-ui/styles.css (1)

10-10: LGTM!

The stylesheet import follows the existing pattern and is logically placed within the import order.

web-ui/index.html (1)

20-20: LGTM!

The panel partial include follows the established pattern and is appropriately positioned among the other index panels.

tests/e2e/run.js (2)

26-26: LGTM!

The test module import follows the existing pattern.


138-138: LGTM!

The E2E test execution is appropriately sequenced after workflow tests and before web UI tests, following the natural dependency order.

web-ui/modules/app.computed.index.mjs (2)

2-2: LGTM!

The import follows the established pattern for computed property modules.


11-11: LGTM!

The computed properties are correctly merged following the existing pattern.

web-ui/modules/app.computed.dashboard.mjs (1)

55-57: LGTM!

The busy status integration correctly guards against undefined taskOrchestration and checks all relevant loading states. The properties checked match the state definition in createDefaultTaskOrchestrationState().

web-ui/partials/index/panel-orchestration.html (3)

1-6: LGTM!

The panel structure with v-show, accessibility attributes (role="tabpanel", aria-labelledby), and semantic class naming is well-implemented.

Note: The HTMLHint "doctype-first" warning is a false positive since this is a partial template file, not a complete HTML document.


154-169: LGTM on queue item interaction.

The queue item buttons properly handle click events with @click.stop to prevent event bubbling, and the cancel button is correctly disabled based on status.


207-217: LGTM on run detail actions.

The detail action buttons properly check for selectedRunId, loading states, and use the computed taskOrchestrationSelectedRun for the cancel button's active status check via isTaskRunActive.

web-ui/session-helpers.mjs (1)

143-147: No rejection handling required—loadTaskOrchestrationOverview already absorbs all errors.

The implementation at lines 103–142 in web-ui/modules/app.methods.task-orchestration.mjs wraps the API call in a try-catch-finally block and never rejects. All error paths—early returns, API failures, and exceptions—are caught and returned as resolved response objects (null, res, or { error: message }). No unhandled rejection is possible from this caller's perspective, so the defensive Promise.resolve(...).catch(() => {}) pattern is unnecessary.

> Likely an incorrect or invalid review comment.
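The absorb-all-errors pattern this verification describes can be sketched as below. Function and endpoint names are illustrative, not the module's real API: the point is that every path resolves with a value, so callers never need a rejection handler.

```javascript
// Wrap an API call so all error paths resolve instead of rejecting.
async function loadOverviewSafe(api) {
    try {
        const res = await api('task-overview');
        return res === undefined ? null : res;
    } catch (error) {
        return { error: error && error.message ? error.message : String(error) };
    }
}
```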

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

♻️ Duplicate comments (5)
tests/e2e/test-task-orchestration.js (1)

219-220: ⚠️ Potential issue | 🟡 Minor

Assert the queue item actually becomes canceled.

task-cancel returning { ok: true } only proves the request was accepted. Follow it with task-queue-show or task-overview and assert that the queued task moved to canceled, otherwise this regression path can still pass green.

Suggested assertion
     const apiCancel = await api('task-cancel', { taskId: apiCancelQueued.task.taskId });
     assert(apiCancel.ok === true, 'task-cancel API should cancel queued task');
+    const canceledTask = await api('task-queue-show', { taskId: apiCancelQueued.task.taskId });
+    assert(canceledTask && canceledTask.status === 'canceled', 'task-cancel API should mark queued task as canceled');
 };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/test-task-orchestration.js` around lines 219 - 220, After calling
api('task-cancel', { taskId: apiCancelQueued.task.taskId }) and asserting
apiCancel.ok, add a follow-up query to confirm the queue item state changed:
call the task-queue-show or task-overview endpoint for that taskId (using the
same api helper and apiCancelQueued.task.taskId) and assert the returned
status/state equals 'canceled' (or contains the canceled flag). This ensures the
cancel request actually changed the queue entry rather than only returning an acceptance
response.
cli.js (4)

11934-11937: ⚠️ Potential issue | 🟠 Major

Preserve explicit CLI/API overrides when plan is supplied.

At Line 11935, returning the cloned plan directly still bypasses top-level overrides (for example --allow-write), so explicit caller intent can be silently ignored.

Suggested fix
 function coerceTaskPlanPayload(params = {}) {
     if (params && params.plan && typeof params.plan === 'object' && !Array.isArray(params.plan)) {
-        return cloneJson(params.plan, {});
+        const plan = cloneJson(params.plan, {});
+        const overrideKeys = [
+            'title', 'target', 'notes', 'engine',
+            'allowWrite', 'dryRun', 'concurrency', 'autoFixRounds',
+            'workflowIds', 'followUps', 'cwd'
+        ];
+        for (const key of overrideKeys) {
+            if (Object.prototype.hasOwnProperty.call(params, key) && params[key] !== undefined) {
+                plan[key] = cloneJson(params[key], params[key]);
+            }
+        }
+        return plan;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 11934 - 11937, coerceTaskPlanPayload currently returns a
cloned params.plan directly which ignores and bypasses top-level CLI/API
overrides (e.g., --allow-write); update coerceTaskPlanPayload so after cloning
params.plan (via cloneJson) you merge any explicit top-level override fields
from params into the cloned result (prefer params' scalar flags like
allowWrite/allowRead/other top-level settings over values in plan), ensuring you
preserve plan content but apply caller-specified overrides; reference the
function coerceTaskPlanPayload, the params.plan object and cloneJson when
locating and implementing the change.

12637-12648: ⚠️ Potential issue | 🟠 Major

task cancel <taskId> still misses detached in-flight runs.

At Line 12637, cancellation only checks g_taskRunControllers.get(target) (runId key lookup). Detached runs cancelled by taskId still fall through.

Suggested fix
-    const controllerByRun = g_taskRunControllers.get(target);
+    const controllerByRun = g_taskRunControllers.get(target)
+        || Array.from(g_taskRunControllers.values()).find((entry) => entry.taskId === target);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12637 - 12648, The cancel flow only looks up
g_taskRunControllers by runId (controllerByRun) and misses detached in-flight
runs referenced by taskId; update the code after the runId check to iterate
g_taskRunControllers.values() to find a controller where controller.taskId ===
target, call controller.abort(), and return a cancelled response (include runId,
taskId and set mode to 'detached' or appropriate) before falling back to
getTaskQueueItem(target).

12424-12484: ⚠️ Potential issue | 🟠 Major

Persist a terminal failure state when orchestration throws.

At Line 12424-Line 12484, executeTaskPlan(...) rejection still skips terminal persistence and can leave run/queue state stuck in running (especially for detached paths that swallow rejections).

Suggested fix
-    try {
+    try {
         const run = await executeTaskPlan(plan, {
           ...
         });
         const detail = {
           ...
         };
         writeTaskRunDetail(detail);
         appendTaskRunRecord(collectTaskRunSummary(detail));
         ...
         return detail;
+    } catch (error) {
+        const failedRun = {
+            status: 'failed',
+            startedAt: toIsoTime(Date.now()),
+            endedAt: toIsoTime(Date.now()),
+            durationMs: 0,
+            nodes: [],
+            logs: [],
+            summary: '',
+            error: error && error.message ? error.message : String(error || 'task execution failed')
+        };
+        const detail = {
+            ...baseDetail,
+            updatedAt: toIsoTime(Date.now()),
+            status: 'failed',
+            run: failedRun
+        };
+        writeTaskRunDetail(detail);
+        appendTaskRunRecord(collectTaskRunSummary(detail));
+        if (options.queueItem) {
+            upsertTaskQueueItem({
+                ...options.queueItem,
+                taskId,
+                status: 'failed',
+                runStatus: 'failed',
+                lastRunId: runId,
+                lastSummary: failedRun.error,
+                updatedAt: toIsoTime(Date.now()),
+                plan
+            });
+        }
+        return detail;
     } finally {
         g_taskRunControllers.delete(runId);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12424 - 12484, executeTaskPlan(...) can throw and bypass
the terminal persistence logic; wrap the await executeTaskPlan call in a
try/catch (inside the existing try/finally) and in the catch create a terminal
failure detail (use baseDetail, toIsoTime(Date.now()), status: 'failed', run: {
status: 'failed', error: error.message || String(error) }) then call
writeTaskRunDetail(detail), appendTaskRunRecord(collectTaskRunSummary(detail)),
and upsertTaskQueueItem(...) with the same fields used after successful runs
(status 'failed', runStatus 'failed', lastSummary from error) so queue/run state is
persisted before the finally block that calls
g_taskRunControllers.delete(runId); reference executeTaskPlan, controller,
writeTaskRunDetail, appendTaskRunRecord, upsertTaskQueueItem,
collectTaskRunSummary, g_taskRunControllers.

12552-12555: ⚠️ Potential issue | 🟠 Major

Queue runner is still LIFO instead of FIFO.

At Line 12552-Line 12555, queued items are sorted newest-first and queue[0] is selected, so newer tasks jump ahead of older ones.

Suggested fix
-            const nextItem = taskId
-                ? queue.find((item) => item.taskId === taskId)
-                : queue[0];
+            const nextItem = taskId
+                ? queue.find((item) => item.taskId === taskId)
+                : queue[queue.length - 1];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12552 - 12555, The code picks the newest queued item
(LIFO) because it uses queue[0]; change the selection logic in the block that
defines queue and nextItem (where listTaskQueueItems is called) to choose the
oldest item for FIFO semantics: when taskId is provided keep the find((item) =>
item.taskId === taskId), otherwise pick the last element (e.g.,
queue[queue.length - 1] or queue.at(-1)), or reverse a copy of the array
([...queue].reverse()[0], which avoids mutating the original) so that nextItem is the oldest queued task.
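The FIFO selection can be sketched as follows, assuming (as the review states) that the queue array arrives sorted newest-first by updatedAt; `pickNext` is an illustrative name, not the CLI's actual function.

```javascript
// With a newest-first queue, FIFO means taking the last element.
// An explicit taskId still overrides the FIFO pick.
function pickNext(queue, taskId) {
    if (taskId) return queue.find((item) => item.taskId === taskId);
    return queue.length ? queue[queue.length - 1] : undefined;
}
```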
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli.js`:
- Around line 9472-9474: Detached task runs always call createTaskRunId() and
thus ignore a caller-provided runId; change the runId assignment to respect
params.runId when present. Replace the hardcoded createTaskRunId() usage with a
guarded value like: const runId = typeof params.runId === 'string' &&
params.runId.trim() ? params.runId.trim() : createTaskRunId(); and pass that
runId into runTaskPlanInternal(plan, { taskId, runId }). Apply the same change
to the other detached-run site that currently calls createTaskRunId() so both
occurrences use the provided params.runId if supplied.

In `@lib/task-orchestrator.js`:
- Around line 419-424: There are two identical validation blocks that check if
nodes.length === 0 and push the same issue into issues, causing duplicate error
messages; remove the redundant if-statement (the duplicate "if (nodes.length ===
0) { issues.push({ code: 'task-plan-nodes-required', message: 'task plan must
contain at least one node' }); }") so only a single check remains—locate the
duplicate by searching for the "nodes" variable and the push to "issues" with
code 'task-plan-nodes-required' and delete the extra block.

In `@web-ui/modules/app.methods.task-orchestration.mjs`:
- Around line 253-260: The current handler for the api('task-queue-start')
response always shows "队列执行器已启动" even when nothing was started; update the logic
after awaiting api('task-queue-start') in the function that calls
loadTaskOrchestrationOverview and syncTaskOrchestrationPolling to branch on
res.started and res.alreadyRunning: if res.error keep existing behavior, else if
res.started show the success toast '队列执行器已启动', else if res.alreadyRunning show a
different message like '队列执行器已在运行' (or equivalent) so the UI reflects whether
the queue was newly started or already running; keep calls to
loadTaskOrchestrationOverview and syncTaskOrchestrationPolling unchanged.
- Around line 125-128: The task-overview response is overwriting plan-specific
warnings into state.planWarnings; create and use a separate state property
(e.g., state.overviewWarnings) in the update code that handles the task-overview
response so overview/catalog warnings are stored there instead of
state.planWarnings, and ensure previewTaskPlan() remains the sole owner that
writes state.planWarnings; update the block that assigns
state.workflows/queue/runs/warnings to assign res.warnings to
state.overviewWarnings (or leave planWarnings untouched) and update any template
references to show overview warnings from state.overviewWarnings.
- Around line 179-184: The catch block currently returns an error but leaves
stale preview state (this.plan, this.planIssues, this.planWarnings), so update
the catch path in the task orchestration method to clear the preview state
before returning: set this.plan to null or an empty object and set
this.planIssues and this.planWarnings to empty arrays (or appropriate empty
defaults), then proceed to show the error via this.showMessage (honoring
options.silent) and return the error object; reference this.plan,
this.planIssues, this.planWarnings, options.silent, and this.showMessage when
making the change.
- Around line 67-74: ensureTaskOrchestrationState currently returns the existing
taskOrchestration object as-is which can lack keys present in
createDefaultTaskOrchestrationState (e.g., selectedRunError,
detailRequestToken); instead, merge defaults into the existing object before
returning: call createDefaultTaskOrchestrationState() and assign any missing
keys onto this.taskOrchestration (or use Object.assign(this.taskOrchestration,
defaults)) within ensureTaskOrchestrationState so the returned object always
conforms to the default schema; update references to
ensureTaskOrchestrationState, taskOrchestration, and
createDefaultTaskOrchestrationState when applying the change.

---

Duplicate comments:
In `@cli.js`:
- Around line 11934-11937: coerceTaskPlanPayload currently returns a cloned
params.plan directly which ignores and bypasses top-level CLI/API overrides
(e.g., --allow-write); update coerceTaskPlanPayload so after cloning params.plan
(via cloneJson) you merge any explicit top-level override fields from params
into the cloned result (prefer params' scalar flags like
allowWrite/allowRead/other top-level settings over values in plan), ensuring you
preserve plan content but apply caller-specified overrides; reference the
function coerceTaskPlanPayload, the params.plan object and cloneJson when
locating and implementing the change.
- Around line 12637-12648: The cancel flow only looks up g_taskRunControllers by
runId (controllerByRun) and misses detached in-flight runs referenced by taskId;
update the code after the runId check to iterate g_taskRunControllers.values()
to find a controller where controller.taskId === target, call
controller.abort(), and return a cancelled response (include runId, taskId and
set mode to 'detached' or appropriate) before falling back to
getTaskQueueItem(target).
- Around line 12424-12484: executeTaskPlan(...) can throw and bypass the
terminal persistence logic; wrap the await executeTaskPlan call in a try/catch
(inside the existing try/finally) and in the catch create a terminal failure
detail (use baseDetail, toIsoTime(Date.now()), status: 'failed', run: { status:
'failed', error: error.message || String(error) }) then call
writeTaskRunDetail(detail), appendTaskRunRecord(collectTaskRunSummary(detail)),
and upsertTaskQueueItem(...) with the same fields used after successful runs
(status 'failed', runStatus 'failed', lastSummary from error) so queue/run state is
persisted before the finally block that calls
g_taskRunControllers.delete(runId); reference executeTaskPlan, controller,
writeTaskRunDetail, appendTaskRunRecord, upsertTaskQueueItem,
collectTaskRunSummary, g_taskRunControllers.
- Around line 12552-12555: The code picks the newest queued item (LIFO) because
it uses queue[0]; change the selection logic in the block that defines queue and
nextItem (where listTaskQueueItems is called) to choose the oldest item for FIFO
semantics: when taskId is provided keep the find((item) => item.taskId ===
taskId), otherwise pick the last element (e.g., queue[queue.length - 1] or
queue.at(-1)), or invert the array order (queue.reverse()[0]) so that nextItem
is the oldest queued task.

In `@tests/e2e/test-task-orchestration.js`:
- Around line 219-220: After calling api('task-cancel', { taskId:
apiCancelQueued.task.taskId }) and asserting apiCancel.ok, add a follow-up query
to confirm the queue item state changed: call the task-queue-show or
task-overview endpoint for that taskId (using the same api helper and
apiCancelQueued.task.taskId) and assert the returned status/state equals
'canceled' (or contains the canceled flag). This ensures the cancel request
actually changed the queue entry rather than only returning an acceptance response.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 087b8345-f749-492e-822a-0750b871752e

📥 Commits

Reviewing files that changed from the base of the PR and between ea433d6 and e8329d6.

📒 Files selected for processing (7)
  • cli.js
  • lib/task-orchestrator.js
  • tests/e2e/test-task-orchestration.js
  • tests/unit/task-orchestrator.test.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (1)
  • web-ui/styles/task-orchestration.css
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/task-orchestrator.test.mjs
📜 Review details
🧰 Additional context used
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 259-259: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (12)
web-ui/partials/index/panel-orchestration.html (1)

259-259: Move node log formatting out of the template.

This inline formatter is still carrying the HTMLHint escape complaint and does unnecessary work on every render. Reuse a helper such as formatNodeLogs(node.logs) here instead.
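A minimal sketch of such a helper is shown below; the log-entry shape (`level`, `message`) is an assumption for illustration, and the helper would live in a methods module so the template only binds its result.

```javascript
// Format node logs once in a method instead of inline in the template,
// avoiding per-render work and unescaped special characters in markup.
function formatNodeLogs(logs) {
    return (Array.isArray(logs) ? logs : [])
        .map((log) => `[${log.level || 'info'}] ${log.message || ''}`)
        .join('\n');
}
```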

lib/task-orchestrator.js (11)

1-61: LGTM!

The utility functions are well-implemented with proper defensive checks. Good use of normalization and truncation to bound inputs.


63-101: LGTM!

The target splitting logic handles both newline-delimited and semicolon-delimited input gracefully, with sensible limits on item count and length.


103-149: LGTM!

Correct implementation of Kahn's algorithm for wave-based topological sorting. The graceful handling of cycles (lines 129-131) is appropriate since validateTaskPlan performs explicit cycle detection separately.


151-188: LGTM!

Standard DFS-based cycle detection with proper handling of visiting vs. visited states. Correctly skips edges to unknown nodes.


190-243: LGTM!

Well-structured prompt builder with clear separation of concerns for different node kinds. The write rule injection correctly enforces access constraints.


245-397: LGTM!

The plan builder correctly constructs both workflow and codex execution graphs. The dependency structure (analysis → parallel work → verify → sequential follow-ups) is well-designed for the orchestration use case.


484-515: LGTM!

Clean factory functions for run records and logs with appropriate defaults and normalization.


517-549: LGTM!

Good defensive initialization with sensible defaults. The emitUpdate helper properly clones the run state before notifying callbacks.


710-723: LGTM!

The write lock logic correctly implements mutual exclusion: write nodes run exclusively (lines 714-716 ensure no concurrent nodes), and non-write nodes wait for write completion (lines 717-719). This is appropriate for safe filesystem operations.
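The mutual-exclusion rule described above reduces to a small scheduling predicate. This is an illustrative sketch with assumed field and parameter names, not the orchestrator's actual implementation.

```javascript
// A write node may start only when nothing else is running, and no node
// may start while a write node holds the lock; reads may overlap freely.
function canSchedule(node, runningCount, writeLockHeld) {
    if (writeLockHeld) return false;            // wait for the write node to finish
    if (node.write) return runningCount === 0;  // write nodes run exclusively
    return true;                                // read-only nodes run concurrently
}
```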


678-811: LGTM!

The execution loop correctly handles:

  • Cancellation via AbortSignal with proper cleanup of pending/running nodes
  • Promise.race for concurrent node completion
  • Write lock release on node completion
  • Stall detection when no nodes can be scheduled

813-851: LGTM!

Final status determination correctly prioritizes failed > cancelled > success with proper fallbacks. The exported API surface is appropriately minimal.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (5)
cli.js (5)

12804-12807: ⚠️ Potential issue | 🟠 Major

Queue runner is still LIFO, not FIFO.

The runner picks queue[0] while queued items are sorted by latest updatedAt, so the newest queued task runs first.

💡 Suggested fix
 const nextItem = taskId
     ? queue.find((item) => item.taskId === taskId)
-    : queue[0];
+    : queue[queue.length - 1];
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12804 - 12807, The runner currently selects queue[0]
from the list returned by listTaskQueueItems, but listTaskQueueItems returns
items sorted by latest updatedAt first so this is LIFO; change the selection
logic in the block that computes nextItem so it picks the oldest queued item
(FIFO) — either pick the last element of queue when taskId is falsy (e.g.
queue[queue.length - 1]) or explicitly sort queue by updatedAt ascending and
then take the first element; update references to queue, nextItem and preserve
the taskId override behavior.

12186-12189: ⚠️ Potential issue | 🟠 Major

Preserve explicit CLI/API overrides when --plan/plan object is provided.

This branch returns the cloned plan directly, so explicit top-level flags from CLI/API are dropped when a plan object exists.

💡 Suggested fix
 function coerceTaskPlanPayload(params = {}) {
     if (params && params.plan && typeof params.plan === 'object' && !Array.isArray(params.plan)) {
-        return cloneJson(params.plan, {});
+        const plan = cloneJson(params.plan, {});
+        const overrides = normalizeTaskPlanRequest(params || {});
+        const keys = ['title', 'target', 'notes', 'cwd', 'engine', 'allowWrite', 'dryRun', 'concurrency', 'autoFixRounds', 'workflowIds', 'followUps'];
+        for (const key of keys) {
+            if (Object.prototype.hasOwnProperty.call(params, key) && params[key] !== undefined) {
+                plan[key] = overrides[key];
+            }
+        }
+        return plan;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12186 - 12189, The coerceTaskPlanPayload function
currently returns cloneJson(params.plan) and thus drops any explicit top-level
CLI/API flags on params; change it to produce a new payload by cloning
params.plan (via cloneJson) then shallow-merge top-level properties from params
(excluding plan) into that cloned object so explicit overrides on params take
precedence, ensuring you do not mutate the original params or plan; update
references to params.plan, cloneJson, and coerceTaskPlanPayload accordingly.

9724-9726: ⚠️ Potential issue | 🟡 Minor

Detached run path still ignores caller-provided runId.

Both detached entry points always generate a new run ID, so explicit --run-id / API runId is ignored.

💡 Suggested fix
-const runId = createTaskRunId();
+const requestedRunId = typeof params.runId === 'string' ? params.runId.trim() : '';
+const runIdValidation = requestedRunId
+    ? validateTaskRunId(requestedRunId)
+    : { ok: true, runId: createTaskRunId(), error: '' };
+if (!runIdValidation.ok) {
+    result = { ok: false, error: runIdValidation.error };
+    break;
+}
+const runId = runIdValidation.runId;
-const runId = createTaskRunId();
+const requestedRunId = typeof options.runId === 'string' ? options.runId.trim() : '';
+const runIdValidation = requestedRunId
+    ? validateTaskRunId(requestedRunId)
+    : { ok: true, runId: createTaskRunId(), error: '' };
+if (!runIdValidation.ok) {
+    throw new Error(runIdValidation.error);
+}
+const runId = runIdValidation.runId;

Also applies to: 10950-10953

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 9724 - 9726, The detached run path always calls
createTaskRunId() and ignores a caller-provided runId; change the runId
assignment to prefer a trimmed params.runId when provided (e.g., check typeof
params.runId === 'string' && params.runId.trim() ? params.runId.trim() :
createTaskRunId()) and use that runId when calling runTaskPlanInternal(plan, {
taskId, runId }). Apply the same fix to the other detached entry point
referenced (the block around lines 10950-10953) so both paths honor an explicit
runId.

12889-12900: ⚠️ Potential issue | 🟠 Major

task cancel <taskId> still misses detached running tasks.

Cancellation checks active controllers by map key only (runId). Detached runs cancelled by taskId still fall through as “not found”.

💡 Suggested fix
-const controllerByRun = g_taskRunControllers.get(target);
+const controllerByRun = g_taskRunControllers.get(target)
+    || Array.from(g_taskRunControllers.values()).find((entry) => entry.taskId === target);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12889 - 12900, The cancellation logic only looks up
controllers by runId via g_taskRunControllers.get(target), so detached runs that
were started by taskId aren't found; change the flow in the cancel command
handler (where controllerByRun is computed) to, when controllerByRun is falsy,
iterate g_taskRunControllers.values() to find a controller whose taskId ===
target (or whose taskId matches parsed target) and is not already aborted, then
call controller.abort() and return the same cancel response (ok:true,
cancelled:true, runId: controller.runId, taskId: controller.taskId,
mode:'running'); keep the existing getTaskQueueItem fallback for queued tasks.
Ensure you reference controllerByRun, g_taskRunControllers, getTaskQueueItem,
runId, and taskId when making the change.

12576-12737: ⚠️ Potential issue | 🟠 Major

Persist terminal failure state when task execution throws.

runTaskPlanInternal only has try/finally; if executeTaskPlan(...) rejects, run detail/queue status can remain running forever.

💡 Suggested fix
-    try {
+    try {
         const run = await executeTaskPlan(plan, {
           ...
         });
         ...
         return detail;
+    } catch (error) {
+        const message = error && error.message ? error.message : String(error || 'task run failed');
+        const failedRun = {
+            status: 'failed',
+            startedAt: toIsoTime(Date.now()),
+            endedAt: toIsoTime(Date.now()),
+            durationMs: 0,
+            nodes: [],
+            logs: [{ at: toIsoTime(Date.now()), level: 'error', message }],
+            summary: message,
+            error: message
+        };
+        const failedDetail = {
+            ...baseDetail,
+            updatedAt: toIsoTime(Date.now()),
+            status: 'failed',
+            run: failedRun
+        };
+        writeTaskRunDetail(failedDetail);
+        appendTaskRunRecord(collectTaskRunSummary(failedDetail));
+        if (options.queueItem) {
+            upsertTaskQueueItem({
+                ...options.queueItem,
+                taskId,
+                status: 'failed',
+                runStatus: 'failed',
+                lastRunId: runId,
+                lastSummary: message,
+                updatedAt: toIsoTime(Date.now()),
+                plan
+            });
+        }
+        return failedDetail;
     } finally {
         g_taskRunControllers.delete(runId);
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12576 - 12737, runTaskPlanInternal currently uses
try/finally around await executeTaskPlan(...) so if executeTaskPlan rejects the
run stays marked as "running"; wrap the executeTaskPlan call in a try/catch
(inside the existing try/finally) to catch errors, build a failure detail (using
baseDetail, run.status='failed', run.error/message, updatedAt, summary via
collectTaskRunSummary if appropriate), call writeTaskRunDetail(detail),
appendTaskRunRecord(collectTaskRunSummary(detail)), and if options.queueItem
call upsertTaskQueueItem(...) to mark the queue item as
failed/lastSummary/updatedAt, then rethrow the error; reference
runTaskPlanInternal, executeTaskPlan, writeTaskRunDetail, appendTaskRunRecord,
collectTaskRunSummary, upsertTaskQueueItem and g_taskRunControllers to locate
where to insert the catch and update persistence before finally block cleans up
controllers.
🧹 Nitpick comments (1)
web-ui/modules/app.computed.session.mjs (1)

109-157: Reasoning output tokens charged at regular output rate — verify this is intentional.

Line 138 applies the same pricing.output rate to both outputTokens and reasoningOutputTokens. Some providers charge reasoning tokens at a different rate. If this simplification is intentional for now, consider adding a comment.

💡 Optional: Support separate reasoning token pricing

If you want to support distinct reasoning pricing in the future:

-            + ((pricing.output || 0) * ((outputTokens || 0) + reasoningOutputTokens))
+            + ((pricing.output || 0) * (outputTokens || 0))
+            + ((pricing.reasoningOutput ?? pricing.output ?? 0) * reasoningOutputTokens)

This would require adding reasoningOutput to the pricing schema in buildUsagePricingIndex.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.session.mjs` around lines 109 - 157,
estimateUsageCostSummary currently charges reasoningOutputTokens at
pricing.output; update the calculation to use a distinct reasoning rate when
provided: check for pricing.reasoningOutput (or similar) and apply
((pricing.reasoningOutput || pricing.output) * reasoningOutputTokens) instead of
using pricing.output for reasoningOutputTokens, and update
buildUsagePricingIndex to include the reasoningOutput price in the pricingIndex
schema so resolveUsagePricingForSession can return it; if this is an intentional
simplification, add an inline comment near estimateUsageCostSummary (and/or
buildUsagePricingIndex) documenting that reasoning tokens are charged at the
regular output rate for now.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli.js`:
- Around line 12791-12848: startTaskQueueProcessing allows concurrent
non-detached runs to race and process the same queued task; to fix, serialize
foreground runs the same way detached runs are guarded: before invoking runner
when detach is false, check g_taskQueueProcessor and return { ok:true,
started:false, alreadyRunning:true } if set; otherwise assign
g_taskQueueProcessor = runner().catch(() => null).finally(() => {
g_taskQueueProcessor = null; }) and await that promise to get detail, then
return the existing success object (ok/started/detached/detail). Ensure you use
the same g_taskQueueProcessor, runner, detach, and detail variables so both
detached and non-detached paths are mutually exclusive.
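The mutual exclusion this prompt describes is a promise-based single-flight guard. A minimal sketch, assuming the names from the review (g_taskQueueProcessor, runner, detach); the runner body and result shape are illustrative, not the PR's code:

```javascript
// Minimal single-flight guard: only one queue-processing run at a time,
// shared between detached and foreground starts.
let g_taskQueueProcessor = null;

async function startTaskQueueProcessing(runner, { detach = false } = {}) {
    if (g_taskQueueProcessor) {
        return { ok: true, started: false, alreadyRunning: true };
    }
    // Claim the slot synchronously, before the first await, so an overlapping
    // caller observes the non-null guard immediately.
    g_taskQueueProcessor = runner()
        .catch(() => null)
        .finally(() => { g_taskQueueProcessor = null; });
    if (detach) {
        return { ok: true, started: true, detached: true };
    }
    const detail = await g_taskQueueProcessor;
    return { ok: true, started: true, detached: false, detail };
}
```

Claiming the slot before any await is what makes the two paths mutually exclusive: a second caller sees the guard before any microtask has a chance to clear it.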

In `@tests/unit/session-usage-backend.test.mjs`:
- Around line 242-245: The toIsoTime stub currently calls new
Date(value).toISOString() which throws on invalid dates; change to first create
const d = new Date(value) and check if Number.isNaN(d.getTime()) (or
d.toString() === 'Invalid Date'), returning the fallback when invalid, otherwise
return d.toISOString(); update the stubbed function named toIsoTime accordingly
to avoid throwing RangeError on bad input.
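A guarded stub along the lines the prompt describes; the fallback parameter is an assumption for illustration, not necessarily the test file's signature:

```javascript
// Return the fallback instead of letting toISOString() throw a RangeError
// when new Date(value) yields an invalid date.
function toIsoTime(value, fallback = '') {
    const d = new Date(value);
    if (Number.isNaN(d.getTime())) return fallback;
    return d.toISOString();
}
```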

---

Duplicate comments:
In `@cli.js`:
- Around line 12804-12807: The runner currently selects queue[0] from the list
returned by listTaskQueueItems, but listTaskQueueItems returns items sorted by
latest updatedAt first so this is LIFO; change the selection logic in the block
that computes nextItem so it picks the oldest queued item (FIFO) — either pick
the last element of queue when taskId is falsy (e.g. queue[queue.length - 1]) or
explicitly sort queue by updatedAt ascending and then take the first element;
update references to queue, nextItem and preserve the taskId override behavior.
- Around line 12186-12189: The coerceTaskPlanPayload function currently returns
cloneJson(params.plan) and thus drops any explicit top-level CLI/API flags on
params; change it to produce a new payload by cloning params.plan (via
cloneJson) then shallow-merge top-level properties from params (excluding plan)
into that cloned object so explicit overrides on params take precedence,
ensuring you do not mutate the original params or plan; update references to
params.plan, cloneJson, and coerceTaskPlanPayload accordingly.
- Around line 9724-9726: The detached run path always calls createTaskRunId()
and ignores a caller-provided runId; change the runId assignment to prefer a
trimmed params.runId when provided (e.g., check typeof params.runId === 'string'
&& params.runId.trim() ? params.runId.trim() : createTaskRunId()) and use that
runId when calling runTaskPlanInternal(plan, { taskId, runId }). Apply the same
fix to the other detached entry point referenced (the block around lines
10950-10953) so both paths honor an explicit runId.
- Around line 12889-12900: The cancellation logic only looks up controllers by
runId via g_taskRunControllers.get(target), so detached runs that were started
by taskId aren't found; change the flow in the cancel command handler (where
controllerByRun is computed) to, when controllerByRun is falsy, iterate
g_taskRunControllers.values() to find a controller whose taskId === target (or
whose taskId matches parsed target) and is not already aborted, then call
controller.abort() and return the same cancel response (ok:true, cancelled:true,
runId: controller.runId, taskId: controller.taskId, mode:'running'); keep the
existing getTaskQueueItem fallback for queued tasks. Ensure you reference
controllerByRun, g_taskRunControllers, getTaskQueueItem, runId, and taskId when
making the change.
- Around line 12576-12737: runTaskPlanInternal currently uses try/finally around
await executeTaskPlan(...) so if executeTaskPlan rejects the run stays marked as
"running"; wrap the executeTaskPlan call in a try/catch (inside the existing
try/finally) to catch errors, build a failure detail (using baseDetail,
run.status='failed', run.error/message, updatedAt, summary via
collectTaskRunSummary if appropriate), call writeTaskRunDetail(detail),
appendTaskRunRecord(collectTaskRunSummary(detail)), and if options.queueItem
call upsertTaskQueueItem(...) to mark the queue item as
failed/lastSummary/updatedAt, then rethrow the error; reference
runTaskPlanInternal, executeTaskPlan, writeTaskRunDetail, appendTaskRunRecord,
collectTaskRunSummary, upsertTaskQueueItem and g_taskRunControllers to locate
where to insert the catch and update persistence before finally block cleans up
controllers.

---

Nitpick comments:
In `@web-ui/modules/app.computed.session.mjs`:
- Around line 109-157: estimateUsageCostSummary currently charges
reasoningOutputTokens at pricing.output; update the calculation to use a
distinct reasoning rate when provided: check for pricing.reasoningOutput (or
similar) and apply ((pricing.reasoningOutput || pricing.output) *
reasoningOutputTokens) instead of using pricing.output for
reasoningOutputTokens, and update buildUsagePricingIndex to include the
reasoningOutput price in the pricingIndex schema so
resolveUsagePricingForSession can return it; if this is an intentional
simplification, add an inline comment near estimateUsageCostSummary (and/or
buildUsagePricingIndex) documenting that reasoning tokens are charged at the
regular output rate for now.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3a55887a-33c9-44d9-ad37-fc9d246b7bd9

📥 Commits

Reviewing files that changed from the base of the PR and between e8329d6 and 28673cc.

📒 Files selected for processing (5)
  • cli.js
  • tests/unit/session-usage-backend.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/modules/app.computed.session.mjs
  • web-ui/modules/app.methods.session-browser.mjs
✅ Files skipped from review due to trivial changes (1)
  • web-ui/modules/app.methods.session-browser.mjs
📜 Review details
🔇 Additional comments (10)
tests/unit/web-ui-logic.test.mjs (2)

711-728: LGTM!

The updated test correctly adds the required inputs (sessionsUsageList, providersList, currentProvider) and validates that the estimated-cost card is present with the expected "暂无" ("none yet") fallback when no pricing data is available.


730-789: LGTM!

The cost estimation test thoroughly validates the pricing logic:

  • Correctly calculates $1.25 from the first session (billable input 200K × $2/M + cached 100K × $0.5/M + output 100K × $8/M)
  • Properly excludes the second session with unknown model from the estimate
  • Validates coverage text shows 1/2 sessions and ~67% token coverage
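The $1.25 figure can be re-derived from the fixture's rates (all per million tokens, as stated above):

```javascript
// Re-deriving the expected estimate: billable input 200K at $2/M,
// cached input 100K at $0.5/M, output 100K at $8/M.
const PER_MILLION = 1_000_000;
const cost =
    (200_000 * 2 / PER_MILLION)     // 0.40 for billable input
    + (100_000 * 0.5 / PER_MILLION) // 0.05 for cached input
    + (100_000 * 8 / PER_MILLION);  // 0.80 for output
console.log(cost.toFixed(2)); // prints "1.25"
```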
tests/unit/session-usage-backend.test.mjs (3)

33-49: LGTM!

The new function extractions follow the established pattern for pulling helpers from cli.js to test JSONL parsing and usage summary logic.


143-143: LGTM!

The MAX_SESSION_USAGE_LIST_SIZE increase from 300 to 2000 aligns with the AI summary indicating expanded session usage aggregation limits. The test assertions correctly reflect the new expected limit values.

Also applies to: 185-185, 202-207


213-335: Well-structured integration test for parseCodexSessionSummary.

The test properly:

  • Creates a temporary JSONL file with realistic Codex session records
  • Validates provider/model extraction from session_meta and turn_context records
  • Verifies token totals from the token_count event payload
  • Cleans up with fs.rmSync in a finally block

Note the toIsoTime stub issue flagged separately.

web-ui/modules/app.computed.session.mjs (5)

28-34: LGTM!

Clean validation function that correctly handles non-finite numbers and negative values by returning null.


36-50: LGTM!

Well-structured cost formatting with appropriate threshold handling: <$0.01 for tiny amounts, and reduced precision for values ≥$100 to avoid unnecessary decimals.


52-86: LGTM!

The pricing index construction correctly:

  • Creates a two-level lookup (by provider, then by model)
  • Maintains a reverse index for model-only fallback
  • Filters out models without any valid pricing rates

88-107: LGTM!

The pricing resolution follows a sensible priority: provider-specific pricing first, then model-only fallback (only when exactly one provider offers that model). This avoids ambiguity when multiple providers have different rates for the same model.
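That priority order can be sketched as follows; the index shape (byProvider/byModel maps) and model IDs here are illustrative, not the module's actual structure:

```javascript
// Two-step resolution: provider-specific entry first, then a model-only
// fallback only when exactly one provider carries that model.
function resolvePricing(index, provider, model) {
    const byProvider = index.byProvider?.[provider]?.[model];
    if (byProvider) return byProvider;
    const candidates = index.byModel?.[model] || [];
    return candidates.length === 1 ? candidates[0] : null; // ambiguous → no estimate
}
```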


294-321: LGTM!

The integration into sessionUsageSummaryCards is clean:

  • Computes estimated cost from session usage list and provider pricing
  • Shows formatted USD when estimates are available, "暂无" otherwise
  • Provides informative tooltip showing coverage statistics


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (7)
web-ui/partials/index/panel-orchestration.html (1)

259-259: ⚠️ Potential issue | 🟡 Minor

Move log formatting out of template to resolve escaping/lint noise.

Line 259 still uses a complex inline expression in <pre>, which is the same area previously flagged by static analysis (spec-char-escape). Extract this into a formatter method/computed and bind only the final string in the template.

🔧 Suggested change
-                                    <pre class="task-log-block">{{ Array.isArray(node.logs) && node.logs.length ? node.logs.map((item) => `${item.at || ''} ${item.level || ''} ${item.message || ''}`.trim()).join('\n') : '(no logs)' }}</pre>
+                                    <pre class="task-log-block">{{ formatTaskNodeLogs(node.logs) }}</pre>
// Add in task orchestration methods module
formatTaskNodeLogs(logs) {
    if (!Array.isArray(logs) || logs.length === 0) return '(no logs)';
    return logs
        .map((item) => `${item.at || ''} ${item.level || ''} ${item.message || ''}`.trim())
        .join('\n');
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/partials/index/panel-orchestration.html` at line 259, Replace the
inline complex expression in the <pre> (which maps node.logs) with a simple
binding to a formatter method; add a method (e.g., formatTaskNodeLogs(logs)) in
the component's methods or computed that returns '(no logs)' for non-array/empty
input or maps each log to `${item.at || ''} ${item.level || ''} ${item.message
|| ''}`.trim() and joins with '\n', then update the template to use {{
formatTaskNodeLogs(node.logs) }} so all formatting is done outside the template
and the template only binds the final string.
cli.js (6)

9693-9708: ⚠️ Potential issue | 🟡 Minor

Honor caller-supplied runId in detached starts.

Both detached paths always generate a new ID, so --run-id / API runId cannot be used for correlation or later lookup when the run is started in detached mode.

Also applies to: 10926-10935

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 9693 - 9708, Detached starts always generate a new
runId, ignoring a caller-supplied runId; change the runId assignment so it uses
the provided params.runId when present and non-empty instead of always calling
createTaskRunId(). Locate the detached branch around the variables detach,
params, coerceTaskPlanPayload, validatePreparedTaskPlan, taskId (createTaskId)
and runId (createTaskRunId) and replace the runId initialization with logic that
uses params.runId.trim() when typeof params.runId === 'string' and non-empty,
otherwise falls back to createTaskRunId(), then pass that runId into
runTaskPlanInternal as before.

12187-12190: ⚠️ Potential issue | 🟠 Major

Preserve explicit overrides when a plan object is supplied.

Returning the cloned plan as-is drops top-level CLI/API flags like allowWrite, dryRun, concurrency, and autoFixRounds, so task run --plan @plan.json --allow-write silently ignores the override. Merge only explicitly provided params on top of the cloned plan before returning it.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12187 - 12190, In coerceTaskPlanPayload, when
params.plan is an object the function currently returns a clone of the plan and
thus ignores top-level CLI/API overrides; change it to clone the plan (using
cloneJson) and then shallow-merge any explicitly provided top-level flags from
params (e.g. allowWrite, dryRun, concurrency, autoFixRounds and other top-level
CLI flags) onto the cloned plan before returning so that explicit overrides in
params take precedence over values in the plan.

12617-12624: ⚠️ Potential issue | 🟠 Major

Persist a terminal state on every non-success exit.

The early validation return leaves options.queueItem untouched, so a queued plan that becomes invalid later stays queued and can be selected forever. And inside the main execution path there is only try/finally; if executeTaskPlan() or onUpdate throws, the controller is removed but the persisted detail/queue state stays running.

Also applies to: 12676-12737

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12617 - 12624, The early return after
validatePreparedTaskPlan(... ) and uncaught exceptions in executeTaskPlan(...) /
onUpdate leave options.queueItem stuck as non-terminal; change all early returns
and all catch paths to persist a terminal state for options.queueItem (e.g. mark
as failed/invalid) before returning, by calling the same persistence/update
routine used elsewhere (e.g. onUpdate or the queue persistence helper) and
include the validation.error/details; similarly, wrap executeTaskPlan/onUpdate
calls with try/catch that sets options.queueItem to a terminal status and
persists it on any non-success exit, while keeping the existing finally that
removes the controller so the item cannot remain queued or running indefinitely.

12890-12945: ⚠️ Potential issue | 🟠 Major

task cancel only works against the current process's runId map.

This lookup only hits g_taskRunControllers by key, so detached non-queue runs are not cancellable by the advertised taskId. More broadly, a fresh CLI invocation starts with an empty controller map, so it cannot cancel runs started by an earlier CLI command.


12257-12267: ⚠️ Potential issue | 🟠 Major

Claim and update queue items atomically.

upsertTaskQueueItem() is a plain read-modify-write of TASK_QUEUE_FILE, and the runner selects nextItem before any durable claim is written. A second codexmate process can overwrite queue state or start the same queued task; the in-memory guard only serializes runs within a single process. Consider a file lock/lease or a transactional store here.

Also applies to: 12802-12816

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12257 - 12267, upsertTaskQueueItem performs an unsafe
read-modify-write on TASK_QUEUE_FILE (via readTaskQueueState,
normalizeTaskQueueItem, writeTaskQueueState) allowing races across processes;
change it to acquire an exclusive file lock/lease around the operation (or use
an atomic transactional backend) so the sequence is: acquire lock on
TASK_QUEUE_FILE, re-read state, apply normalizeTaskQueueItem and upsert into
state.tasks, write the updated state atomically (write temp + rename or use
atomic write APIs), then release the lock; also ensure the runner that picks
nextItem persists a durable claim under the same lock/transaction before
starting the task.

12276-12295: ⚠️ Potential issue | 🟠 Major

The queue runner is still newest-first.

listTaskQueueItems({ status: 'queued' }) sorts by updatedAt descending, and the runner picks queue[0], so newer or retried tasks jump ahead of older queued work instead of FIFO.

Also applies to: 12805-12808

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12276 - 12295, listTaskQueueItems currently sorts ties
by newest-first (String(b.updatedAt).localeCompare(String(a.updatedAt))),
causing the runner (which takes queue[0]) to favor recent/retried tasks; change
the tie-breaker in the sort comparator in listTaskQueueItems to oldest-first so
FIFO is preserved (e.g., compare a.updatedAt to b.updatedAt ascending or use
numeric Date parsing like (Number(a.updatedAt) || 0) - (Number(b.updatedAt) ||
0)); keep the statusRank logic the same and ensure the comparator returns that
ascending time order so the runner's queue[0] is the oldest queued task.
🧹 Nitpick comments (1)
tests/unit/provider-share-command.test.mjs (1)

1167-1233: Consider splitting this into two focused tests for easier triage.

This case now checks both provider editability flags and model-shape mapping. A small split would make failures more diagnosable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/provider-share-command.test.mjs` around lines 1167 - 1233, The
test 'buildMcpProviderListPayload keeps regular providers editable' is doing two
checks at once—flags (readOnly, nonEditable, nonDeletable, current) and
model-shape mapping—making failures hard to triage; split it into two focused
tests that both call buildMcpProviderListPayload (instantiated via
instantiateFunction with readConfigOrVirtualDefault stub): one test should
assert provider presence and the boolean flags (readOnly, nonEditable,
nonDeletable, current) for the 'openai' provider, and the other should assert
the models array shape/deep equality (the gpt-5.3-codex object); keep the same
stubs and setup but move the corresponding assert.* lines into the respective
new tests so each test verifies a single concern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@cli.js`:
- Around line 12501-12521: The current captureLines implementation splits each
incoming chunk immediately, which loses JSON lines that span chunks; change it
to maintain a per-stream partial buffer (e.g., stdoutPartial and stderrPartial)
and on each data chunk prepend the leftover partial, then split into lines, keep
the final incomplete line as the new partial instead of trimming it away, and
only push/parse fully-terminated lines into stdoutLines/stderrLines/parsedEvents
and update sessionId via findCodexSessionId; apply the same
buffered/partial-line logic to the other identical block (the block referenced
at 12534-12538) and continue to use truncateTaskText for stored preview lines
and the 120/1200 limits as before.
- Around line 10926-10942: The code only stops awaiting runTaskPlanInternal when
options.detach is true but doesn't actually background the work; change the
detach path to spawn a detached child process (e.g., via child_process.fork or
spawn with { detached: true, stdio: 'ignore' }) that receives the prepared plan
and identifiers so runTaskPlanInternal runs in the child, call child.unref() and
ensure stdio are ignored so the parent can exit; preserve generation of
taskId/runId via createTaskId/createTaskRunId, serialize the plan from
coerceTaskPlanPayload/validatePreparedTaskPlan into the child’s args or env, and
keep the same JSON/console output behavior in the parent.
- Around line 10956-10969: The retry subcommand checks only result.error and can
still exit 0 for a retried task that finishes with result.status === 'failed';
update the 'retry' branch (the code that calls retryTaskRun with runId/detach
and handles result) to treat a failed run as an error: after handling
JSON/detached/summary output, if result.status === 'failed' (and not detached)
then either throw a new Error or call process.exit(1) (consistent with how 'task
run' handles failure) so the CLI returns a non-zero exit code for failed
retries; reference the retryTaskRun call, the result object, and existing
result.error handling to implement this.
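The chunk-buffering fix described for captureLines above is the standard NDJSON pattern: carry the trailing incomplete line between data events. A minimal sketch (makeLineBuffer and onLine are illustrative names, not from the PR):

```javascript
// Reassemble JSON lines that span chunk boundaries before parsing them.
function makeLineBuffer(onLine) {
    let partial = '';
    return (chunk) => {
        const text = partial + String(chunk);
        const lines = text.split('\n');
        partial = lines.pop(); // incomplete tail, or '' after a final newline
        for (const line of lines) {
            if (line.trim()) onLine(line);
        }
    };
}
```

Attach one buffer per stream (stdout and stderr each get their own), and flush any non-empty partial when the stream ends.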

In `@web-ui/modules/app.computed.session.mjs`:
- Around line 104-111: The current gate using hasKnownRate (computed from
Object.values(pricing).some((value) => value !== null && value > 0)) excludes
models with all-zero rates; change the condition in the hasKnownRate check to
allow zero as a valid known rate (e.g., use some(value => value !== null)) so
zero-priced providers are treated as estimable; locate the pricing object and
the hasKnownRate computation in web-ui/modules/app.computed.session.mjs
(functions/symbols: readUsageCostNumber, pricing, hasKnownRate,
formatUsageEstimatedCost) and update the predicate accordingly so that
formatUsageEstimatedCost can display $0.00 and such providers contribute to
coverage.
- Around line 153-183: The coverage numerator/denominator mismatch happens
because totalSessionTokens is taken only from session.totalTokens while
estimatedTokens sometimes sums per-field tokens; move the per-field token
calculations (inputTokens, cachedInputTokens, outputTokens,
reasoningOutputTokens, billableInputTokens) to before computing
totalSessionTokens and change totalSessionTokens (used to increment totalTokens)
to use session.totalTokens when present/finite otherwise fall back to the same
per-field sum you use for estimatedTokens (billableInputTokens +
cachedInputTokens + (outputTokens||0) + reasoningOutputTokens), keeping all
other logic (pricing check, estimatedTokens increment, coveragePercent) the
same; update references to totalSessionTokens, estimatedTokens, totalTokens,
inputTokens, outputTokens, reasoningOutputTokens, cachedInputTokens, and
billableInputTokens accordingly.
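The consistent token basis the last comment asks for can be sketched as one helper shared by the numerator and denominator (field names follow the review; the helper name is hypothetical):

```javascript
// Use session.totalTokens when present and positive; otherwise fall back to
// the same per-field sum used when estimating covered tokens.
function sessionTokenBasis(session) {
    const n = (value) => (Number.isFinite(Number(value)) ? Math.max(0, Math.floor(Number(value))) : 0);
    const fieldSum = n(session.billableInputTokens) + n(session.cachedInputTokens)
        + n(session.outputTokens) + n(session.reasoningOutputTokens);
    const total = Number(session.totalTokens);
    return Number.isFinite(total) && total > 0 ? Math.floor(total) : fieldSum;
}
```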

---

Duplicate comments:
In `@cli.js`:
- Around line 9693-9708: Detached starts always generate a new runId, ignoring a
caller-supplied runId; change the runId assignment so it uses the provided
params.runId when present and non-empty instead of always calling
createTaskRunId(). Locate the detached branch around the variables detach,
params, coerceTaskPlanPayload, validatePreparedTaskPlan, taskId (createTaskId)
and runId (createTaskRunId) and replace the runId initialization with logic that
uses params.runId.trim() when typeof params.runId === 'string' and non-empty,
otherwise falls back to createTaskRunId(), then pass that runId into
runTaskPlanInternal as before.
- Around line 12187-12190: In coerceTaskPlanPayload, when params.plan is an
object the function currently returns a clone of the plan and thus ignores
top-level CLI/API overrides; change it to clone the plan (using cloneJson) and
then shallow-merge any explicitly provided top-level flags from params (e.g.
allowWrite, dryRun, concurrency, autoFixRounds and other top-level CLI flags)
onto the cloned plan before returning so that explicit overrides in params take
precedence over values in the plan.
- Around line 12617-12624: The early return after validatePreparedTaskPlan(... )
and uncaught exceptions in executeTaskPlan(...) / onUpdate leave
options.queueItem stuck as non-terminal; change all early returns and all catch
paths to persist a terminal state for options.queueItem (e.g. mark as
failed/invalid) before returning, by calling the same persistence/update routine
used elsewhere (e.g. onUpdate or the queue persistence helper) and include the
validation.error/details; similarly, wrap executeTaskPlan/onUpdate calls with
try/catch that sets options.queueItem to a terminal status and persists it on
any non-success exit, while keeping the existing finally that removes the
controller so the item cannot remain queued or running indefinitely.
- Around line 12257-12267: upsertTaskQueueItem performs an unsafe
read-modify-write on TASK_QUEUE_FILE (via readTaskQueueState,
normalizeTaskQueueItem, writeTaskQueueState) allowing races across processes;
change it to acquire an exclusive file lock/lease around the operation (or use
an atomic transactional backend) so the sequence is: acquire lock on
TASK_QUEUE_FILE, re-read state, apply normalizeTaskQueueItem and upsert into
state.tasks, write the updated state atomically (write temp + rename or use
atomic write APIs), then release the lock; also ensure the runner that picks
nextItem persists a durable claim under the same lock/transaction before
starting the task.
- Around line 12276-12295: listTaskQueueItems currently sorts ties by
newest-first (String(b.updatedAt).localeCompare(String(a.updatedAt))), causing
the runner (which takes queue[0]) to favor recent/retried tasks; change the
tie-breaker in the sort comparator in listTaskQueueItems to oldest-first so FIFO
is preserved (e.g., compare a.updatedAt to b.updatedAt ascending or use numeric
Date parsing like (Number(a.updatedAt) || 0) - (Number(b.updatedAt) || 0)); keep
the statusRank logic the same and ensure the comparator returns that ascending
time order so the runner's queue[0] is the oldest queued task.

In `@web-ui/partials/index/panel-orchestration.html`:
- Line 259: Replace the inline complex expression in the <pre> (which maps
node.logs) with a simple binding to a formatter method; add a method (e.g.,
formatTaskNodeLogs(logs)) in the component's methods or computed that returns
'(no logs)' for non-array/empty input or maps each log to `${item.at || ''}
${item.level || ''} ${item.message || ''}`.trim() and joins with '\n', then
update the template to use {{ formatTaskNodeLogs(node.logs) }} so all formatting
is done outside the template and the template only binds the final string.

---

Nitpick comments:
In `@tests/unit/provider-share-command.test.mjs`:
- Around line 1167-1233: The test 'buildMcpProviderListPayload keeps regular
providers editable' is doing two checks at once—flags (readOnly, nonEditable,
nonDeletable, current) and model-shape mapping—making failures hard to triage;
split it into two focused tests that both call buildMcpProviderListPayload
(instantiated via instantiateFunction with readConfigOrVirtualDefault stub): one
test should assert provider presence and the boolean flags (readOnly,
nonEditable, nonDeletable, current) for the 'openai' provider, and the other
should assert the models array shape/deep equality (the gpt-5.3-codex object);
keep the same stubs and setup but move the corresponding assert.* lines into the
respective new tests so each test verifies a single concern.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aa2cd7f4-acd8-4611-b59d-e3f88b62cc15

📥 Commits

Reviewing files that changed from the base of the PR and between 67b74ab and 8b58f5a.

📒 Files selected for processing (6)
  • cli.js
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/provider-share-command.test.mjs
  • web-ui/modules/app.computed.session.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (2)
  • tests/unit/config-tabs-ui.test.mjs
  • web-ui/styles/task-orchestration.css
📜 Review details
🧰 Additional context used
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 259-259: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (1)
tests/unit/provider-share-command.test.mjs (1)

1185-1199: Good coverage for model metadata passthrough.

This update correctly validates that buildMcpProviderListPayload() preserves expected models fields (id, name, cost, contextWindow, maxTokens) for regular providers.

Also applies to: 1219-1232


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
web-ui/modules/app.computed.session.mjs (2)

120-127: ⚠️ Potential issue | 🟡 Minor

Treat explicit zero-cost pricing as configured.

Line 126 still drops models whose configured rates are all 0. That makes free/self-hosted providers show no estimate, and for cataloged model IDs it can even fall through to KNOWN_USAGE_MODEL_PRICING and display a positive cost instead of $0.00.

💡 Proposed fix
             const pricing = createUsagePricingEntry(
                 model.cost && typeof model.cost === 'object' && !Array.isArray(model.cost)
                     ? model.cost
                     : null,
                 'provider-config'
             );
-            const hasKnownRate = Object.values(pricing).some((value) => value !== null && value > 0);
+            const hasKnownRate = [pricing.input, pricing.output, pricing.cacheRead, pricing.cacheWrite]
+                .some((value) => value !== null);
             if (!hasKnownRate) continue;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.session.mjs` around lines 120 - 127, The check
that decides whether a model has configured rates currently uses hasKnownRate =
Object.values(pricing).some(value => value !== null && value > 0), which treats
explicit zero rates as unconfigured; update the logic in the block around
createUsagePricingEntry and hasKnownRate so that any non-null value (including
0) counts as configured (e.g., use some(value => value !== null)); keep the
continue only when all pricing entries are null so zero-cost providers are
preserved and avoid falling back to KNOWN_USAGE_MODEL_PRICING.

182-217: ⚠️ Potential issue | 🟡 Minor

Use the same token basis for both coverage totals.

Lines 185-217 let estimatedTokens fall back to the per-field token breakdown, but totalTokens still ignores sessions without session.totalTokens. The tooltip can therefore report coverage above 100%, or 100% against a zero denominator.

💡 Proposed fix
     for (const session of list) {
         if (!session || typeof session !== 'object') continue;
-        const pricing = resolveUsagePricingForSession(session, pricingIndex, currentProvider);
-        const totalSessionTokens = Number.isFinite(Number(session.totalTokens))
-            ? Math.max(0, Math.floor(Number(session.totalTokens)))
-            : 0;
-        totalTokens += totalSessionTokens;
-        if (!pricing) {
-            continue;
-        }
         const inputTokens = Number.isFinite(Number(session.inputTokens)) ? Math.max(0, Math.floor(Number(session.inputTokens))) : null;
         const cachedInputTokens = Number.isFinite(Number(session.cachedInputTokens)) ? Math.max(0, Math.floor(Number(session.cachedInputTokens))) : 0;
         const outputTokens = Number.isFinite(Number(session.outputTokens)) ? Math.max(0, Math.floor(Number(session.outputTokens))) : null;
         const reasoningOutputTokens = Number.isFinite(Number(session.reasoningOutputTokens)) ? Math.max(0, Math.floor(Number(session.reasoningOutputTokens))) : 0;
+        const fallbackSessionTokens =
+            Math.max(0, (inputTokens || 0) - cachedInputTokens)
+            + cachedInputTokens
+            + (outputTokens || 0)
+            + reasoningOutputTokens;
+        const totalSessionTokens = Number.isFinite(Number(session.totalTokens))
+            ? Math.max(0, Math.floor(Number(session.totalTokens)))
+            : fallbackSessionTokens;
+        totalTokens += totalSessionTokens;
+        const pricing = resolveUsagePricingForSession(session, pricingIndex, currentProvider);
+        if (!pricing) {
+            continue;
+        }
         if (inputTokens === null && outputTokens === null && reasoningOutputTokens === 0) {
             continue;
         }
         const billableInputTokens = Math.max(0, (inputTokens || 0) - cachedInputTokens);
@@
-        estimatedTokens += totalSessionTokens || (billableInputTokens + cachedInputTokens + (outputTokens || 0) + reasoningOutputTokens);
+        estimatedTokens += totalSessionTokens;
     }

     const coveragePercent = totalTokens > 0
-        ? Math.round((estimatedTokens / totalTokens) * 100)
+        ? Math.min(100, Math.round((estimatedTokens / totalTokens) * 100))
         : (estimatedSessions > 0 ? 100 : 0);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.session.mjs` around lines 182 - 217, The coverage
calculation mixes token bases: totalTokens only adds totalSessionTokens (from
session.totalTokens) while estimatedTokens falls back to per-field sums,
allowing coverage >100% or 100%/0; fix by when totalSessionTokens is 0 (or
missing) add the same per-field breakdown used for estimatedTokens to
totalTokens (use billableInputTokens + cachedInputTokens + (outputTokens || 0) +
reasoningOutputTokens), ensuring totalTokens and estimatedTokens use the same
token basis; update the logic around totalSessionTokens, estimatedTokens, and
totalTokens (and reference variables totalSessionTokens, totalTokens,
estimatedTokens, inputTokens, cachedInputTokens, outputTokens,
reasoningOutputTokens, billableInputTokens, and pricing) so sessions without
session.totalTokens contribute equally to both totals.
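
The invariant behind this fix can be shown with a stripped-down sketch (helper name and session shape are hypothetical): numerator and denominator must use the same per-session token value, and the rounded percentage is clamped so it can never exceed 100%.

```javascript
// Sketch: every session contributes the same token count to both totals,
// falling back to the per-field sum when totalTokens is missing, so
// coverage is always between 0 and 100.
function coveragePercent(sessions) {
    let totalTokens = 0;
    let estimatedTokens = 0;
    for (const session of sessions) {
        const fallback = (session.inputTokens || 0) + (session.outputTokens || 0);
        const sessionTokens = Number.isFinite(session.totalTokens) ? session.totalTokens : fallback;
        totalTokens += sessionTokens;
        if (session.hasPricing) estimatedTokens += sessionTokens;
    }
    return totalTokens > 0
        ? Math.min(100, Math.round((estimatedTokens / totalTokens) * 100))
        : 0;
}
```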
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@web-ui/modules/app.computed.session.mjs`:
- Around line 120-127: The check that decides whether a model has configured
rates currently uses hasKnownRate = Object.values(pricing).some(value => value
!== null && value > 0), which treats explicit zero rates as unconfigured; update
the logic in the block around createUsagePricingEntry and hasKnownRate so that
any non-null value (including 0) counts as configured (e.g., use some(value =>
value !== null)); keep the continue only when all pricing entries are null so
zero-cost providers are preserved and avoid falling back to
KNOWN_USAGE_MODEL_PRICING.
- Around line 182-217: The coverage calculation mixes token bases: totalTokens
only adds totalSessionTokens (from session.totalTokens) while estimatedTokens
falls back to per-field sums, allowing coverage >100% or 100%/0; fix by when
totalSessionTokens is 0 (or missing) add the same per-field breakdown used for
estimatedTokens to totalTokens (use billableInputTokens + cachedInputTokens +
(outputTokens || 0) + reasoningOutputTokens), ensuring totalTokens and
estimatedTokens use the same token basis; update the logic around
totalSessionTokens, estimatedTokens, and totalTokens (and reference variables
totalSessionTokens, totalTokens, estimatedTokens, inputTokens,
cachedInputTokens, outputTokens, reasoningOutputTokens, billableInputTokens, and
pricing) so sessions without session.totalTokens contribute equally to both
totals.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4529ead8-be33-4876-a4dd-79812947b803

📥 Commits

Reviewing files that changed from the base of the PR and between 8b58f5a and c9ecfa0.

📒 Files selected for processing (4)
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/modules/app.computed.session.mjs
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (1)
  • web-ui/styles/task-orchestration.css
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/config-tabs-ui.test.mjs

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (2)
tests/unit/config-tabs-ui.test.mjs (1)

71-80: ⚠️ Potential issue | 🟡 Minor

Assert orchestration handlers/state on bundled html as well, not only orchestrationPanel.

These checks can still pass even if bundling stops including updated panel content. Mirror these assertions against html to catch bundle drift.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/config-tabs-ui.test.mjs` around lines 71 - 80, The tests currently
only assert orchestration-related strings against orchestrationPanel, which can
miss bundle drift; duplicate the same assertions to also check the bundled
output variable html by adding matching assertions for each existing
orchestrationPanel check (e.g. mirror the assert.match(orchestrationPanel,
/.../) lines to assert.match(html, /.../)), referencing the same handlers/state
patterns such as previewTaskPlan(), runTaskOrchestration(),
addTaskOrchestrationToQueue(), startTaskQueueRunner(),
retryTaskRunFromUi(taskOrchestration.selectedRunId), the CSS classes
("settings-tab-actions task-header-actions", "btn-tool task-action-preview" with
`@click`="previewTaskPlan()", "task-action-row-right") and the token
taskOrchestrationSelectedRunNodes so both orchestrationPanel and html are
validated.
web-ui/partials/index/panel-orchestration.html (1)

261-261: ⚠️ Potential issue | 🟡 Minor

Move node log formatting out of the template expression.

This inline map(... => ...) expression is still brittle and continues to trigger HTML linting noise on the > token. Extract it into a formatter method/computed and render only the formatted string here.

🔧 Suggested template-side change
- <pre class="task-log-block">{{ Array.isArray(node.logs) && node.logs.length ? node.logs.map((item) => `${item.at || ''} ${item.level || ''} ${item.message || ''}`.trim()).join('\n') : '(no logs)' }}</pre>
+ <pre class="task-log-block">{{ formatTaskNodeLogs(node.logs) }}</pre>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/partials/index/panel-orchestration.html` at line 261, The inline
map/string logic for node logs should be moved out of the template into a
formatter function or computed property to avoid lint noise; implement a
function (e.g., formatNodeLogs(node) or nodeLogsText(node)) that checks
Array.isArray(node.logs) && node.logs.length, maps each item to `${item.at ||
''} ${item.level || ''} ${item.message || ''}`.trim(), joins with '\n' and
returns '(no logs)' otherwise, then replace the inline expression in the <pre>
with a simple interpolation like {{ formatNodeLogs(node) }} (also guard against
node being undefined).
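
A minimal version of the extracted formatter could look like this (the method name comes from the suggested template change; the log-item fields mirror the original inline expression, and the real component method may differ):

```javascript
// Moves the inline template logic into one function: guards against a
// missing/empty logs array, then renders "at level message" per entry,
// trimming entries whose leading fields are absent.
function formatTaskNodeLogs(logs) {
    if (!Array.isArray(logs) || !logs.length) return '(no logs)';
    return logs
        .map((item) => `${(item && item.at) || ''} ${(item && item.level) || ''} ${(item && item.message) || ''}`.trim())
        .join('\n');
}
```

The template then reduces to `{{ formatTaskNodeLogs(node.logs) }}`, which also removes the `>` token that HTMLHint flags.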
🧹 Nitpick comments (2)
tests/unit/config-tabs-ui.test.mjs (1)

20-20: Prefer bundled CSS assertions for orchestration style drift checks.

Using readProjectFile('web-ui/styles/task-orchestration.css') validates source only. Consider asserting these selectors against readBundledWebUiCss() too so bundling regressions are caught.

Also applies to: 81-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/config-tabs-ui.test.mjs` at line 20, The test currently validates
orchestration styles by calling
readProjectFile('web-ui/styles/task-orchestration.css') which only checks source
CSS; update the test to also assert the same selectors against the bundled CSS
by calling readBundledWebUiCss() and running the same selector/expected-value
assertions (the same logic that inspects task-orchestration.css) against the
bundled output; locate the assertions around the readProjectFile usage and
duplicate/apply them to the readBundledWebUiCss() result so bundling regressions
are caught (also apply the same change to the other occurrences noted around the
later assertions).
web-ui/partials/index/panel-orchestration.html (1)

227-227: Predeclare selectedRunError in the orchestration state.

The template reads taskOrchestration.selectedRunError, but the initialized state in web-ui/app.js:350-378 doesn’t include this key. Add it to the initial state shape to keep state predictable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/partials/index/panel-orchestration.html` at line 227, Add a new key
selectedRunError to the initial taskOrchestration state so the template binding
taskOrchestration.selectedRunError is always defined; locate the
taskOrchestration object in the initial state setup in web-ui/app.js (the state
initializer around the taskOrchestration declaration) and add selectedRunError:
null (or empty string) to the shape so the component won’t read an undefined
property.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/unit/config-tabs-ui.test.mjs`:
- Around line 71-80: The tests currently only assert orchestration-related
strings against orchestrationPanel, which can miss bundle drift; duplicate the
same assertions to also check the bundled output variable html by adding
matching assertions for each existing orchestrationPanel check (e.g. mirror the
assert.match(orchestrationPanel, /.../) lines to assert.match(html, /.../)),
referencing the same handlers/state patterns such as previewTaskPlan(),
runTaskOrchestration(), addTaskOrchestrationToQueue(), startTaskQueueRunner(),
retryTaskRunFromUi(taskOrchestration.selectedRunId), the CSS classes
("settings-tab-actions task-header-actions", "btn-tool task-action-preview" with
`@click`="previewTaskPlan()", "task-action-row-right") and the token
taskOrchestrationSelectedRunNodes so both orchestrationPanel and html are
validated.

In `@web-ui/partials/index/panel-orchestration.html`:
- Line 261: The inline map/string logic for node logs should be moved out of the
template into a formatter function or computed property to avoid lint noise;
implement a function (e.g., formatNodeLogs(node) or nodeLogsText(node)) that
checks Array.isArray(node.logs) && node.logs.length, maps each item to
`${item.at || ''} ${item.level || ''} ${item.message || ''}`.trim(), joins with
'\n' and returns '(no logs)' otherwise, then replace the inline expression in
the <pre> with a simple interpolation like {{ formatNodeLogs(node) }} (also
guard against node being undefined).

---

Nitpick comments:
In `@tests/unit/config-tabs-ui.test.mjs`:
- Line 20: The test currently validates orchestration styles by calling
readProjectFile('web-ui/styles/task-orchestration.css') which only checks source
CSS; update the test to also assert the same selectors against the bundled CSS
by calling readBundledWebUiCss() and running the same selector/expected-value
assertions (the same logic that inspects task-orchestration.css) against the
bundled output; locate the assertions around the readProjectFile usage and
duplicate/apply them to the readBundledWebUiCss() result so bundling regressions
are caught (also apply the same change to the other occurrences noted around the
later assertions).

In `@web-ui/partials/index/panel-orchestration.html`:
- Line 227: Add a new key selectedRunError to the initial taskOrchestration
state so the template binding taskOrchestration.selectedRunError is always
defined; locate the taskOrchestration object in the initial state setup in
web-ui/app.js (the state initializer around the taskOrchestration declaration)
and add selectedRunError: null (or empty string) to the shape so the component
won’t read an undefined property.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ffa4461a-2efa-4107-811a-d04d161668f9

📥 Commits

Reviewing files that changed from the base of the PR and between 41ce3d7 and ef446ee.

📒 Files selected for processing (3)
  • tests/unit/config-tabs-ui.test.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (1)
  • web-ui/styles/task-orchestration.css
📜 Review details
🧰 Additional context used
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 261-261: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (2)
web-ui/partials/index/panel-orchestration.html (1)

157-166: Keyboard interaction wiring for selectable queue rows looks good.

role, tabindex, and Enter/Space handlers are consistently applied for clickable queue items.

tests/unit/config-tabs-ui.test.mjs (1)

62-70: Good coverage for orchestration tab wiring in bundled HTML.

The checks around tab IDs, ARIA linkage, and panel visibility binding are comprehensive and useful.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@web-ui/logic.sessions.mjs`:
- Around line 149-153: The current code spreads validDayStarts into
Math.min/Math.max which can hit argument-count limits for large histories;
replace those spreads with safe iterative reduction (e.g., compute min via
validDayStarts.reduce((min, v) => v < min ? v : min, Infinity) and max via
reduce with -Infinity) or a simple for-loop over normalizedSessions using
toUtcDayStartMs to update min/max, and keep the existing fallback to
toUtcDayStartMs(now) when validDayStarts is empty; update the assignments to
firstDayStart and lastDayStart accordingly (references: validDayStarts,
firstDayStart, lastDayStart, toUtcDayStartMs, normalizedSessions, now).
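
The spread-free bounds computation the prompt asks for can be sketched like this (function and variable names follow the review comment; the surrounding module differs). `Math.min(...arr)` passes every element as a call argument, which can exceed the engine's argument limit on very large histories, while a loop handles any length.

```javascript
// Computes day-start bounds without spreading the array into
// Math.min/Math.max, keeping the fallback when no valid values exist.
function dayStartBounds(validDayStarts, fallbackDayStart) {
    if (!validDayStarts.length) {
        return { firstDayStart: fallbackDayStart, lastDayStart: fallbackDayStart };
    }
    let min = Infinity;
    let max = -Infinity;
    for (const value of validDayStarts) {
        if (value < min) min = value;
        if (value > max) max = value;
    }
    return { firstDayStart: min, lastDayStart: max };
}
```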

In `@web-ui/partials/index/panel-orchestration.html`:
- Around line 203-212: The parent row's keyboard handlers
(`@keydown.enter.prevent` and `@keydown.space.prevent` on the v-for container / task
runtime row) are catching events bubbled from the nested "取消" (Cancel) button; update
those handlers to be scoped to the row itself by adding the .self modifier
(e.g., use `@keydown.enter.self.prevent` and `@keydown.space.self.prevent`) so
Enter/Space pressed on child controls don't trigger selectTaskRun, and audit any
other row-level key handlers to apply .self where appropriate while leaving the
nested button's native keyboard behavior intact.
- Around line 36-38: The quick-template buttons leave engine-specific draft
state (taskOrchestration.selectedEngine and taskOrchestration.workflowIdsText)
set by previous clicks; update the non-workflow templates (the handlers that set
taskOrchestration.target/notes/followUpsText) to explicitly clear
engine-specific fields (set taskOrchestration.selectedEngine = '' and
taskOrchestration.workflowIdsText = '') and ensure the workflow template sets
taskOrchestration.selectedEngine = 'workflow' and
taskOrchestration.workflowIdsText as needed so stale workflow IDs or engine
selections cannot persist between template clicks.
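
One way to implement that reset (state field names come from the review; the handler name and template shape are hypothetical): every template click assigns the engine-specific fields, so nothing from a previous "Workflow" click can survive.

```javascript
// Applies a quick template to the draft state. Non-workflow templates
// explicitly clear selectedEngine and workflowIdsText so stale workflow
// IDs cannot leak into the next run.
function applyQuickTemplate(state, template) {
    state.target = template.target || '';
    if (template.engine === 'workflow') {
        state.selectedEngine = 'workflow';
        state.workflowIdsText = template.workflowIdsText || '';
    } else {
        state.selectedEngine = '';
        state.workflowIdsText = '';
    }
    return state;
}
```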
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c02c4e35-4ea9-4fbd-92e0-934ff78c5558

📥 Commits

Reviewing files that changed from the base of the PR and between ef446ee and bea099f.

📒 Files selected for processing (8)
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/logic.sessions.mjs
  • web-ui/modules/app.computed.session.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/partials/index/panel-usage.html
  • web-ui/styles/sessions-usage.css
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (2)
  • web-ui/styles/sessions-usage.css
  • web-ui/styles/task-orchestration.css
🚧 Files skipped from review as they are similar to previous changes (1)
  • web-ui/partials/index/panel-usage.html
📜 Review details
🧰 Additional context used
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 313-313: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (4)
web-ui/partials/index/panel-orchestration.html (1)

313-313: Move the log formatting out of the template expression.

Line 313 still embeds the full node.logs.map(...) formatter in markup. This is the same unresolved HTML/template issue flagged earlier and it still keeps too much logic in the template.

web-ui/modules/app.computed.session.mjs (2)

169-170: Treat zero-priced models as estimable.

Line 169 still filters out pricing entries whose configured rates are all 0, so free/local models fall back to “暂无” (“none yet”) instead of $0.00.


230-264: Keep coverage numerator and denominator on the same token basis.

Line 233 only adds session.totalTokens into totalTokens, but Line 254 can add a per-field fallback into estimatedTokens. For sessions missing totalTokens, the coverage note/title can still exceed 100% or report full coverage against a zero denominator.

tests/unit/config-tabs-ui.test.mjs (1)

71-85: Assert the orchestration wiring against bundled HTML too.

These checks still only validate the source partial, so they can miss a bundling regression where panel-orchestration.html stops being included in the shipped html.

@awsl233777 awsl233777 changed the title from "feat: add task orchestration tab and task CLI" to "feat: add task orchestration and usage insights" on Apr 12, 2026
@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
web-ui/modules/app.computed.main-tabs.mjs (1)

171-179: Consider single-pass iteration for queue stats.

taskOrchestrationQueueStats iterates through the queue array three times (once per status). For typical queue sizes this is fine, but a single-pass reduction would be more efficient.

♻️ Optional: Single-pass queue stats
 taskOrchestrationQueueStats() {
     const queue = this.taskOrchestration && Array.isArray(this.taskOrchestration.queue)
         ? this.taskOrchestration.queue
         : [];
-    return {
-        queued: queue.filter((item) => String(item && item.status || '').trim().toLowerCase() === 'queued').length,
-        running: queue.filter((item) => String(item && item.status || '').trim().toLowerCase() === 'running').length,
-        failed: queue.filter((item) => String(item && item.status || '').trim().toLowerCase() === 'failed').length
-    };
+    const stats = { queued: 0, running: 0, failed: 0 };
+    for (const item of queue) {
+        const status = String(item && item.status || '').trim().toLowerCase();
+        if (status === 'queued') stats.queued++;
+        else if (status === 'running') stats.running++;
+        else if (status === 'failed') stats.failed++;
+    }
+    return stats;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.main-tabs.mjs` around lines 171 - 179,
taskOrchestrationQueueStats currently iterates the queue three times to count
queued/running/failed; change it to a single-pass reduce over the queue variable
(from taskOrchestrationQueueStats) that normalizes each item's status
(String(item && item.status || '').trim().toLowerCase()) and increments counters
for queued, running, failed in one sweep, then return the aggregated object with
those counts; ensure handling of missing or non-array
this.taskOrchestration.queue remains the same.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@web-ui/modules/app.computed.main-tabs.mjs`:
- Around line 171-179: taskOrchestrationQueueStats currently iterates the queue
three times to count queued/running/failed; change it to a single-pass reduce
over the queue variable (from taskOrchestrationQueueStats) that normalizes each
item's status (String(item && item.status || '').trim().toLowerCase()) and
increments counters for queued, running, failed in one sweep, then return the
aggregated object with those counts; ensure handling of missing or non-array
this.taskOrchestration.queue remains the same.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 537a9f5b-50b2-4568-a47d-2edae9e0527e

📥 Commits

Reviewing files that changed from the base of the PR and between bea099f and 980f74c.

📒 Files selected for processing (11)
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/web-ui-behavior-parity.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/app.js
  • web-ui/logic.sessions.mjs
  • web-ui/modules/app.computed.main-tabs.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/partials/index/panel-usage.html
  • web-ui/styles/sessions-usage.css
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (1)
  • web-ui/styles/task-orchestration.css
🚧 Files skipped from review as they are similar to previous changes (4)
  • tests/unit/web-ui-behavior-parity.test.mjs
  • web-ui/logic.sessions.mjs
  • web-ui/styles/sessions-usage.css
  • tests/unit/config-tabs-ui.test.mjs
📜 Review details
🧰 Additional context used
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 406-406: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (28)
web-ui/partials/index/panel-orchestration.html (7)

1-6: Static analysis false positive: doctype-first error.

This is a partial HTML template that gets included into a parent layout file, so the doctype error from HTMLHint is expected and can be safely ignored.


57-59: Quick template buttons should reset engine-specific state.

The quick template buttons at lines 57-58 don't reset selectedEngine or workflowIdsText, which can lead to stale state if the user previously clicked the "Workflow 批处理" (workflow batch) template. The "Workflow 批处理" template (line 59) correctly sets the engine to workflow, but the other templates should explicitly reset to the codex engine.


309-318: Queue item keyboard handlers should use .self modifier.

The @keydown.enter.prevent and @keydown.space.prevent handlers on the queue item container (lines 317-318) will capture keyboard events that bubble up from the nested "取消" (Cancel) button at line 326. This can cause unintended navigation when the user presses Enter/Space on the cancel button. Adding the .self modifier ensures only direct events on the container trigger navigation.


406-406: Template expression with > character flagged by static analysis.

The inline log formatting expression uses > within the template, which HTMLHint flags. While Vue handles this correctly at runtime, consider extracting the log formatting logic to a method for cleaner templates and broader compatibility.


43-178: LGTM: Task compose flow structure is well-organized.

The three-step flow (write target → select strategy → preview/execute) provides clear user guidance. The form bindings, validation states, and action button disabled conditions are properly wired to the taskOrchestration state.


180-219: LGTM: Draft status sidebar provides helpful feedback.

The readiness card and checklist properly consume the computed properties taskOrchestrationDraftReadiness and taskOrchestrationDraftChecklist, giving users clear visibility into draft state.


281-411: LGTM: Workbench tabs and run detail view are well-implemented.

The queue/runs/detail tabs with proper conditional rendering, status pills, and action buttons provide a complete execution monitoring interface. The disabled conditions on retry/cancel buttons correctly check isTaskRunActive().

web-ui/app.js (2)

350-380: State schema inconsistency: missing selectedRunError and detailRequestToken fields.

The taskOrchestration object in data() is missing selectedRunError and detailRequestToken fields that are present in createDefaultTaskOrchestrationState() (lines 29-30 in app.methods.task-orchestration.mjs). While Vue 3's Proxy-based reactivity handles dynamic property additions, maintaining a consistent schema improves code clarity and prevents potential issues.


508-508: LGTM: Polling cleanup in beforeUnmount.

Calling stopTaskOrchestrationPolling() during component teardown properly cleans up the polling timer, preventing memory leaks and stale callbacks.

web-ui/modules/app.methods.task-orchestration.mjs (8)

68-75: State schema mismatch: existing state may lack new fields.

ensureTaskOrchestrationState() returns the existing taskOrchestration object unchanged if it exists. If the initial state in web-ui/app.js is missing fields like selectedRunError or detailRequestToken, they won't be added when accessing via this method. Consider merging defaults into existing state.
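
The suggested merge can be sketched as follows (function names come from the review; the default shape here is abbreviated): fields added to the default factory later also reach state objects that were created before the change.

```javascript
// Abbreviated default-state factory (the real one defines many more keys).
function createDefaultTaskOrchestrationState() {
    return { selectedRunError: null, detailRequestToken: 0, queue: [] };
}

// Returns existing state with any missing default keys filled in, instead
// of returning it unchanged and leaving newer fields undefined.
function ensureTaskOrchestrationState(existing) {
    const defaults = createDefaultTaskOrchestrationState();
    if (!existing || typeof existing !== 'object') return defaults;
    for (const key of Object.keys(defaults)) {
        if (!(key in existing)) existing[key] = defaults[key];
    }
    return existing;
}
```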


140-143: Overview warnings may overwrite plan-specific warnings.

Line 143 assigns res.warnings from task-overview to state.planWarnings, which is also used by previewTaskPlan(). This can cause catalog/overview warnings to replace plan-specific warnings. Consider using a separate overviewWarnings field.


194-199: Clear stale preview data on plan generation failure.

The catch block returns an error but leaves the previous plan, planIssues, and planWarnings in state. This can cause the UI to show an outdated preview after a failed refresh.


268-275: Differentiate "already running" from fresh queue start.

task-queue-start can return { started: false, alreadyRunning: true }, but line 275 always shows "队列执行器已启动" ("queue runner started"). The message should reflect whether the queue was newly started or was already running.


1-34: LGTM: Default state factory is comprehensive.

All state fields including selectedRunError and detailRequestToken are properly defined with appropriate default values.


286-338: LGTM: Request token mechanism prevents stale response overwrites.

The detailRequestToken pattern correctly handles concurrent requests by:

  1. Incrementing token before request
  2. Checking token match before updating state
  3. Only clearing loading state if token matches

This addresses the race condition concern from past reviews.
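
The pattern can be condensed into a self-contained sketch (names are hypothetical, modeled on the detailRequestToken field mentioned above): each request bumps the token, and a response is applied only if its token still matches, so a slower earlier request cannot overwrite a newer result.

```javascript
// Minimal request-token loader: stale responses are detected by comparing
// the token captured at request time against the current token.
function createDetailLoader() {
    const state = { detailRequestToken: 0, detail: null };
    return {
        state,
        async load(fetchDetail) {
            const token = ++state.detailRequestToken;
            const detail = await fetchDetail();
            if (token !== state.detailRequestToken) return; // stale response, drop it
            state.detail = detail;
        }
    };
}
```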


421-436: LGTM: Polling control is well-implemented.

The polling logic properly:

  • Stops existing timer before scheduling new one
  • Checks tab visibility before scheduling
  • Only polls when there's live activity
  • Re-checks tab after timeout fires
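
Those guarantees can be illustrated with a small timer wrapper (names and interval are hypothetical, not the module's actual API): the timer is always cleared before rescheduling, and the visibility/activity checks gate both scheduling and the fired tick.

```javascript
// Polling guard sketch: schedule() clears any pending timer, bails out
// unless the tab is visible and something is still running, and re-checks
// visibility once the timeout actually fires before polling again.
function createPoller({ isTabVisible, hasLiveActivity, poll, intervalMs = 2000 }) {
    let timer = null;
    function stop() {
        if (timer) { clearTimeout(timer); timer = null; }
    }
    function schedule() {
        stop();
        if (!isTabVisible() || !hasLiveActivity()) return;
        timer = setTimeout(() => {
            timer = null;
            if (!isTabVisible()) return; // re-check after the timeout fires
            poll();
            schedule();
        }, intervalMs);
    }
    return { schedule, stop };
}
```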

438-455: LGTM: Draft reset clears all input fields.

resetTaskOrchestrationDraft() properly resets all draft-related fields to their defaults, including clearing the plan and synchronizing polling state.

web-ui/partials/index/panel-usage.html (3)

15-15: LGTM: "全部" time range button added correctly.

The button follows the same pattern as the existing time range buttons, properly setting sessionsUsageTimeRange = 'all' and using the active class binding.


28-28: LGTM: Card note display conditional rendering.

The v-if="card.note" pattern correctly shows the note only when present, aligning with the sessionUsageSummaryCards computed property that conditionally includes the note field for cost cards.


86-128: LGTM: Model coverage card implementation.

The new model usage card properly:

  • Displays coverage metrics from sessionUsageCharts.modelCoverage
  • Shows empty state diagnostic when no models are found
  • Renders model chips with counts and source labels
  • Handles missing model sessions appropriately

The data bindings match the structure returned by buildUsageChartGroups() in logic.sessions.mjs.

tests/unit/web-ui-logic.test.mjs (4)

12-17: LGTM: New imports for orchestration testing.

The imports correctly bring in createMainTabsComputed and createTaskOrchestrationMethods factories for testing the new orchestration features.


747-811: LGTM: buildUsageChartGroups model coverage test is thorough.

The test verifies:

  • Model filtering by date range (7d)
  • Source labels assignment
  • Model coverage metrics calculation
  • Missing model tracking per source

Good edge case coverage with mixed model field names (modelId, modelName, model) and sessions without model info.


1219-1272: LGTM: Draft readiness tests cover key scenarios.

Tests properly verify:

  • Missing workflow IDs warning tone
  • Preview checklist state
  • Ready-to-execute success state with plan nodes

1301-1336: LGTM: loadTaskRunDetail stale response test validates critical behavior.

This test correctly verifies that when two concurrent requests are made, the second request's response is kept while the first (stale) response is ignored. This confirms the detailRequestToken mechanism works as intended to prevent race conditions.
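
The request-token pattern the test exercises can be sketched as follows (a simplified shape under assumed names; the real method lives in app.methods.task-orchestration.mjs):

```javascript
// Sketch of the stale-response guard: each load bumps a token, and only
// the response matching the latest token is committed to state.
function createDetailLoader(fetchDetail) {
    const state = { detailRequestToken: 0, detail: null };
    async function load(runId) {
        const token = ++state.detailRequestToken;
        const detail = await fetchDetail(runId);
        if (token !== state.detailRequestToken) return state.detail; // stale: drop it
        state.detail = detail;
        return detail;
    }
    return { state, load };
}
```

With two concurrent calls, a slow first response resolving after a fast second one is discarded rather than overwriting the newer detail.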

web-ui/modules/app.computed.main-tabs.mjs (4)

1-40: LGTM: Draft metrics extraction is robust.

The readTaskOrchestrationDraftMetrics function properly normalizes input, handles missing/invalid state, and extracts all relevant draft indicators. Good defensive coding with null checks and type coercion.


42-78: LGTM: Draft checklist provides clear user guidance.

The checklist items cover the key steps (target, engine/workflow, scope, preview) with contextual detail messages that adapt to the current state.


80-130: LGTM: Readiness state machine is well-designed.

The priority ordering (no target → missing workflow → no plan → issues → warnings → dryRun → ready) provides appropriate guidance at each stage of draft completion.


166-170: No changes needed. The taskOrchestrationSelectedRunNodes method correctly accesses run.nodes where run is obtained from detail.run. This structure is confirmed by readTaskRunDetail() in cli.js (which returns the parsed JSON data) and validated by collectTaskRunSummary() which explicitly accesses the same detail.run.nodes path (line 12363 in cli.js).

@awsl233777
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

@awsl233777 Sure! I'll kick off a fresh review of PR #86 now.


🧠 Learnings used
Learnt from: SurviveM
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-29T02:42:32.785Z
Learning: The repository maintainer SurviveM (GitHub: SurviveM, repo: SakuraByteCore/codexmate) prefers that CodeRabbit review suggestions do not introduce breaking changes. Reviews should focus on correctness, safety, and quality without recommending refactors or changes that would break existing APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-30T04:47:15.352Z
Learning: In PR `#55` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-31T12:37:51.023Z
Learning: In PR `#59` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-03T10:42:22.368Z
Learning: In PR `#67` for SakuraByteCore/codexmate, github-actions[bot] flagged that CodeRabbit was again introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-04T01:36:26.964Z
Learning: In PR `#68` for SakuraByteCore/codexmate, github-actions[bot] again flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`, `#68`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@awsl233777
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

@awsl233777 Sure! Triggering a fresh full review of PR #86 now.


🧠 Learnings used

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-05T03:58:26.787Z
Learning: For SakuraByteCore/codexmate reviews, only provide non-breaking feedback (correctness, safety, quality). Avoid API, behavior, or refactor suggestions. This rule applies across PRs `#55`, `#59`, `#67`, and `#68`.
✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (1)
tests/unit/session-usage-backend.test.mjs (1)

326-329: ⚠️ Potential issue | 🟡 Minor

Guard both toIsoTime stubs against invalid dates.

Line 326 and Line 458 call new Date(value).toISOString() directly, which throws RangeError on bad input. That means these harnesses won't return the fallback at all, and the new parser tests become brittle whenever a fixture or code path surfaces an invalid timestamp.

🐛 Proposed fix
             toIsoTime(value, fallback = '') {
-                const iso = new Date(value).toISOString();
-                return iso === 'Invalid Date' ? fallback : iso;
+                try {
+                    const date = new Date(value);
+                    if (!Number.isFinite(date.getTime())) return fallback;
+                    return date.toISOString();
+                } catch {
+                    return fallback;
+                }
             },

Also applies to: 458-461

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/session-usage-backend.test.mjs` around lines 326 - 329, The
toIsoTime stub(s) call new Date(value).toISOString() which throws RangeError for
invalid inputs; update the toIsoTime function (and the duplicate stub later) to
guard against invalid dates by either checking isNaN(new Date(value).getTime())
before calling toISOString or wrapping the toISOString call in a try/catch and
returning the fallback on error, ensuring any bad input returns the fallback
instead of throwing.
🧹 Nitpick comments (1)
web-ui/partials/index/panel-orchestration.html (1)

406-406: Consider extracting log formatting to a method.

The inline template expression contains complex logic. While Vue handles this correctly, moving it to a method like formatNodeLogs(node.logs) would improve readability and make the template cleaner.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/partials/index/panel-orchestration.html` at line 406, Extract the
inline log formatting from the template into a component method called
formatNodeLogs(logs) and replace the current expression with a simple call like
{{ formatNodeLogs(node.logs) }}; implement formatNodeLogs(logs) to check
Array.isArray(logs) and logs.length, map each item to `${item.at || ''}
${item.level || ''} ${item.message || ''}`.trim(), join with '\n', and return
'(no logs)' when empty or not an array, ensuring you reference node.logs in the
template and the method name formatNodeLogs in your component methods.
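
Following the prompt's own spec, the suggested extraction could look like this (the `{ at, level, message }` log-item shape is assumed from the prompt text, not confirmed against the component):

```javascript
// Method the nitpick proposes: format an array of log entries into a
// plain-text block, with a fallback when there are no logs.
function formatNodeLogs(logs) {
    if (!Array.isArray(logs) || !logs.length) return '(no logs)';
    return logs
        .map((item) => `${item.at || ''} ${item.level || ''} ${item.message || ''}`.trim())
        .join('\n');
}
```

The template expression then reduces to `{{ formatNodeLogs(node.logs) }}`, which also sidesteps the HTMLHint special-character complaint on that line.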
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@tests/unit/session-usage-backend.test.mjs`:
- Around line 326-329: The toIsoTime stub(s) call new Date(value).toISOString()
which throws RangeError for invalid inputs; update the toIsoTime function (and
the duplicate stub later) to guard against invalid dates by either checking
isNaN(new Date(value).getTime()) before calling toISOString or wrapping the
toISOString call in a try/catch and returning the fallback on error, ensuring
any bad input returns the fallback instead of throwing.

---

Nitpick comments:
In `@web-ui/partials/index/panel-orchestration.html`:
- Line 406: Extract the inline log formatting from the template into a component
method called formatNodeLogs(logs) and replace the current expression with a
simple call like {{ formatNodeLogs(node.logs) }}; implement formatNodeLogs(logs)
to check Array.isArray(logs) and logs.length, map each item to `${item.at || ''}
${item.level || ''} ${item.message || ''}`.trim(), join with '\n', and return
'(no logs)' when empty or not an array, ensuring you reference node.logs in the
template and the method name formatNodeLogs in your component methods.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 0cf38216-5202-4159-871d-78162be4ab69

📥 Commits

Reviewing files that changed from the base of the PR and between 980f74c and 43c8f14.

📒 Files selected for processing (10)
  • cli.js
  • tests/e2e/test-sessions.js
  • tests/e2e/test-setup.js
  • tests/unit/session-usage-backend.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/app.js
  • web-ui/logic.sessions.mjs
  • web-ui/modules/app.computed.main-tabs.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • web-ui/partials/index/panel-orchestration.html
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/web-ui-logic.test.mjs
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 406-406: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (30)
tests/e2e/test-sessions.js (1)

44-48: Good coverage for model-field preservation in usage listings.

This is a solid assertion set to prevent regressions for both Codex and Claude session model metadata.

tests/e2e/test-setup.js (2)

91-102: Fixture enrichment looks correct for Codex model extraction paths.

Including turn_context with both direct and nested model fields gives strong e2e coverage for metadata parsing.


198-198: Claude fixture model assignment is a good addition.

This keeps setup data consistent with the model assertions added in session usage tests.

web-ui/logic.sessions.mjs (7)

169-174: Avoid spreading the full session history into Math.min / Math.max.

For range === 'all', Lines 173-174 spread validDayStarts into function calls. Large histories can exceed JavaScript's argument-count limits and throw a RangeError.

🔧 Suggested fix
-        const firstDayStart = validDayStarts.length ? Math.min(...validDayStarts) : toUtcDayStartMs(now);
-        const lastDayStart = validDayStarts.length ? Math.max(...validDayStarts) : toUtcDayStartMs(now);
+        const firstDayStart = validDayStarts.length
+            ? validDayStarts.reduce((min, v) => v < min ? v : min, validDayStarts[0])
+            : toUtcDayStartMs(now);
+        const lastDayStart = validDayStarts.length
+            ? validDayStarts.reduce((max, v) => v > max ? v : max, validDayStarts[0])
+            : toUtcDayStartMs(now);

49-68: LGTM!

The collectSessionModelNames function correctly handles multiple model identifier sources (array models, plus legacy model, modelName, modelId fields), deduplicates them, and returns a clean array.


145-161: LGTM!

The new helper functions normalizeUsageRange, toUtcDayStartMs, and formatUtcDayKey properly handle UTC date operations and provide consistent bucket key formatting.


205-227: LGTM!

The session normalization preprocessing correctly filters by source, validates timestamps, derives session start/end times, and precomputes bucket keys for efficient aggregation.


309-327: LGTM!

The model aggregation logic correctly tracks per-model counts, message totals, token totals, and source labels while handling sessions without model metadata via missingModelSessions counter.


377-403: LGTM!

The usedModels and modelCoverage computation correctly sorts models by count and token total, derives source labels, and calculates coverage percentage.


420-458: LGTM!

The totalDurationMs calculation with proper Number.isFinite checks and the expanded return value including filteredSessions, usedModels, modelCoverage, activeDurationMs, and totalDurationMs are well-structured.

web-ui/modules/app.computed.main-tabs.mjs (4)

1-40: LGTM!

The helper functions normalizeTaskDraftLines and readTaskOrchestrationDraftMetrics are well-implemented with proper defensive checks for null/undefined inputs and consistent type coercion.


42-78: LGTM!

The checklist logic correctly derives readiness states from metrics. The conditions for workflowReady, scopeReady, and previewReady align with the expected UX behavior described in the PR.


80-130: LGTM!

The readiness determination follows a clear priority order with appropriate tone values (neutral, warn, error, success). Each condition correctly maps to user-facing guidance.


132-194: LGTM!

The createMainTabsComputed function properly exposes computed properties for the Vue component. The tab label/title/subtitle mappings are consistent, and the orchestration-related computed properties correctly delegate to the helper functions and handle null states gracefully.

web-ui/partials/index/panel-orchestration.html (5)

1-41: LGTM!

The hero section correctly binds to orchestration state and computed queue stats. Button disabled states properly check loading flags.


54-61: LGTM!

The quick template buttons now correctly reset selectedEngine and workflowIdsText to prevent stale state from previous template selections. This addresses the previously flagged issue.


156-177: LGTM!

The action buttons have appropriate disabled conditions checking multiple loading states (planning, running, queueAdding, queueStarting) and target validation.


309-330: LGTM!

The queue item row now uses .self modifier on keyboard handlers to prevent event bubbling from nested buttons. The conditional role/tabindex/aria-disabled attributes correctly handle items without lastRunId.


356-367: The cancel button's disabled logic correctly handles all edge cases.

The implementation of isTaskRunActive (line 104) wraps isActiveStatus, which properly handles falsy values. When status is undefined or null, the function normalizes it to an empty string and returns false since the empty string doesn't match 'running' or 'queued'. The disabled condition !taskOrchestrationSelectedRun || !taskOrchestrationSelectedRun.run || !isTaskRunActive(taskOrchestrationSelectedRun.run.status) correctly ensures the button remains disabled in all edge cases.
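
The falsy-status normalization described above amounts to something like this sketch (the actual function in the methods module may differ in detail):

```javascript
// Sketch: non-string (including undefined/null) statuses normalize to '',
// which matches neither active state, so the check safely returns false.
function isActiveStatusSketch(status) {
    const normalized = typeof status === 'string' ? status : '';
    return normalized === 'running' || normalized === 'queued';
}
```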

web-ui/app.js (2)

350-383: LGTM!

The taskOrchestration state object now includes all fields matching createDefaultTaskOrchestrationState(), including selectedRunError and detailRequestToken that were previously missing. The _taskOrchestrationPollTimer is correctly initialized to 0.


491-516: LGTM!

The beforeUnmount lifecycle hook correctly calls stopTaskOrchestrationPolling() to clean up the polling timer, preventing memory leaks and stale callbacks after component unmount.

web-ui/modules/app.methods.task-orchestration.mjs (9)

1-35: LGTM!

The createDefaultTaskOrchestrationState function provides a complete schema including selectedRunError and detailRequestToken, ensuring consistent state structure.


37-63: LGTM!

The helper functions normalizeLines, normalizePositiveInteger, normalizeTaskStatusTone, and isActiveStatus are well-implemented with proper input validation and clear return values.


69-82: LGTM!

The ensureTaskOrchestrationState method now correctly merges default keys into the existing state object, addressing the previously flagged schema mismatch issue.


127-178: LGTM!

The loadTaskOrchestrationOverview method correctly uses overviewWarnings instead of overwriting planWarnings, keeps the selected run in sync with available runs, and triggers detail refresh when appropriate.


180-213: LGTM!

The previewTaskPlan method now correctly clears plan, planIssues, and planWarnings in the catch block, preventing stale preview data from being displayed after a failed refresh.


271-294: LGTM!

The startTaskQueueRunner method now correctly differentiates the success message based on res.alreadyRunning, showing "队列执行器已在运行" ("queue runner is already running") when already running vs "队列执行器已启动" ("queue runner started") for fresh starts.


296-348: LGTM!

The loadTaskRunDetail method now correctly implements the request token pattern to guard against stale async responses. It clears detail on error, checks token before committing results, and properly manages loading state.


409-446: LGTM!

The polling logic correctly checks for live activity across queue items, selected run detail, and run list before scheduling the next poll. The 4-second interval and mainTab guard are appropriate for UI responsiveness without excessive API calls.


448-466: LGTM!

The resetTaskOrchestrationDraft method properly resets all draft-related fields to their default values and syncs polling state.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
cli.js (1)

5611-5618: ⚠️ Potential issue | 🟠 Major

listSessionUsage() still tops out at 300 sessions.

This path forwards limit into listSessionBrowse(), but that still routes through listAllSessions(), which clamps to MAX_SESSION_LIST_SIZE (300). The new MAX_SESSION_USAGE_LIST_SIZE = 2000 never actually takes effect, so usage views still miss anything past the first 300 sessions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 5611 - 5618, The usage view's limit
(MAX_SESSION_USAGE_LIST_SIZE) is ignored because listSessionBrowse forwards to
listAllSessions which clamps to MAX_SESSION_LIST_SIZE (300); update the call
chain so the larger limit is honored: add/propagate a numeric limit parameter
through listSessionBrowse to listAllSessions (or add a flag like
allowLargerLimit) and modify listAllSessions to use the provided limit
(respecting MAX_SESSION_USAGE_LIST_SIZE when appropriate) instead of always
clamping to MAX_SESSION_LIST_SIZE; reference the functions listSessionBrowse and
listAllSessions and the constants MAX_SESSION_USAGE_LIST_SIZE and
MAX_SESSION_LIST_SIZE when making the change.
♻️ Duplicate comments (8)
cli.js (7)

12447-12450: ⚠️ Potential issue | 🟠 Major

Preserve explicit CLI/API overrides when a plan object is supplied.

coerceTaskPlanPayload() returns the cloned plan immediately, so top-level overrides already parsed into params (allowWrite, dryRun, concurrency, autoFixRounds, etc.) are silently discarded. codexmate task run --plan @plan.json --dry-run will still execute with the file’s embedded settings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12447 - 12450, coerceTaskPlanPayload currently returns
cloneJson(params.plan) immediately which discards explicit top-level CLI/API
overrides (allowWrite, dryRun, concurrency, autoFixRounds, etc.); change it to
merge the cloned plan with the parsed top-level params so explicit overrides
win—e.g., clone the plan via cloneJson(params.plan, {}), then copy/assign the
known override fields from params into that clone (prefer param values when
defined) before returning; update function coerceTaskPlanPayload to perform this
merge so --plan `@file` plus flags behaves correctly.
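
The merge the fix calls for can be sketched like this (the override field list is taken from the review text; the real coerceTaskPlanPayload handles more shapes):

```javascript
// Sketch: clone the supplied plan, then let explicitly provided top-level
// params win over the plan file's embedded settings.
function coercePlanSketch(params) {
    const plan = JSON.parse(JSON.stringify(params.plan || {}));
    for (const key of ['allowWrite', 'dryRun', 'concurrency', 'autoFixRounds']) {
        if (params[key] !== undefined) plan[key] = params[key];
    }
    return plan;
}
```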

12547-12554: ⚠️ Potential issue | 🟠 Major

The queue runner is still selecting newest queued work first.

listTaskQueueItems() sorts queued items by updatedAt descending, and startTaskQueueProcessing() takes queue[0]. That makes the queue LIFO: newly added or retried work jumps ahead of older queued tasks.

Also applies to: 13065-13068

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12547 - 12554, The current sorter in listTaskQueueItems
causes newest items to come first (LIFO) because it compares updatedAt with b
before a; change the comparison so older updatedAt values sort earlier (use
a.updatedAt compared to b.updatedAt) so queued items are processed FIFO when
startTaskQueueProcessing picks queue[0]; update both occurrences (the sorter in
listTaskQueueItems and the duplicated sorter around lines 13065-13068) and
ensure the statusRank tie-breaker behavior remains the same.
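
The FIFO ordering the fix asks for reduces to an ascending updatedAt comparison after the status-rank tie-breaker (statusRank here is a stand-in for the real ranking):

```javascript
// Sketch of a FIFO queue comparator: lower status rank first, then
// oldest updatedAt first so earlier queued work runs before newer work.
const statusRank = { queued: 0, failed: 1 };
function compareQueueItems(a, b) {
    const rankDiff = (statusRank[a.status] ?? 9) - (statusRank[b.status] ?? 9);
    if (rankDiff !== 0) return rankDiff;
    return a.updatedAt - b.updatedAt; // ascending, not descending
}
```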

12765-12780: ⚠️ Potential issue | 🟠 Major

Buffer partial stdout/stderr lines before JSON parsing.

data events are chunked, not line-aligned. Splitting each chunk immediately means a JSON line spanning two chunks is lost from parsedEvents/sessionId and only survives as broken preview text.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12765 - 12780, captureLines currently splits each
incoming chunk immediately, losing JSON lines that span chunks; change it to
buffer partial lines across invocations: prepend any saved leftover string to
the incoming text, split on /\r?\n/, treat the last item as a potential
incomplete leftover (store it in a persistent leftover variable instead of
processing), and only iterate/parse the fully-terminated lines. While processing
full lines, keep the existing behavior (trim/filter, push truncated previews to
bucket using truncateTaskText, JSON.parse -> push to parsedEvents, and set
sessionId via findCodexSessionId). Ensure the leftover is preserved between
calls to captureLines so multi-chunk JSON lines are reconstructed before
parsing.
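
The line-buffering shape described above is a standard pattern; a minimal sketch (independent of the real captureLines internals):

```javascript
// Sketch: keep the trailing partial line across chunks so a JSON line
// split mid-chunk is reassembled before being handed to the parser.
function createLineBuffer(onLine) {
    let leftover = '';
    return function push(chunk) {
        const parts = (leftover + chunk).split(/\r?\n/);
        leftover = parts.pop(); // last piece may be an incomplete line
        for (const line of parts) {
            if (line.trim()) onLine(line);
        }
    };
}
```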

13083-13109: ⚠️ Potential issue | 🟠 Major

Serialize foreground queue starts too.

The singleton guard only applies when detach === true. Two non-detached callers can enter runner() concurrently, observe the same queued item, and execute it twice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 13083 - 13109, The foreground path needs the same
singleton guard as the detach path to prevent concurrent calls to runner() from
executing the same queued item twice: check and use the shared
g_taskQueueProcessor promise before starting runner() regardless of detach, e.g.
if g_taskQueueProcessor exists return {ok:true, started:false,
alreadyRunning:true}; otherwise set g_taskQueueProcessor = runner().catch(() =>
null).finally(()=>{ g_taskQueueProcessor = null; }); then if detach return the
detached response immediately, else await g_taskQueueProcessor for detail and
return the non-detached response; reference g_taskQueueProcessor, runner and the
detach branch when making the change.
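
The singleton guard the prompt describes can be sketched as a shared promise slot checked before either start path (activeRun stands in for g_taskQueueProcessor; runner is illustrative):

```javascript
// Sketch: one shared slot guards both detached and foreground starts,
// so concurrent callers cannot execute the same queued item twice.
let activeRun = null; // stand-in for g_taskQueueProcessor
async function startQueue(runner, { detach = false } = {}) {
    if (activeRun) return { ok: true, started: false, alreadyRunning: true };
    activeRun = runner()
        .catch(() => null)
        .finally(() => { activeRun = null; });
    if (detach) return { ok: true, started: true, detached: true };
    const detail = await activeRun;
    return { ok: true, started: true, detail };
}
```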

13150-13161: ⚠️ Potential issue | 🟠 Major

task cancel <taskId> still misses detached non-queue runs.

g_taskRunControllers is keyed by runId, and this path only does a direct map lookup before falling back to persisted queue entries. Detached runs started outside the queue are never found by taskId here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 13150 - 13161, The cancel path currently only looks up
g_taskRunControllers by runId (using target) and then falls back to
getTaskQueueItem, so detached runs started outside the queue (where target is a
taskId) are missed; update the logic in the cancel handler to, when
controllerByRun is falsy, iterate g_taskRunControllers.values() and find a
controller whose taskId equals the target (or vice-versa/normalized) and use
that controller (abort it, return ok:true, cancelled:true and include
controller.runId and controller.taskId and mode:'running') before falling back
to getTaskQueueItem; reference g_taskRunControllers, controllerByRun, runId,
taskId and getTaskQueueItem to locate and implement this fix.

9953-9968: ⚠️ Potential issue | 🟠 Major

--detach still runs inside the current process.

These branches only stop awaiting the promise. The orchestration still runs on the parent Node process, so the command/API reports a detached job while the parent continues owning the work until its handles drain.

Also applies to: 11186-11202, 13083-13100

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 9953 - 9968, The detach branch currently just fires
runTaskPlanInternal(...) and doesn't actually background work; change it so that
when detach is true (the branch using coerceTaskPlanPayload,
validatePreparedTaskPlan, createTaskId, createTaskRunId) you do not call
runTaskPlanInternal in-process but instead spawn a detached child Node process
(e.g., child_process.fork/spawn) that receives the plan and identifiers (via
serialized temp file, environment variables, or IPC) with stdio set to detached
and the child unref'ed, return the detached job id to the caller, and
immediately exit/finish the parent handler; ensure validation still runs in the
parent and that errors are returned synchronously if validation fails.
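
Truly backgrounding the work means spawning a separate process, not just dropping the await. A hedged sketch (the worker script and plan-file handoff are hypothetical; the real CLI would differ):

```javascript
// Sketch: hand the plan to a detached child via a temp file, ignore stdio
// so no inherited handles keep the parent alive, and unref the child.
import { spawn } from 'node:child_process';
import { writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

function launchDetached(workerScript, plan) {
    const planPath = join(tmpdir(), `task-plan-${Date.now()}.json`);
    writeFileSync(planPath, JSON.stringify(plan));
    const child = spawn(process.execPath, [workerScript, planPath], {
        detached: true,
        stdio: 'ignore',
    });
    child.unref(); // parent may exit while the child keeps running
    return { pid: child.pid, planPath };
}
```

Validation still belongs in the parent so errors are returned synchronously before anything is spawned.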

11216-11229: ⚠️ Potential issue | 🟠 Major

Return a non-zero exit code when a retry still fails.

This branch only throws on result.error. retryTaskRun() returns a detail object on terminal failure, so a failed retry can still print a summary and exit 0, which is wrong for scripts and CI.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 11216 - 11229, The retry branch should return a non-zero
exit code when the retry ultimately failed; update the 'retry' subcommand
handling around the call to retryTaskRun so that after printing
JSON/summary/detached output you check for terminal failure (e.g. result.error
OR result.status === 'failed' OR result.success === false) and call
process.exit(1) in that case; modify the logic in the retry block that currently
only throws on result.error to also exit non‑zero for terminal failure
indicators returned by retryTaskRun (refer to retryTaskRun, result.error,
result.status, result.success, result.detached and printTaskRunSummary to locate
and update the code).
tests/unit/session-usage-backend.test.mjs (1)

387-390: ⚠️ Potential issue | 🟡 Minor

toIsoTime fallback is unreachable on invalid dates.

At Line 388 and Line 520, new Date(value).toISOString() throws RangeError for invalid input, so the fallback branch is never reached.

🐛 Proposed non-breaking fix
             toIsoTime(value, fallback = '') {
-                const iso = new Date(value).toISOString();
-                return iso === 'Invalid Date' ? fallback : iso;
+                const date = new Date(value);
+                if (!Number.isFinite(date.getTime())) return fallback;
+                return date.toISOString();
             },

Also applies to: 519-522

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/session-usage-backend.test.mjs` around lines 387 - 390, The
toIsoTime helper currently calls new Date(value).toISOString() which throws on
invalid dates, so replace the direct call with a guard: create a Date object
(e.g., const d = new Date(value)), check validity with
Number.isNaN(d.getTime()), and return the fallback when invalid; otherwise return
d.toISOString(); apply the same change to the other duplicate toIsoTime
occurrence in the test file to ensure the fallback branch is reachable.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/config-tabs-ui.test.mjs`:
- Line 20: The test currently reads the source stylesheet via
readProjectFile('web-ui/styles/task-orchestration.css')
(taskOrchestrationStyles) but doesn't verify the final shipped bundle; update
the test to also load the bundled CSS using readBundledWebUiCss() and run the
same assertions against that string so the assertions at both this spot and the
block around lines 98-106 validate the actual bundled output as well as the
source file.

---

Outside diff comments:
In `@cli.js`:
- Around line 5611-5618: The usage view's limit (MAX_SESSION_USAGE_LIST_SIZE) is
ignored because listSessionBrowse forwards to listAllSessions which clamps to
MAX_SESSION_LIST_SIZE (300); update the call chain so the larger limit is
honored: add/propagate a numeric limit parameter through listSessionBrowse to
listAllSessions (or add a flag like allowLargerLimit) and modify listAllSessions
to use the provided limit (respecting MAX_SESSION_USAGE_LIST_SIZE when
appropriate) instead of always clamping to MAX_SESSION_LIST_SIZE; reference the
functions listSessionBrowse and listAllSessions and the constants
MAX_SESSION_USAGE_LIST_SIZE and MAX_SESSION_LIST_SIZE when making the change.

---

Duplicate comments:
In `@cli.js`:
- Around line 12447-12450: coerceTaskPlanPayload currently returns
cloneJson(params.plan) immediately which discards explicit top-level CLI/API
overrides (allowWrite, dryRun, concurrency, autoFixRounds, etc.); change it to
merge the cloned plan with the parsed top-level params so explicit overrides
win—e.g., clone the plan via cloneJson(params.plan, {}), then copy/assign the
known override fields from params into that clone (prefer param values when
defined) before returning; update function coerceTaskPlanPayload to perform this
merge so --plan `@file` plus flags behaves correctly.
- Around line 12547-12554: The current sorter in listTaskQueueItems causes
newest items to come first (LIFO) because it compares updatedAt with b before a;
change the comparison so older updatedAt values sort earlier (use a.updatedAt
compared to b.updatedAt) so queued items are processed FIFO when
startTaskQueueProcessing picks queue[0]; update both occurrences (the sorter in
listTaskQueueItems and the duplicated sorter around lines 13065-13068) and
ensure the statusRank tie-breaker behavior remains the same.
- Around line 12765-12780: captureLines currently splits each incoming chunk
immediately, losing JSON lines that span chunks; change it to buffer partial
lines across invocations: prepend any saved leftover string to the incoming
text, split on /\r?\n/, treat the last item as a potential incomplete leftover
(store it in a persistent leftover variable instead of processing), and only
iterate/parse the fully-terminated lines. While processing full lines, keep the
existing behavior (trim/filter, push truncated previews to bucket using
truncateTaskText, JSON.parse -> push to parsedEvents, and set sessionId via
findCodexSessionId). Ensure the leftover is preserved between calls to
captureLines so multi-chunk JSON lines are reconstructed before parsing.
- Around line 13083-13109: The foreground path needs the same singleton guard as
the detach path to prevent concurrent calls to runner() from executing the same
queued item twice: check and use the shared g_taskQueueProcessor promise before
starting runner() regardless of detach, e.g. if g_taskQueueProcessor exists
return {ok:true, started:false, alreadyRunning:true}; otherwise set
g_taskQueueProcessor = runner().catch(() => null).finally(()=>{
g_taskQueueProcessor = null; }); then if detach return the detached response
immediately, else await g_taskQueueProcessor for detail and return the
non-detached response; reference g_taskQueueProcessor, runner and the detach
branch when making the change.
- Around line 13150-13161: The cancel path currently only looks up
g_taskRunControllers by runId (using target) and then falls back to
getTaskQueueItem, so detached runs started outside the queue (where target is a
taskId) are missed; update the logic in the cancel handler to, when
controllerByRun is falsy, iterate g_taskRunControllers.values() and find a
controller whose taskId equals the target (or vice-versa/normalized) and use
that controller (abort it, return ok:true, cancelled:true and include
controller.runId and controller.taskId and mode:'running') before falling back
to getTaskQueueItem; reference g_taskRunControllers, controllerByRun, runId,
taskId and getTaskQueueItem to locate and implement this fix.
- Around line 9953-9968: The detach branch currently just fires
runTaskPlanInternal(...) and doesn't actually background work; change it so that
when detach is true (the branch using coerceTaskPlanPayload,
validatePreparedTaskPlan, createTaskId, createTaskRunId) you do not call
runTaskPlanInternal in-process but instead spawn a detached child Node process
(e.g., child_process.fork/spawn) that receives the plan and identifiers (via
serialized temp file, environment variables, or IPC) with stdio set to detached
and the child unref'ed, return the detached job id to the caller, and
immediately exit/finish the parent handler; ensure validation still runs in the
parent and that errors are returned synchronously if validation fails.
- Around line 11216-11229: The retry branch should return a non-zero exit code
when the retry ultimately failed; update the 'retry' subcommand handling around
the call to retryTaskRun so that after printing JSON/summary/detached output you
check for terminal failure (e.g. result.error OR result.status === 'failed' OR
result.success === false) and call process.exit(1) in that case; modify the
logic in the retry block that currently only throws on result.error to also exit
non‑zero for terminal failure indicators returned by retryTaskRun (refer to
retryTaskRun, result.error, result.status, result.success, result.detached and
printTaskRunSummary to locate and update the code).

In `@tests/unit/session-usage-backend.test.mjs`:
- Around line 387-390: The toIsoTime helper currently calls new
Date(value).toISOString() which throws on invalid dates, so replace the direct
call with a guard: create a Date object (e.g., const d = new Date(value)), check
validity with Number.isNaN(d.getTime()) and return the
fallback when invalid, otherwise return d.toISOString(); apply the same change
to the other duplicate toIsoTime occurrence in the test file to ensure the
fallback branch is reachable.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 29feffd8-3016-4074-989b-c690ac7e7f49

📥 Commits

Reviewing files that changed from the base of the PR and between 43c8f14 and b3eba50.

📒 Files selected for processing (5)
  • cli.js
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/session-usage-backend.test.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (1)
  • web-ui/styles/task-orchestration.css
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-04T01:36:26.964Z
Learning: In PR `#68` for SakuraByteCore/codexmate, github-actions[bot] again flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`, `#68`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-30T04:47:15.352Z
Learning: In PR `#55` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-31T12:37:51.023Z
Learning: In PR `#59` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-03T10:42:22.368Z
Learning: In PR `#67` for SakuraByteCore/codexmate, github-actions[bot] flagged that CodeRabbit was again introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 404-404: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (1)
tests/unit/session-usage-backend.test.mjs (1)

231-355: Strong coverage for model backfill and full-session model scanning.

These tests validate the intended behavior well (backfilling only missing metadata, preserving existing model fields, and keeping middle model names). Nice addition.

@awsl233777
Copy link
Copy Markdown
Collaborator Author

@coderabbitai re-review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Apr 12, 2026

@awsl233777 Sure! Triggering a fresh full review of PR #86 now.


🧠 Learnings used
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-04T01:36:26.964Z
Learning: In PR `#68` for SakuraByteCore/codexmate, github-actions[bot] again flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`, `#68`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-03T10:42:22.368Z
Learning: In PR `#67` for SakuraByteCore/codexmate, github-actions[bot] flagged that CodeRabbit was again introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-30T04:47:15.352Z
Learning: In PR `#55` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-31T12:37:51.023Z
Learning: In PR `#59` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: SurviveM
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-29T02:42:32.785Z
Learning: The repository maintainer SurviveM (GitHub: SurviveM, repo: SakuraByteCore/codexmate) prefers that CodeRabbit review suggestions do not introduce breaking changes. Reviews should focus on correctness, safety, and quality without recommending refactors or changes that would break existing APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-05T03:58:26.787Z
Learning: For SakuraByteCore/codexmate reviews, only provide non-breaking feedback (correctness, safety, quality). Avoid API, behavior, or refactor suggestions. This rule applies across PRs `#55`, `#59`, `#67`, and `#68`.
✅ Actions performed

Full review triggered.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (10)
cli.js (9)

12804-12820: ⚠️ Potential issue | 🟠 Major

Buffer partial stdout/stderr lines before JSON parsing.

data events are not line-aligned. Splitting each chunk immediately breaks JSON lines that span chunks, so parsedEvents and sessionId can be silently lost from task logs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12804 - 12820, The captureLines function currently
splits each incoming chunk immediately which breaks JSON lines spanning multiple
data events; change the logic to maintain per-stream buffers (e.g.,
stdoutBuffer/stderrBuffer) that append incoming chunk text, then split the
buffer on newline, process only the complete lines with the existing logic
(truncation, JSON.parse -> push into parsedEvents, set sessionId via
findCodexSessionId), and retain the final partial segment back in the buffer for
the next chunk; ensure captureLines is called only with complete lines extracted
from the buffer so truncateTaskText, parsedEvents and sessionId updates are
reliable.
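The buffering the prompt describes can be sketched as a small chunk-safe splitter. The only assumption is that chunks are UTF-8 text; onLine stands in for the per-line logic in captureLines (truncation, JSON.parse, session-id detection):

```javascript
// Chunk-safe line splitting: keep the trailing partial line between calls so
// JSON lines that span data events are reassembled before parsing.
function createLineBuffer(onLine) {
    let leftover = '';
    return (chunk) => {
        const pieces = (leftover + String(chunk)).split(/\r?\n/);
        leftover = pieces.pop(); // last piece may be an unterminated line
        for (const line of pieces) {
            if (line.trim()) onLine(line);
        }
    };
}
```

One buffer per stream (stdout, stderr) keeps the two streams from interleaving mid-line.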

12486-12489: ⚠️ Potential issue | 🟠 Major

Merge explicit CLI/API overrides into --plan payloads.

Returning params.plan directly here drops parsed top-level flags like allowWrite, dryRun, concurrency, and autoFixRounds, so codexmate task run --plan @plan.json --allow-write still executes with the file’s values instead of the explicit override.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12486 - 12489, coerceTaskPlanPayload currently returns a
clone of the plan file and thus ignores explicit top-level CLI/API overrides
(e.g., allowWrite, dryRun, concurrency, autoFixRounds); modify
coerceTaskPlanPayload to merge the provided params into the cloned plan so any
non-undefined top-level flags in params override values from the file — locate
function coerceTaskPlanPayload and ensure you clone params.plan (using
cloneJson) then assign/merge params.allowWrite, params.dryRun,
params.concurrency, params.autoFixRounds (and any other top-level flags) into
that cloned object before returning it so explicit CLI/API flags take
precedence.
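An illustrative precedence merge for the fix above: flags explicitly parsed from the CLI/API win over the plan file. The field list mirrors the review; treat it as an assumption about the real params shape:

```javascript
// Merge explicit top-level overrides into a cloned plan so CLI/API flags win.
// JSON round-trip stands in for cloneJson; field names follow the review.
function mergePlanOverrides(plan, params) {
    const merged = JSON.parse(JSON.stringify(plan));
    for (const key of ['allowWrite', 'dryRun', 'concurrency', 'autoFixRounds']) {
        if (params[key] !== undefined) merged[key] = params[key];
    }
    return merged;
}
```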

10005-10007: ⚠️ Potential issue | 🟡 Minor

Detached runs still ignore caller-provided runId.

Both paths always call createTaskRunId(), so --run-id / API runId is discarded in detached mode.

Also applies to: 11231-11233

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 10005 - 10007, The detached-run path always overwrites
caller-provided runId by calling createTaskRunId(); change the runId
initialization to respect a provided non-empty string (e.g., const runId =
typeof params.runId === 'string' && params.runId.trim() ? params.runId.trim() :
createTaskRunId()) so runTaskPlanInternal(plan, { taskId, runId }) uses the
caller's runId when present; apply the same fix to the other detached-run call
site that also unconditionally calls createTaskRunId().

11255-11268: ⚠️ Potential issue | 🟠 Major

task retry still exits 0 after a failed rerun.

This branch only throws on result.error. A non-detached retry that finishes with result.run.status === 'failed' will still return success to scripts and CI.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 11255 - 11268, The retry branch only throws on
result.error, so a non-detached retry that completes with result.run.status ===
'failed' still exits 0; update the handler after printTaskRunSummary (inside the
subcommand === 'retry' block) to check result.run && result.run.status ===
'failed' (and not detached) and throw an Error (or rethrow) with a descriptive
message (e.g., include runId and status) so scripts/CI receive a non-zero exit;
locate this logic around the retryTaskRun call and result handling where
printTaskRunSummary is invoked and augment the existing result.error check to
also handle failed run statuses.

13104-13115: ⚠️ Potential issue | 🟠 Major

The queue runner still executes newest queued tasks first.

listTaskQueueItems({ status: 'queued' }) sorts by updatedAt descending, and queue[0] picks the newest item. Retries or recently added tasks can keep jumping ahead of older queued work.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 13104 - 13115, The runner is pulling the newest queued
item first because listTaskQueueItems({ status: 'queued' }) returns items sorted
by updatedAt descending and you pick queue[0]; change the selection to process
the oldest queued task first by either updating the listTaskQueueItems call to
request ascending order (e.g., include an order/sort option so it returns oldest
first) or, if changing the API is undesirable, pick the last element from queue
(queue[queue.length - 1]) instead of queue[0] when taskId is not provided; keep
the taskId lookup behavior for specific task runs and then call
runTaskPlanInternal(nextItem.plan, {...}) as before.
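If the sorter itself is changed, the oldest-first comparator looks like this. It assumes updatedAt is an ISO-8601 string (so lexicographic order matches time order); the statusRank tie-breaker from cli.js is left out of the sketch:

```javascript
// FIFO comparator: earliest updatedAt sorts first, so queue[0] is the item
// queued longest ago. Assumes ISO-8601 timestamp strings.
function compareQueueItemsFifo(a, b) {
    const left = String(a.updatedAt || '');
    const right = String(b.updatedAt || '');
    if (left < right) return -1;
    if (left > right) return 1;
    return 0;
}
```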

12975-13037: ⚠️ Potential issue | 🟠 Major

Persist a terminal failure state when executeTaskPlan throws.

This function only has try/finally. If executeTaskPlan(...) rejects, the controller is removed but the persisted run detail and queue item stay running, and detached callers swallow that rejection with .catch(() => {}).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12975 - 13037, The try/finally around executeTaskPlan
leaves persisted state as "running" if executeTaskPlan rejects; wrap the await
executeTaskPlan(...) call in a try/catch so that on rejection you build a
failure detail (use baseDetail, runId, toIsoTime(Date.now()), status 'failed' or
'cancelled' as appropriate, and include error message in run/error/summary),
call writeTaskRunDetail(detail),
appendTaskRunRecord(collectTaskRunSummary(detail)), and update the queue via
upsertTaskQueueItem(...) (mirroring the success path but setting
status/runStatus/lastSummary to reflect the error), then rethrow the error or
return the failure detail; retain the existing finally block that calls
g_taskRunControllers.delete(runId).
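The shape of that fix, reduced to its control flow: catch the rejection, persist a terminal detail, then rethrow. writeDetail, updateQueue, and cleanup are stand-ins for writeTaskRunDetail, upsertTaskQueueItem, and the g_taskRunControllers.delete(runId) call:

```javascript
// Persist a terminal 'failed' state when the wrapped execution rejects, then
// rethrow so non-detached callers still see the error. Helper names are
// stand-ins for the cli.js persistence functions.
async function runWithFailurePersistence(execute, { writeDetail, updateQueue, cleanup }) {
    try {
        return await execute();
    } catch (error) {
        const detail = { status: 'failed', error: String((error && error.message) || error) };
        writeDetail(detail);
        updateQueue(detail);
        throw error;
    } finally {
        cleanup(); // always drop the controller entry
    }
}
```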

13189-13200: ⚠️ Potential issue | 🟠 Major

task cancel <taskId> still misses detached non-queue runs.

This lookup only checks g_taskRunControllers by map key, but the map is keyed by runId. Detached runs started outside the queue cannot be cancelled by taskId even though the help text says they can.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 13189 - 13200, The cancel logic only looks up
g_taskRunControllers by key (runId) so detached runs started outside the queue
cannot be found by taskId; change the lookup in the cancel path (around
controllerByRun and getTaskQueueItem) to also search
g_taskRunControllers.values() for a controller whose taskId matches the provided
target (or normalize target into runId/taskId first), then call
controller.abort() and return the same response; alternatively add a helper to
resolve target -> controller by checking both map key (runId) and iterating
controllers to match controller.taskId so detached runs can be cancelled by
taskId.

13091-13148: ⚠️ Potential issue | 🟠 Major

Foreground queue starts can still race each other.

Only detached starts are guarded by g_taskQueueProcessor. Two non-detached callers can enter runner() concurrently and execute the same queued task twice.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 13091 - 13148, The foreground startTaskQueueProcessing
path can race because only detached starts check g_taskQueueProcessor; ensure
both detach and non-detach callers serialize by setting/checking
g_taskQueueProcessor before invoking runner. Concretely: in
startTaskQueueProcessing, check if g_taskQueueProcessor exists and return
started:false when present; otherwise assign g_taskQueueProcessor =
runner().catch(()=>null).finally(()=>{ g_taskQueueProcessor = null }); then for
detach return the detached response immediately, and for non-detach await
g_taskQueueProcessor to get the detail and return it — keeping runner,
g_taskQueueProcessor, taskId, detach and the final cleanup logic intact.
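A single-flight sketch of the guard described above: detached and foreground callers share one in-flight promise, so a second start returns alreadyRunning instead of racing. runner stands in for the queue-processing loop in cli.js:

```javascript
// Single-flight guard shared by detach and non-detach callers. The guard is
// set synchronously before runner starts, so concurrent starts cannot race.
let g_processor = null;

function startProcessing(runner, { detach = false } = {}) {
    if (g_processor) {
        return Promise.resolve({ ok: true, started: false, alreadyRunning: true });
    }
    g_processor = Promise.resolve()
        .then(runner)
        .catch(() => null)
        .finally(() => { g_processor = null; });
    if (detach) return Promise.resolve({ ok: true, started: true, detached: true });
    return g_processor.then((detail) => ({ ok: true, started: true, detail }));
}
```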

9992-10015: ⚠️ Potential issue | 🟠 Major

--detach still keeps the work in the current process.

These branches only stop awaiting runTaskPlanInternal(...); they never spawn or unref a separate worker. The command reports a background run, but the work still lives in the current Node process.

Also applies to: 11225-11241

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 9992 - 10015, The current detach branch only stops
awaiting runTaskPlanInternal but still runs the task in-process; replace the
in-process fire-and-forget with a true detached worker: when detach is truthy
spawn a detached child (e.g., using child_process.fork or child_process.spawn
with { detached: true, stdio: 'ignore' } and child.unref()) or create a Worker
and call worker.unref(), and have that worker invoke runTaskPlanInternal with
the prepared plan and IDs; keep the same validation flow (coerceTaskPlanPayload,
validatePreparedTaskPlan) and pass taskId/runId (createTaskId, createTaskRunId)
to the detached process via args, env, IPC, or a temporary payload file, then
set result.started/detached true without executing runTaskPlanInternal in the
current process.
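A sketch of a truly detached run under the temp-file handoff the prompt suggests. The '--task-worker' flag and the workerArgs override (included only so the sketch is testable) are hypothetical, not real cli.js flags:

```javascript
import { spawn } from 'node:child_process';
import { writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hand the already-validated plan to a detached child via a temp payload file,
// then unref so the parent can exit. '--task-worker' and workerArgs are
// illustrative assumptions, not actual cli.js behavior.
function spawnDetachedTaskRun(plan, { taskId, runId, workerArgs }) {
    const payloadPath = join(tmpdir(), `task-run-${runId}.json`);
    writeFileSync(payloadPath, JSON.stringify({ plan, taskId, runId }));
    const args = workerArgs || [process.argv[1], '--task-worker', payloadPath];
    const child = spawn(process.execPath, args, { detached: true, stdio: 'ignore' });
    child.unref(); // do not keep the parent's event loop alive
    return { ok: true, detached: true, taskId, runId, pid: child.pid };
}
```

Validation still happens in the parent before this is called, so invalid plans fail synchronously.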
web-ui/logic.sessions.mjs (1)

169-174: ⚠️ Potential issue | 🟠 Major

Avoid spreading all day starts into Math.min / Math.max.

For range === 'all', large histories can exceed V8's argument limit here and break the usage tab. Please reduce iteratively instead.

🔧 Suggested fix
-        const validDayStarts = normalizedSessions
-            .map((session) => toUtcDayStartMs(session.updatedAtMs))
-            .filter((value) => Number.isFinite(value));
-        const firstDayStart = validDayStarts.length ? Math.min(...validDayStarts) : toUtcDayStartMs(now);
-        const lastDayStart = validDayStarts.length ? Math.max(...validDayStarts) : toUtcDayStartMs(now);
+        let firstDayStart = Number.POSITIVE_INFINITY;
+        let lastDayStart = Number.NEGATIVE_INFINITY;
+        for (const session of normalizedSessions) {
+            const dayStart = toUtcDayStartMs(session.updatedAtMs);
+            if (!Number.isFinite(dayStart)) continue;
+            if (dayStart < firstDayStart) firstDayStart = dayStart;
+            if (dayStart > lastDayStart) lastDayStart = dayStart;
+        }
+        if (!Number.isFinite(firstDayStart) || !Number.isFinite(lastDayStart)) {
+            firstDayStart = toUtcDayStartMs(now);
+            lastDayStart = firstDayStart;
+        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/logic.sessions.mjs` around lines 169 - 174, The current block handling
range === 'all' computes firstDayStart and lastDayStart by spreading
validDayStarts into Math.min/Math.max which can hit V8 argument limits for large
normalizedSessions; replace the spread pattern by iterating/reducing
validDayStarts (e.g., use Array.prototype.reduce or a simple for loop) to
compute Math.min and Math.max safely, keeping the same fallback to
toUtcDayStartMs(now) when validDayStarts is empty (update the computation of
firstDayStart and lastDayStart in the same block where normalizedSessions and
toUtcDayStartMs are used).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/session-usage-backend.test.mjs`:
- Around line 299-355: The test fixture file is smaller than
SESSION_BROWSE_SUMMARY_READ_BYTES so a head-only read still contains the middle
model; expand the JSONL file written in the test (in the 'listSessionUsage scans
the full session file so middle model names are not dropped' case) to exceed
SESSION_BROWSE_SUMMARY_READ_BYTES (65536) by inserting enough extra
lines/records (e.g., many harmless JSON lines or padding) before the
'turn_context' and 'event_msg' entries so that a head-only read would miss the
middle model and force full-file scanning by
listSessionUsage/instantiateListSessionUsage.

In `@web-ui/partials/index/panel-orchestration.html`:
- Line 47: The Workflow preset button currently sets taskOrchestration.target,
selectedEngine, workflowIdsText, and notes but does not clear leftover
follow-ups; update the click handler for the Workflow 批处理 (batch) button so it also sets
taskOrchestration.followUpsText = '' (reset to empty string) when clicked (look
for the button element with class "task-template-chip" whose click handler sets
taskOrchestration.selectedEngine = 'workflow' and workflowIdsText =
'diagnose-config\nsafe-provider-switch') to ensure no previous follow-up steps
are retained.

---

Duplicate comments:
In `@cli.js`:
- Around line 12804-12820: The captureLines function currently splits each
incoming chunk immediately which breaks JSON lines spanning multiple data
events; change the logic to maintain per-stream buffers (e.g.,
stdoutBuffer/stderrBuffer) that append incoming chunk text, then split the
buffer on newline, process only the complete lines with the existing logic
(truncation, JSON.parse -> push into parsedEvents, set sessionId via
findCodexSessionId), and retain the final partial segment back in the buffer for
the next chunk; ensure captureLines is called only with complete lines extracted
from the buffer so truncateTaskText, parsedEvents and sessionId updates are
reliable.
- Around line 12486-12489: coerceTaskPlanPayload currently returns a clone of
the plan file and thus ignores explicit top-level CLI/API overrides (e.g.,
allowWrite, dryRun, concurrency, autoFixRounds); modify coerceTaskPlanPayload to
merge the provided params into the cloned plan so any non-undefined top-level
flags in params override values from the file — locate function
coerceTaskPlanPayload and ensure you clone params.plan (using cloneJson) then
assign/merge params.allowWrite, params.dryRun, params.concurrency,
params.autoFixRounds (and any other top-level flags) into that cloned object
before returning it so explicit CLI/API flags take precedence.
- Around line 10005-10007: The detached-run path always overwrites
caller-provided runId by calling createTaskRunId(); change the runId
initialization to respect a provided non-empty string (e.g., const runId =
typeof params.runId === 'string' && params.runId.trim() ? params.runId.trim() :
createTaskRunId()) so runTaskPlanInternal(plan, { taskId, runId }) uses the
caller's runId when present; apply the same fix to the other detached-run call
site that also unconditionally calls createTaskRunId().
- Around line 11255-11268: The retry branch only throws on result.error, so a
non-detached retry that completes with result.run.status === 'failed' still
exits 0; update the handler after printTaskRunSummary (inside the subcommand ===
'retry' block) to check result.run && result.run.status === 'failed' (and not
detached) and throw an Error (or rethrow) with a descriptive message (e.g.,
include runId and status) so scripts/CI receive a non-zero exit; locate this
logic around the retryTaskRun call and result handling where printTaskRunSummary
is invoked and augment the existing result.error check to also handle failed run
statuses.
- Around line 13104-13115: The runner is pulling the newest queued item first
because listTaskQueueItems({ status: 'queued' }) returns items sorted by
updatedAt descending and you pick queue[0]; change the selection to process the
oldest queued task first by either updating the listTaskQueueItems call to
request ascending order (e.g., include an order/sort option so it returns oldest
first) or, if changing the API is undesirable, pick the last element from queue
(queue[queue.length - 1]) instead of queue[0] when taskId is not provided; keep
the taskId lookup behavior for specific task runs and then call
runTaskPlanInternal(nextItem.plan, {...}) as before.
- Around line 12975-13037: The try/finally around executeTaskPlan leaves
persisted state as "running" if executeTaskPlan rejects; wrap the await
executeTaskPlan(...) call in a try/catch so that on rejection you build a
failure detail (use baseDetail, runId, toIsoTime(Date.now()), status 'failed' or
'cancelled' as appropriate, and include error message in run/error/summary),
call writeTaskRunDetail(detail),
appendTaskRunRecord(collectTaskRunSummary(detail)), and update the queue via
upsertTaskQueueItem(...) (mirroring the success path but setting
status/runStatus/lastSummary to reflect the error), then rethrow the error or
return the failure detail; retain the existing finally block that calls
g_taskRunControllers.delete(runId).
- Around line 13189-13200: The cancel logic only looks up g_taskRunControllers
by key (runId) so detached runs started outside the queue cannot be found by
taskId; change the lookup in the cancel path (around controllerByRun and
getTaskQueueItem) to also search g_taskRunControllers.values() for a controller
whose taskId matches the provided target (or normalize target into runId/taskId
first), then call controller.abort() and return the same response; alternatively
add a helper to resolve target -> controller by checking both map key (runId)
and iterating controllers to match controller.taskId so detached runs can be
cancelled by taskId.
- Around line 13091-13148: The foreground startTaskQueueProcessing path can race
because only detached starts check g_taskQueueProcessor; ensure both detach and
non-detach callers serialize by setting/checking g_taskQueueProcessor before
invoking runner. Concretely: in startTaskQueueProcessing, check if
g_taskQueueProcessor exists and return started:false when present; otherwise
assign g_taskQueueProcessor = runner().catch(()=>null).finally(()=>{
g_taskQueueProcessor = null }); then for detach return the detached response
immediately, and for non-detach await g_taskQueueProcessor to get the detail and
return it — keeping runner, g_taskQueueProcessor, taskId, detach and the final
cleanup logic intact.
- Around line 9992-10015: The current detach branch only stops awaiting
runTaskPlanInternal but still runs the task in-process; replace the in-process
fire-and-forget with a true detached worker: when detach is truthy spawn a
detached child (e.g., using child_process.fork or child_process.spawn with {
detached: true, stdio: 'ignore' } and child.unref()) or create a Worker and call
worker.unref(), and have that worker invoke runTaskPlanInternal with the
prepared plan and IDs; keep the same validation flow (coerceTaskPlanPayload,
validatePreparedTaskPlan) and pass taskId/runId (createTaskId, createTaskRunId)
to the detached process via args, env, IPC, or a temporary payload file, then
set result.started/detached true without executing runTaskPlanInternal in the
current process.
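The queue-selection change in the first cli.js finding above can be sketched as a small helper. This is a minimal sketch: `pickNextQueuedItem` is a hypothetical name, and it assumes `listTaskQueueItems({ status: 'queued' })` returns items newest-first (sorted by `updatedAt` descending), as the finding states.

```javascript
// Minimal sketch: select the oldest queued task without changing the
// listTaskQueueItems API. Assumes the queue array is newest-first.
function pickNextQueuedItem(queue, taskId) {
    if (taskId) {
        // Specific task runs keep the existing taskId lookup behavior.
        return queue.find((item) => item.taskId === taskId) || null;
    }
    // Items arrive sorted by updatedAt descending, so the oldest queued
    // task is the last element, not queue[0].
    return queue.length ? queue[queue.length - 1] : null;
}
```

The selected item's plan would then be passed to `runTaskPlanInternal` as before.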

In `@web-ui/logic.sessions.mjs`:
- Around line 169-174: The current block handling range === 'all' computes
firstDayStart and lastDayStart by spreading validDayStarts into
Math.min/Math.max which can hit V8 argument limits for large normalizedSessions;
replace the spread pattern by iterating/reducing validDayStarts (e.g., use
Array.prototype.reduce or a simple for loop) to compute Math.min and Math.max
safely, keeping the same fallback to toUtcDayStartMs(now) when validDayStarts is
empty (update the computation of firstDayStart and lastDayStart in the same
block where normalizedSessions and toUtcDayStartMs are used).
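The spread-free min/max computation above can be sketched as a plain loop. `computeDayStartBounds` is a hypothetical helper name, and the fallback argument stands in for `toUtcDayStartMs(now)`.

```javascript
// Minimal sketch: compute min/max over validDayStarts with a loop instead
// of Math.min(...arr)/Math.max(...arr), which can exceed V8's argument
// limit for very large arrays.
function computeDayStartBounds(validDayStarts, fallbackDayStart) {
    if (!validDayStarts.length) {
        // Same fallback as the original block when no valid day starts exist.
        return { firstDayStart: fallbackDayStart, lastDayStart: fallbackDayStart };
    }
    let firstDayStart = validDayStarts[0];
    let lastDayStart = validDayStarts[0];
    for (const value of validDayStarts) {
        if (value < firstDayStart) firstDayStart = value;
        if (value > lastDayStart) lastDayStart = value;
    }
    return { firstDayStart, lastDayStart };
}
```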

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8feb694f-3151-443f-a09f-d180377a7904

📥 Commits

Reviewing files that changed from the base of the PR and between b3eba50 and 32421a3.

📒 Files selected for processing (9)
  • cli.js
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/session-usage-backend.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/logic.sessions.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/partials/index/panel-usage.html
  • web-ui/styles/sessions-usage.css
  • web-ui/styles/task-orchestration.css
✅ Files skipped from review due to trivial changes (2)
  • web-ui/partials/index/panel-usage.html
  • web-ui/styles/task-orchestration.css
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-04T01:36:26.964Z
Learning: In PR `#68` for SakuraByteCore/codexmate, github-actions[bot] again flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`, `#68`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-30T04:47:15.352Z
Learning: In PR `#55` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-31T12:37:51.023Z
Learning: In PR `#59` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-03T10:42:22.368Z
Learning: In PR `#67` for SakuraByteCore/codexmate, github-actions[bot] flagged that CodeRabbit was again introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 389-389: Special characters must be escaped : [ > ].

(spec-char-escape)

@awsl233777
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

@awsl233777 Sure! Triggering a fresh full review of PR #86 now.


🧠 Learnings used
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-04T01:36:26.964Z
Learning: In PR `#68` for SakuraByteCore/codexmate, github-actions[bot] again flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`, `#68`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-03T10:42:22.368Z
Learning: In PR `#67` for SakuraByteCore/codexmate, github-actions[bot] flagged that CodeRabbit was again introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-30T04:47:15.352Z
Learning: In PR `#55` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-31T12:37:51.023Z
Learning: In PR `#59` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.

Learnt from: SurviveM
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-29T02:42:32.785Z
Learning: The repository maintainer SurviveM (GitHub: SurviveM, repo: SakuraByteCore/codexmate) prefers that CodeRabbit review suggestions do not introduce breaking changes. Reviews should focus on correctness, safety, and quality without recommending refactors or changes that would break existing APIs, interfaces, or behavior.

Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-05T03:58:26.787Z
Learning: For SakuraByteCore/codexmate reviews, only provide non-breaking feedback (correctness, safety, quality). Avoid API, behavior, or refactor suggestions. This rule applies across PRs `#55`, `#59`, `#67`, and `#68`.
✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/cli-models-utils.js`:
- Line 46: The current predicate uses host.endsWith('bigmodel.cn') and
pathname.includes('/anthropic') which can match unintended hosts like
"notbigmodel.cn" or paths like "/anthropicx"; update the check to only match
true BigModel hosts and the exact Anthropic path segment (e.g., require host ===
'bigmodel.cn' or host.endsWith('.bigmodel.cn') and require the pathname to equal
'/anthropic' or match a segment boundary such as /^\/anthropic(?:\/|$)/). Locate
the return expression using host and pathname in lib/cli-models-utils.js and
replace the loose endsWith/includes checks with the stricter host
equality/subdomain check and the pathname equality/segment-boundary regex.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aa23511c-3935-47ab-8cd1-92cf17ef2a85

📥 Commits

Reviewing files that changed from the base of the PR and between 32421a3 and ba7697b.

📒 Files selected for processing (3)
  • cli.js
  • lib/cli-models-utils.js
  • tests/unit/cli-models-utils.test.mjs
✅ Files skipped from review due to trivial changes (1)
  • cli.js
📜 Review details
🧰 Additional context used
🔇 Additional comments (2)
lib/cli-models-utils.js (1)

59-75: Merge behavior looks solid and predictable.

The de-dup + order strategy is correct: remote models are kept first, supplemental models are appended once, and IDs are normalized safely.

tests/unit/cli-models-utils.test.mjs (1)

13-29: Good coverage for the new catalog utilities.

These tests validate the critical behavior paths (BigModel supplemental selection and stable merge/de-dup ordering) and are well-targeted.

@awsl233777
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

@awsl233777 Sure! Triggering a fresh full review of PR #86 now.


✅ Actions performed

Full review triggered.

@awsl233777
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

@awsl233777 Sure! Triggering a fresh full review of PR #86 now.


✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

♻️ Duplicate comments (16)
lib/cli-models-utils.js (1)

46-46: ⚠️ Potential issue | 🟡 Minor

Tighten BigModel endpoint matching to avoid false positives.

Line 46 can match unintended hosts/paths, causing supplemental models to be applied where they shouldn’t.

🔧 Non-breaking fix
-        return host.endsWith('bigmodel.cn') && pathname.includes('/anthropic');
+        const isBigModelHost = host === 'bigmodel.cn' || host.endsWith('.bigmodel.cn');
+        const hasAnthropicSegment = /(^|\/)anthropic(\/|$)/.test(pathname);
+        return isBigModelHost && hasAnthropicSegment;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/cli-models-utils.js` at line 46, The current endpoint check using
host.endsWith('bigmodel.cn') && pathname.includes('/anthropic') is too
permissive; update the condition in lib/cli-models-utils.js (the code that
returns host.endsWith('bigmodel.cn') && pathname.includes('/anthropic')) to
require a stricter host and path match — e.g., match host with a boundary like
/\.?bigmodel\.cn$/ or host === 'bigmodel.cn' or ending with '.bigmodel.cn', and
match the pathname with a prefix/anchor such as
pathname.startsWith('/anthropic') or a regex like /^\/anthropic(\/|$)/ so only
exact BigModel Anthropic endpoints are detected. Ensure the updated condition
replaces the existing expression and preserves existing boolean logic.
tests/unit/session-usage-backend.test.mjs (1)

299-318: ⚠️ Potential issue | 🟡 Minor

This fixture still doesn't force a full-file scan.

SESSION_BROWSE_SUMMARY_READ_BYTES is 65536, but this JSONL is only a few hundred bytes. A head-only implementation would still see both model records, so the regression test keeps passing even if full-file scanning is broken.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/session-usage-backend.test.mjs` around lines 299 - 318, The test
"listSessionUsage scans the full session file so middle model names are not
dropped" is too small to force a full-file scan because
SESSION_BROWSE_SUMMARY_READ_BYTES (the threshold used by listSessionUsage) is
much larger than the fixture; to fix, make the fixture exceed
SESSION_BROWSE_SUMMARY_READ_BYTES (or temporarily override that constant for the
test) so a head-only read would miss the middle record—e.g., insert a large
filler payload or repeat a filler line between the first and middle JSONL
entries so the file size grows past SESSION_BROWSE_SUMMARY_READ_BYTES and
listSessionUsage must scan the whole file to find "gpt-5.2-codex".
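The padding approach described above can be sketched as a small fixture builder. `buildOversizedSessionFixture` is a hypothetical helper; the `readBytes` parameter stands in for `SESSION_BROWSE_SUMMARY_READ_BYTES` (65536), and the record shapes are illustrative, not the project's exact JSONL schema.

```javascript
// Minimal sketch: build a JSONL fixture whose middle model record sits
// beyond the head-read threshold, so a head-only scan would miss it.
function buildOversizedSessionFixture(readBytes) {
    const lines = [JSON.stringify({ type: 'turn', model: 'gpt-5.1-codex' })];
    const filler = JSON.stringify({ type: 'noise', pad: 'x'.repeat(1024) });
    // Keep appending filler until the head window is exhausted.
    while (lines.join('\n').length <= readBytes) {
        lines.push(filler);
    }
    // This record can only be found by a full-file scan.
    lines.push(JSON.stringify({ type: 'turn', model: 'gpt-5.2-codex' }));
    return lines.join('\n') + '\n';
}
```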
web-ui/partials/index/panel-orchestration.html (1)

47-47: ⚠️ Potential issue | 🟠 Major

Reset followUpsText in the Workflow preset too.

If the user clicks “修 review + 回归” first and then “Workflow 批处理”, the old follow-up lines are still kept in the draft and go out with the next preview/run.

🛠️ Suggested fix
-                                            <button type="button" class="task-template-chip" `@click`="taskOrchestration.target = '用 Workflow 跑一组固定检查并整理结果'; taskOrchestration.selectedEngine = 'workflow'; taskOrchestration.workflowIdsText = 'diagnose-config\nsafe-provider-switch'; taskOrchestration.notes = '输出统一结论,避免重复描述'">Workflow 批处理</button>
+                                            <button type="button" class="task-template-chip" `@click`="taskOrchestration.target = '用 Workflow 跑一组固定检查并整理结果'; taskOrchestration.selectedEngine = 'workflow'; taskOrchestration.workflowIdsText = 'diagnose-config\nsafe-provider-switch'; taskOrchestration.notes = '输出统一结论,避免重复描述'; taskOrchestration.followUpsText = ''">Workflow 批处理</button>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/partials/index/panel-orchestration.html` at line 47, The Workflow
preset button's click handler doesn't clear previous follow-up drafts, so update
the inline click action tied to the "Workflow 批处理" button to reset
taskOrchestration.followUpsText (set to an empty string) along with the existing
assignments (target, selectedEngine, workflowIdsText, notes) so stale follow-up
lines are not carried over when switching presets.
web-ui/modules/app.computed.session.mjs (2)

227-264: ⚠️ Potential issue | 🟡 Minor

Coverage can still go above 100% when totalTokens is missing.

The denominator only counts session.totalTokens, but the numerator falls back to the per-field token breakdown. Sessions that have input/output counters but no totalTokens can therefore inflate coverage or show 100% against a zero denominator.

🛠️ Suggested fix
-        const pricing = resolveUsagePricingForSession(session, pricingIndex, currentProvider);
-        const totalSessionTokens = Number.isFinite(Number(session.totalTokens))
-            ? Math.max(0, Math.floor(Number(session.totalTokens)))
-            : 0;
-        totalTokens += totalSessionTokens;
-        if (!pricing) {
-            missingPricingSessions += 1;
-            continue;
-        }
         const inputTokens = Number.isFinite(Number(session.inputTokens)) ? Math.max(0, Math.floor(Number(session.inputTokens))) : null;
         const cachedInputTokens = Number.isFinite(Number(session.cachedInputTokens)) ? Math.max(0, Math.floor(Number(session.cachedInputTokens))) : 0;
         const outputTokens = Number.isFinite(Number(session.outputTokens)) ? Math.max(0, Math.floor(Number(session.outputTokens))) : null;
         const reasoningOutputTokens = Number.isFinite(Number(session.reasoningOutputTokens)) ? Math.max(0, Math.floor(Number(session.reasoningOutputTokens))) : 0;
+        const fallbackSessionTokens =
+            Math.max(0, (inputTokens || 0) - cachedInputTokens)
+            + cachedInputTokens
+            + (outputTokens || 0)
+            + reasoningOutputTokens;
+        const totalSessionTokens = Number.isFinite(Number(session.totalTokens))
+            ? Math.max(0, Math.floor(Number(session.totalTokens)))
+            : fallbackSessionTokens;
+        totalTokens += totalSessionTokens;
+        const pricing = resolveUsagePricingForSession(session, pricingIndex, currentProvider);
+        if (!pricing) {
+            missingPricingSessions += 1;
+            continue;
+        }
         if (inputTokens === null && outputTokens === null && reasoningOutputTokens === 0) {
             missingTokenSessions += 1;
             continue;
         }
         const billableInputTokens = Math.max(0, (inputTokens || 0) - cachedInputTokens);
@@
-        estimatedTokens += totalSessionTokens || (billableInputTokens + cachedInputTokens + (outputTokens || 0) + reasoningOutputTokens);
+        estimatedTokens += totalSessionTokens;
@@
-        ? Math.round((estimatedTokens / totalTokens) * 100)
+        ? Math.min(100, Math.round((estimatedTokens / totalTokens) * 100))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.session.mjs` around lines 227 - 264, The coverage
can exceed 100% because totalTokens only sums session.totalTokens while
estimatedTokens uses per-field fallbacks; to fix, compute a per-session total
fallback from the token breakdown and use it when session.totalTokens is
missing: in the loop around resolveUsagePricingForSession(...) compute
inputTokens, cachedInputTokens, outputTokens, reasoningOutputTokens first, then
set totalSessionTokens = Number.isFinite(Number(session.totalTokens)) ?
Math.max(0, Math.floor(Number(session.totalTokens))) : Math.max(0,
billableInputTokens + cachedInputTokens + (outputTokens || 0) +
reasoningOutputTokens) (ensure billableInputTokens is computed as Math.max(0,
(inputTokens||0) - cachedInputTokens)), then add that totalSessionTokens to
totalTokens and use the same breakdown for
estimatedTokens/estimatedSessions/totalCostUsd and coveragePercent calculation;
update references to totalSessionTokens, estimatedTokens and totalTokens
accordingly so denominator reflects the same token basis as the numerator.

163-170: ⚠️ Potential issue | 🟡 Minor

Don't drop models whose configured rate is $0.

readUsageCostNumber() already preserves zero as a valid rate, but this gate still requires > 0, so free/local models never show $0.00. The check also needs to look only at numeric rate fields, because pricing.source is always non-null.

🛠️ Suggested fix
-            const hasKnownRate = Object.values(pricing).some((value) => value !== null && value > 0);
+            const hasKnownRate = ['input', 'output', 'cacheRead', 'cacheWrite']
+                .some((field) => pricing[field] !== null);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.session.mjs` around lines 163 - 170, The current
gate drops models with a $0 rate because hasKnownRate uses value > 0 and also
picks up non-rate fields like pricing.source; update the hasKnownRate
computation (near createUsagePricingEntry/pricing and hasKnownRate) to detect
any numeric rate (including 0) by checking entries where typeof value ===
'number' (and/or key !== 'source') and accepting value === 0 or > 0 (i.e.,
typeof value === 'number'), so free/local models with a 0 rate are not filtered
out and non-numeric fields like pricing.source are ignored.
cli.js (8)

12815-12830: ⚠️ Potential issue | 🟠 Major

Buffer partial Codex output lines before JSON parsing.

stdout/stderr data events are not line-aligned. Splitting each chunk immediately means a JSON record that spans two chunks is recorded as two broken preview lines and never reaches parsedEvents or sessionId.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12815 - 12830, The captureLines function currently
splits each incoming chunk immediately, causing JSON records spanning chunks to
be broken; change it to buffer partial data per stream before splitting by
maintaining a leftover string (e.g., streamBuffer) that you append the incoming
text to, then split streamBuffer on /\r?\n/, keep the final element as the new
leftover (not yet parsed) and only process complete lines to push into bucket,
call truncateTaskText, JSON.parse into parsedEvents, and call findCodexSessionId
to set sessionId; preserve the existing length limits (bucket.length < 120,
parsedEvents.length < 120) and the try/catch around JSON.parse.
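The per-stream buffering described above can be sketched as follows. `createLineBuffer` is a hypothetical helper, shown standalone rather than wired into the actual `captureLines` implementation; the caller would feed it stdout/stderr chunks and parse each complete line.

```javascript
// Minimal sketch: accumulate chunks into a leftover string, emit only
// complete lines, and carry the partial tail into the next chunk so a
// JSON record split across two data events stays intact.
function createLineBuffer(onLine) {
    let leftover = '';
    return {
        push(chunk) {
            leftover += String(chunk);
            const parts = leftover.split(/\r?\n/);
            // The last element is an incomplete line (possibly empty).
            leftover = parts.pop();
            for (const line of parts) {
                if (line) onLine(line);
            }
        },
        flush() {
            // Called on stream end to emit any trailing unterminated line.
            if (leftover) onLine(leftover);
            leftover = '';
        },
    };
}
```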

12497-12500: ⚠️ Potential issue | 🟠 Major

Honor explicit CLI/API overrides when a plan object is supplied.

This early return drops parsed top-level overrides like allowWrite, dryRun, concurrency, and autoFixRounds. task run --plan @plan.json --allow-write still executes with the file’s embedded values, so the override flags silently stop working whenever params.plan is present.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12497 - 12500, The function coerceTaskPlanPayload
currently returns immediately when params.plan is an object, which discards
explicit top-level override flags (e.g., allowWrite, dryRun, concurrency,
autoFixRounds); instead, clone the plan via cloneJson(params.plan, {}) and then
shallow-merge/assign the remaining top-level keys from params onto that cloned
object so explicit CLI/API overrides in params take precedence over embedded
plan values (i.e., merged = Object.assign(clonedPlan, pick(params,
['allowWrite','dryRun','concurrency','autoFixRounds',
...otherTopLevelOverrides]))), and return the merged object; keep the existing
Array.isArray check and behavior for non-object plans.
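The merge described above can be sketched as follows. This is illustrative only: the JSON round-trip stands in for the project's `cloneJson`, and the override key list mirrors the flags named in the finding.

```javascript
// Minimal sketch: clone the supplied plan, then let explicit top-level
// params win over values embedded in the plan file.
const TASK_OVERRIDE_KEYS = ['allowWrite', 'dryRun', 'concurrency', 'autoFixRounds'];

function mergePlanWithOverrides(plan, params) {
    const merged = JSON.parse(JSON.stringify(plan)); // stand-in for cloneJson
    for (const key of TASK_OVERRIDE_KEYS) {
        if (params[key] !== undefined) {
            merged[key] = params[key]; // CLI/API override takes precedence
        }
    }
    return merged;
}
```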

13200-13211: ⚠️ Potential issue | 🟠 Major

task cancel <taskId> still misses detached non-queue runs.

g_taskRunControllers is keyed by runId, but this branch only does get(target) before checking queued items. Detached task run --detach --task-id ... executions are never queued, so the advertised taskId|runId cancel path still fails for them.

💡 Minimal fix
-    const controllerByRun = g_taskRunControllers.get(target);
+    const controllerByRun = g_taskRunControllers.get(target)
+        || Array.from(g_taskRunControllers.values()).find((entry) => entry.taskId === target);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 13200 - 13211, The cancel branch only looks up
g_taskRunControllers.get(target) (assuming target is a runId) and therefore
misses detached runs keyed by runId when the user passes a taskId; update the
logic in the cancel handler to, after const controllerByRun =
g_taskRunControllers.get(target), if controllerByRun is falsy iterate
g_taskRunControllers.values() to find a controller whose taskId === target and
assign that to controllerByRun, then call controllerByRun.abort() and return the
same cancelled response; keep the existing fallback to getTaskQueueItem(target)
for queued items. Ensure you reference g_taskRunControllers, controllerByRun,
getTaskQueueItem and runId/taskId fields when making the change.
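The two-step lookup the fix calls for can be sketched like this; the controller map and entry shape are assumptions based on the names in the finding.

```javascript
// Resolve a cancel target against a runId-keyed controller map, falling back
// to a linear scan by taskId so "task cancel <taskId>" also reaches detached
// runs that were never queued.
function findRunController(controllers, target) {
    const byRunId = controllers.get(target); // fast path: target is a runId
    if (byRunId) return byRunId;
    for (const entry of controllers.values()) {
        if (entry.taskId === target) return entry; // target was a taskId
    }
    return null; // caller falls back to getTaskQueueItem(target)
}
```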

11236-11245: ⚠️ Potential issue | 🟠 Major

--detach still does not create a real background job.

These branches only stop awaiting the promise; they never move work into a detached child/worker. task run --detach, task queue start --detach, and task retry --detach still execute inside the current CLI process instead of becoming true background tasks.

Also applies to: 13133-13150, 13173-13184

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 11236 - 11245, The current --detach branches (where
options.detach is checked and code constructs plan via
coerceTaskPlanPayload/validatePreparedTaskPlan then calls runTaskPlanInternal)
only avoid awaiting the promise but still run work in-process; replace the
in-process call with spawning a true detached background worker: serialize the
plan and metadata (taskId/runId) and launch a detached child process (e.g., via
child_process.spawn with detached: true, stdio: 'ignore' and unref()) that
invokes the same CLI entrypoint or a dedicated worker entry to call
runTaskPlanInternal, so the parent can immediately return the detached result
object; apply the same change to the equivalent detach branches for task queue
start and task retry to ensure real background jobs are created instead of just
not awaiting the promise.

11266-11280: ⚠️ Potential issue | 🟠 Major

Return non-zero when a retry finishes failed.

This branch only throws on result.error. A retry can return a normal detail object with result.run.status === 'failed', and the CLI still exits 0 even though task run already treats the same terminal state as failure.

💡 Minimal fix
         if (result.error) {
             throw new Error(result.error);
         }
+        if (!result.detached && result.run && result.run.status && result.run.status !== 'success') {
+            throw new Error(result.run.error || 'task retry failed');
+        }
         return;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 11266 - 11280, The retry branch for subcommand 'retry'
currently only throws when result.error exists, but must also treat a finished
retry whose result.run.status === 'failed' as a CLI failure; after calling
retryTaskRun({ runId, detach: options.detach }) and handling
JSON/detached/output (printTaskRunSummary), check if result.error or (result.run
&& result.run.status === 'failed') and in that case throw a new Error (or
process.exit with non-zero) including context (runId/status) so the CLI returns
non-zero on failed retries; update the logic around retryTaskRun/result handling
to include this status check.

10003-10018: ⚠️ Potential issue | 🟡 Minor

Detached task run still ignores explicit runId.

Both detach branches hardcode createTaskRunId() instead of honoring params.runId / options.runId. That makes --run-id and API runId silently ineffective in the mode where callers need a stable handle to poll later.

Also applies to: 11236-11245

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 10003 - 10018, Detached task runs always call
createTaskRunId() instead of honoring provided runId, causing --run-id /
params.runId / options.runId to be ignored; update the detached branches that
call createTaskRunId() (the code around coerceTaskPlanPayload,
validatePreparedTaskPlan, and runTaskPlanInternal) to use a validated runId when
present: compute runId = (typeof params.runId === 'string' &&
params.runId.trim()) ? params.runId.trim() : ((options && typeof options.runId
=== 'string' && options.runId.trim()) ? options.runId.trim() :
createTaskRunId()); apply the same logic in both detach branches so callers can
pass an explicit runId to be used for polling later.
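The precedence rule from this prompt can be written as a small resolver; createTaskRunId is stubbed here purely for illustration and is not the CLI's actual generator.

```javascript
// Illustrative stub; the real createTaskRunId lives in cli.js.
function createTaskRunId() {
    return `run-${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 8)}`;
}

// Prefer an explicit params.runId, then options.runId, then mint a new id,
// so detached callers get a stable handle they can poll later.
function resolveRunId(params = {}, options = {}) {
    if (typeof params.runId === 'string' && params.runId.trim()) {
        return params.runId.trim();
    }
    if (typeof options.runId === 'string' && options.runId.trim()) {
        return options.runId.trim();
    }
    return createTaskRunId();
}
```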

10018-10018: ⚠️ Potential issue | 🟠 Major

Persist a terminal failure state when task execution throws.

This function only records state on normal progress/return. If executeTaskPlan(...) rejects, the finally block removes the controller, but the run detail and queue item stay marked as running; the detached call sites then swallow the rejection, so the stuck state becomes permanent.

Also applies to: 11244-11244, 12986-13048, 13141-13143, 13174-13177

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` at line 10018, The detached call currently swallows rejections from
runTaskPlanInternal (call: runTaskPlanInternal(plan, { taskId, runId }).catch(()
=> {})); instead, catch the error and persist a terminal failure state for the
run and its queue item (update the run detail and queue item to a failed/errored
status and store the error message/stack), using the same runId/taskId context,
then re-throw or log as appropriate; ensure the existing finally cleanup that
removes the controller still runs. Apply this change for runTaskPlanInternal
usages and the equivalent detached calls referenced (locations around startLines
~10018, 11244, 12986-13048, 13141-13143, 13174-13177) so rejected
executeTaskPlan/runTaskPlanInternal promises do not leave runs stuck in
"running" state.
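The failure-persistence pattern can be sketched as below; saveRunDetail is a hypothetical hook standing in for whatever store updates the run detail and queue item.

```javascript
// Instead of runTaskPlanInternal(...).catch(() => {}), record a terminal
// failed state so a rejected run cannot stay "running" forever.
async function launchDetached(runTask, saveRunDetail, { taskId, runId }) {
    try {
        await runTask();
    } catch (error) {
        await saveRunDetail({
            runId,
            taskId,
            status: 'failed', // terminal state visible to pollers
            error: error && error.message ? error.message : String(error),
        });
    }
}
```

The existing finally cleanup that removes the controller is unaffected; only the swallowed rejection gains a persistence step.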

12597-12603: ⚠️ Potential issue | 🟠 Major

Queue processing is still racy and can choose the wrong task.

Foreground callers bypass g_taskQueueProcessor, so two concurrent starts can execute the same queued item twice. The next item is also chosen from the newest-first top-200 queued slice, which makes default processing LIFO and can prevent an older explicit taskId from ever being picked.

Also applies to: 13102-13159

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cli.js` around lines 12597 - 12603, The queue selection/scheduling is racy
and LIFO: callers bypass g_taskQueueProcessor allowing concurrent starts to pick
the same item, and the sort on state.tasks makes newest-first (LIFO) using
updatedAt and a top-200 slice; fix by routing all starters through the single
g_taskQueueProcessor (or add a global mutex/lock) to serialize pick-and-start,
change the sort in the task selection logic (the sort using statusRank and
updatedAt) to prefer oldest-first (use ascending updatedAt/createdAt) so
explicit older taskId requests are reachable, remove or widen the arbitrary
top-200 slicing so selection considers the full queued set (or apply
deterministic pagination), and add an atomic claim/reserve step (check-and-set)
when picking a task to ensure two concurrent processors cannot start the same
task.
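The claim-and-pick step can be sketched in-memory as follows; in the real CLI the claim set would live alongside g_taskQueueProcessor (or be replaced by a persisted check-and-set), and this stand-in only shows the ordering and reservation logic.

```javascript
// Shared reservation set: a stand-in for an atomic check-and-set claim.
const claimed = new Set();

// Pick the oldest unclaimed queued task (FIFO, not LIFO) and reserve it
// before any async work starts, so two concurrent starters cannot collide.
function claimNextQueued(tasks) {
    const queued = tasks
        .filter((task) => task.status === 'queued' && !claimed.has(task.taskId))
        .sort((a, b) => (a.createdAt || 0) - (b.createdAt || 0)); // oldest first
    const next = queued[0];
    if (!next) return null;
    claimed.add(next.taskId); // reserve synchronously, before awaiting anything
    return next;
}
```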
lib/task-orchestrator.js (1)

419-424: ⚠️ Potential issue | 🟡 Minor

Remove the duplicated empty-plan validation.

The same task-plan-nodes-required issue is pushed twice here, so callers receive duplicate error messages for a single problem.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/task-orchestrator.js` around lines 419 - 424, The code pushes the same
issue twice when nodes.length === 0, causing duplicate error messages; remove
the redundant conditional block so the 'task-plan-nodes-required' issue is only
pushed once (locate the duplicate if (nodes.length === 0) { issues.push({ code:
'task-plan-nodes-required', ... }) } and delete one occurrence) or alternatively
ensure the logic deduplicates before pushing to the issues array in the function
handling task plan validation.
tests/e2e/test-task-orchestration.js (1)

213-220: ⚠️ Potential issue | 🟡 Minor

Assert the cancel transition, not just request acceptance.

task-cancel returning { ok: true } only proves the request was accepted. This still passes if the queued item remains runnable; follow it with task-queue-show or task-overview and assert the task is actually marked canceled.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/test-task-orchestration.js` around lines 213 - 220, After calling
api('task-cancel') for the queued task (apiCancelQueued.task.taskId), don't only
assert apiCancel.ok; call the queue/status endpoint (e.g.,
api('task-queue-show', { taskId: apiCancelQueued.task.taskId }) or
api('task-overview', { taskId: ... })) and assert the returned task state shows
it is actually canceled (e.g., status/cancelled flag) so the test verifies the
transition rather than just request acceptance.
web-ui/logic.sessions.mjs (1)

169-174: ⚠️ Potential issue | 🟠 Major

Avoid spreading all day starts into Math.min / Math.max.

For range === 'all', long histories materialize every session day and spread the whole array into a single call. That can hit the engine's argument-count limit and take the usage panel down for large workspaces.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/logic.sessions.mjs` around lines 169 - 174, The current logic builds
validDayStarts from normalizedSessions and then spreads that array into
Math.min/Math.max (when computing firstDayStart/lastDayStart), which can blow up
for large arrays; change the computation to iterate once (e.g., use
Array.prototype.reduce or a simple loop over normalizedSessions calling
toUtcDayStartMs) to compute min and max without spreading, updating the derived
variables firstDayStart and lastDayStart accordingly while keeping the fallback
to toUtcDayStartMs(now) when no valid values exist.
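A single-pass version avoids `Math.min(...hugeArray)` and the engine's argument-count limit entirely. toUtcDayStartMs is stubbed here for illustration; only the loop shape is the point.

```javascript
// Illustrative stub for the helper referenced in the finding.
function toUtcDayStartMs(value) {
    const ms = Number(value);
    if (!Number.isFinite(ms)) return null;
    const date = new Date(ms);
    return Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate());
}

// Compute first/last day starts in one loop, with the same fallback to
// toUtcDayStartMs(now) when no session yields a valid value.
function dayStartRange(sessions, now = Date.now()) {
    let min = Infinity;
    let max = -Infinity;
    for (const session of sessions) {
        const dayStart = toUtcDayStartMs(session.startedAt);
        if (dayStart === null) continue;
        if (dayStart < min) min = dayStart;
        if (dayStart > max) max = dayStart;
    }
    const fallback = toUtcDayStartMs(now);
    return {
        firstDayStart: min === Infinity ? fallback : min,
        lastDayStart: max === -Infinity ? fallback : max,
    };
}
```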
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/task-orchestrator.js`:
- Around line 680-695: The run is being finalized immediately after
signal.aborted while in-flight promises in active can still call finalizeNode()
and overwrite cancellation; to fix, after calling abortActiveNodes() wait for
all in-flight promises to settle (e.g., await Promise.allSettled(active)) before
mutating run.status and run.nodes, and additionally make finalizeNode() (and any
node completion handlers referenced by active) check run.status/nodeRun.status
and no-op or preserve a cancelled state if the run or node is already marked
cancelled so a late-resolving active promise cannot flip status back to success;
update the block around signal/abort handling and the finalizeNode() path to
enforce this ordering and guard.
- Around line 461-462: The scheduler is using raw node.dependsOn values so
normalized IDs (normalized by validateTaskPlan) aren’t matched; update the code
paths that build and use dependency lists to normalize entries immediately:
replace usages that read node.dependsOn with a normalized array created by
uniqueStringList(node.dependsOn || []).map(depId => normalizeId(depId, ''))
(same pattern as in the diff) and use that normalized dependsOn when populating
nodeMap and when iterating scheduling logic (e.g., where dependsOn is later
looped over at the other sites mentioned); ensure all references to dependsOn in
the scheduling and dependency-resolution code use this normalized array so
lookups against nodeMap succeed.
- Around line 278-301: The code that builds nodes for requested workflows sets
shouldBuildWorkflowNodes and then pushes follow-up nodes with kind 'codex',
causing validateTaskPlan() to reject plans when engine === 'workflow'; update
the node construction in the block using nextNodeId('workflow') (and the other
similar blocks that build follow-ups around the same logic) to ensure follow-up
nodes use kind: 'workflow' when engine === 'workflow' (or derive kind from the
current engine variable), and keep write/retryLimit/autoFixRounds and input
unchanged; locate the node creation sites (the loop that iterates
requestedWorkflowIds, the similar builders noted) and change the hardcoded
'codex' kind to be conditional so workflow-mode plans no longer self-invalidate.
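The dependsOn normalization requested in the second comment above can be sketched as follows; uniqueStringList and normalizeId are reimplemented here as assumptions about what the real helpers do.

```javascript
// Assumed behavior of the helpers named in the review comment.
function uniqueStringList(values) {
    const seen = new Set();
    const result = [];
    for (const value of values || []) {
        if (typeof value !== 'string') continue;
        const trimmed = value.trim();
        if (!trimmed || seen.has(trimmed)) continue;
        seen.add(trimmed);
        result.push(trimmed);
    }
    return result;
}

function normalizeId(value, fallback) {
    return typeof value === 'string' && value.trim()
        ? value.trim().toLowerCase()
        : fallback;
}

// Normalize dependency ids the same way validateTaskPlan normalizes node ids,
// so scheduler lookups against nodeMap succeed.
function normalizedDependsOn(node) {
    return uniqueStringList(node.dependsOn || []).map((depId) => normalizeId(depId, ''));
}
```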

In `@tests/unit/task-orchestrator.test.mjs`:
- Around line 101-127: The test allows a false positive because it doesn't
assert that nodes receive correct dependency results; update the test in
tests/unit/task-orchestrator.test.mjs to assert ctx.dependencyResults for
work-a, work-b and verify so that each node's dependencyResults contains the
expected ids (e.g., work-a and work-b should see ['inspect'], and verify should
see ['work-a','work-b']): locate the executeTaskPlan invocation and the
executeNode mock used there and add assertions against the captured callOrder or
directly assert the dependencyResults passed into executeNode for nodes with id
'work-a', 'work-b', and 'verify' to ensure prerequisites were actually provided.

In `@web-ui/logic.sessions.mjs`:
- Around line 412-426: The sort comparator in usedModels is introducing
tokenTotal and alphabetical tie-breakers (in the sort on modelMap.entries())
which changes ordering expected by tests; change the comparator to only sort by
count descending and return 0 for ties so the original insertion order from
modelMap is preserved (do not use tokenTotal or localeCompare as tie-breakers),
i.e., update the sort used by usedModels to compare b[1].count - a[1].count and
leave ties untouched so the Map's first-seen order remains.
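The tie-preserving comparator from this comment is a one-liner; since Array.prototype.sort is stable in modern engines, returning 0 for ties keeps the Map's first-seen order.

```javascript
// Sort model entries by count descending only; no tie-breakers, so insertion
// order from the source Map survives for equal counts.
function sortUsedModels(entries) {
    return [...entries].sort((a, b) => b[1].count - a[1].count);
}
```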

In `@web-ui/modules/app.computed.main-tabs.mjs`:
- Around line 161-170: taskOrchestrationSelectedRunNodes currently only reads
nodes from detail.run.nodes and will drop tests that store nodes at the
top-level of the detail payload; update the logic in
taskOrchestrationSelectedRunNodes to accept nodes from either detail.nodes or
detail.run.nodes by checking detail first (e.g. const run = detail && typeof
detail === 'object' ? (Array.isArray(detail.nodes) ? { nodes: detail.nodes } :
detail.run || {}) : {}), then returning Array.isArray(run.nodes) ? run.nodes :
[]; ensure you reference taskOrchestrationSelectedRun and
taskOrchestrationSelectedRunNodes when making this change.

In `@web-ui/modules/app.methods.task-orchestration.mjs`:
- Around line 163-165: The background poll calls to loadTaskRunDetail are
forcing workspaceTab to "detail", so change loadTaskRunDetail to only set
this.workspaceTab = "detail" when an explicit flag allows it (e.g.,
options.switchToDetail === true) and treat the existing options.silent as
meaning "do not switch tabs"; update loadTaskRunDetail to check options.silent
or options.switchToDetail before mutating workspaceTab, and adjust the
polling/background callers (the calls at the shown locations that pass { silent:
true }) to keep passing silent so they won't trigger a tab switch while manual
selection calls continue to request the tab switch.
- Around line 136-159: The current code clears state.selectedRunId when the
returned runs slice doesn't include it, causing selections to vanish for
truncated responses; instead remove the block that resets
state.selectedRunId/state.selectedRunDetail/state.selectedRunError (the if that
checks !state.runs.some(...)) so we do not drop a valid selection just because
the overview is truncated; keep the logic that sets a default selectedRunId only
when it is falsy (the subsequent if that assigns state.runs[0].runId), and apply
the same change to the other occurrences (the similar blocks around
loadTaskRunDetail uses you noted at lines ~230-234 and ~367-370) so selection is
preserved and loadTaskRunDetail can still fetch details when needed.

In `@web-ui/partials/index/layout-header.html`:
- Around line 261-263: The header template now expects
mainTabKicker/mainTabTitle/mainTabSubtitle for tabs named config, sessions,
usage, orchestration, market, docs, and settings, but the computed mapping in
the module (the mainTabs computed export in app.computed.main-tabs.mjs) still
only contains keys for session/dashboard/task-orchestration; update that
computed mapping (the mainTabs object/return value) to include entries for each
of the new tab keys (config, sessions, usage, orchestration, market, docs,
settings) and provide appropriate kicker/title/subtitle strings (or fallbacks)
so the template variables resolve correctly; ensure the mapping keys exactly
match the tab identifiers used by the template.

In `@web-ui/partials/index/panel-usage.html`:
- Around line 86-155: The template expects different properties than the shape
produced by sessionUsageCharts (usedModels: {model, sessions, successRate,
totalCalls}; missingModelSessionsPreview: {sessionId, model}), so update the
template to read the actual fields: use item.model as the key/title,
item.sessions instead of item.count, item.totalCalls instead of
item.messageTotal, and guard any optional arrays like item.sourceLabels (use
(item.sourceLabels || []).length and (item.sourceLabels || []).join(' / ')); for
preview rows use item.sessionId and item.model and use item.model or
item.sessionId as the :key to avoid undefined keys. Ensure all places
referencing item.count, item.messageTotal, item.sourceLabels, item.title,
item.key are replaced or null-guarded to match the existing generator output.

In `@web-ui/session-helpers.mjs`:
- Around line 143-148: The call to this.loadTaskOrchestrationOverview(...) is
fire-and-forget and may reject; update the invocation so its returned promise is
handled (e.g., append a .catch handler or await it inside an async function and
wrap in try/catch) to prevent unhandled promise rejections. Locate the call to
loadTaskOrchestrationOverview in the conditional that checks
enteringOrchestrationTab and ensure any errors are logged/handled (reference:
loadTaskOrchestrationOverview) rather than left unhandled.

In `@web-ui/styles/sessions-usage.css`:
- Around line 132-165: The CSS file defines selectors like .usage-card-head,
.usage-card-title, .usage-card-subtitle and .usage-card-kicker but the template
uses .usage-diagnostic-header, .usage-diagnostic-message,
.usage-diagnostic-detail, and .usage-diagnostic-list-item (plus
.usage-diagnostic-row variants); update the stylesheet to either replace or add
matching selectors so the rules apply to the actual DOM: map the styles from
.usage-card-head → .usage-diagnostic-header, .usage-card-title →
.usage-diagnostic-message/title, .usage-card-subtitle →
.usage-diagnostic-detail, and ensure list/row styles apply to
.usage-diagnostic-list-item and .usage-diagnostic-row* (keeping the same layout,
spacing, colors, and border rules).

---

Duplicate comments:
In `@cli.js`:
- Around line 12815-12830: The captureLines function currently splits each
incoming chunk immediately, causing JSON records spanning chunks to be broken;
change it to buffer partial data per stream before splitting by maintaining a
leftover string (e.g., streamBuffer) that you append the incoming text to, then
split streamBuffer on /\r?\n/, keep the final element as the new leftover (not
yet parsed) and only process complete lines to push into bucket, call
truncateTaskText, JSON.parse into parsedEvents, and call findCodexSessionId to
set sessionId; preserve the existing length limits (bucket.length < 120,
parsedEvents.length < 120) and the try/catch around JSON.parse.
- Around line 12497-12500: The function coerceTaskPlanPayload currently returns
immediately when params.plan is an object, which discards explicit top-level
override flags (e.g., allowWrite, dryRun, concurrency, autoFixRounds); instead,
clone the plan via cloneJson(params.plan, {}) and then shallow-merge/assign the
remaining top-level keys from params onto that cloned object so explicit CLI/API
overrides in params take precedence over embedded plan values (i.e., merged =
Object.assign(clonedPlan, pick(params,
['allowWrite','dryRun','concurrency','autoFixRounds',
...otherTopLevelOverrides]))), and return the merged object; keep the existing
Array.isArray check and behavior for non-object plans.
- Around line 13200-13211: The cancel branch only looks up
g_taskRunControllers.get(target) (assuming target is a runId) and therefore
misses detached runs keyed by runId when the user passes a taskId; update the
logic in the cancel handler to, after const controllerByRun =
g_taskRunControllers.get(target), if controllerByRun is falsy iterate
g_taskRunControllers.values() to find a controller whose taskId === target and
assign that to controllerByRun, then call controllerByRun.abort() and return the
same cancelled response; keep the existing fallback to getTaskQueueItem(target)
for queued items. Ensure you reference g_taskRunControllers, controllerByRun,
getTaskQueueItem and runId/taskId fields when making the change.
- Around line 11236-11245: The current --detach branches (where options.detach
is checked and code constructs plan via
coerceTaskPlanPayload/validatePreparedTaskPlan then calls runTaskPlanInternal)
only avoid awaiting the promise but still run work in-process; replace the
in-process call with spawning a true detached background worker: serialize the
plan and metadata (taskId/runId) and launch a detached child process (e.g., via
child_process.spawn with detached: true, stdio: 'ignore' and unref()) that
invokes the same CLI entrypoint or a dedicated worker entry to call
runTaskPlanInternal, so the parent can immediately return the detached result
object; apply the same change to the equivalent detach branches for task queue
start and task retry to ensure real background jobs are created instead of just
not awaiting the promise.
- Around line 11266-11280: The retry branch for subcommand 'retry' currently
only throws when result.error exists, but must also treat a finished retry whose
result.run.status === 'failed' as a CLI failure; after calling retryTaskRun({
runId, detach: options.detach }) and handling JSON/detached/output
(printTaskRunSummary), check if result.error or (result.run && result.run.status
=== 'failed') and in that case throw a new Error (or process.exit with non-zero)
including context (runId/status) so the CLI returns non-zero on failed retries;
update the logic around retryTaskRun/result handling to include this status
check.
- Around line 10003-10018: Detached task runs always call createTaskRunId()
instead of honoring provided runId, causing --run-id / params.runId /
options.runId to be ignored; update the detached branches that call
createTaskRunId() (the code around coerceTaskPlanPayload,
validatePreparedTaskPlan, and runTaskPlanInternal) to use a validated runId when
present: compute runId = (typeof params.runId === 'string' &&
params.runId.trim()) ? params.runId.trim() : ((options && typeof options.runId
=== 'string' && options.runId.trim()) ? options.runId.trim() :
createTaskRunId()); apply the same logic in both detach branches so callers can
pass an explicit runId to be used for polling later.
- Line 10018: The detached call currently swallows rejections from
runTaskPlanInternal (call: runTaskPlanInternal(plan, { taskId, runId }).catch(()
=> {})); instead, catch the error and persist a terminal failure state for the
run and its queue item (update the run detail and queue item to a failed/errored
status and store the error message/stack), using the same runId/taskId context,
then re-throw or log as appropriate; ensure the existing finally cleanup that
removes the controller still runs. Apply this change for runTaskPlanInternal
usages and the equivalent detached calls referenced (locations around startLines
~10018, 11244, 12986-13048, 13141-13143, 13174-13177) so rejected
executeTaskPlan/runTaskPlanInternal promises do not leave runs stuck in
"running" state.
- Around line 12597-12603: The queue selection/scheduling is racy and LIFO:
callers bypass g_taskQueueProcessor allowing concurrent starts to pick the same
item, and the sort on state.tasks makes newest-first (LIFO) using updatedAt and
a top-200 slice; fix by routing all starters through the single
g_taskQueueProcessor (or add a global mutex/lock) to serialize pick-and-start,
change the sort in the task selection logic (the sort using statusRank and
updatedAt) to prefer oldest-first (use ascending updatedAt/createdAt) so
explicit older taskId requests are reachable, remove or widen the arbitrary
top-200 slicing so selection considers the full queued set (or apply
deterministic pagination), and add an atomic claim/reserve step (check-and-set)
when picking a task to ensure two concurrent processors cannot start the same
task.

In `@lib/cli-models-utils.js`:
- Line 46: The current endpoint check using host.endsWith('bigmodel.cn') &&
pathname.includes('/anthropic') is too permissive; update the condition in
lib/cli-models-utils.js (the code that returns host.endsWith('bigmodel.cn') &&
pathname.includes('/anthropic')) to require a stricter host and path match —
e.g., match host with a boundary like /\.?bigmodel\.cn$/ or host ===
'bigmodel.cn' or ending with '.bigmodel.cn', and match the pathname with a
prefix/anchor such as pathname.startsWith('/anthropic') or a regex like
/^\/anthropic(\/|$)/ so only exact BigModel Anthropic endpoints are detected.
Ensure the updated condition replaces the existing expression and preserves
existing boolean logic.

In `@lib/task-orchestrator.js`:
- Around line 419-424: The code pushes the same issue twice when nodes.length
=== 0, causing duplicate error messages; remove the redundant conditional block
so the 'task-plan-nodes-required' issue is only pushed once (locate the
duplicate if (nodes.length === 0) { issues.push({ code:
'task-plan-nodes-required', ... }) } and delete one occurrence) or alternatively
ensure the logic deduplicates before pushing to the issues array in the function
handling task plan validation.

In `@tests/e2e/test-task-orchestration.js`:
- Around line 213-220: After calling api('task-cancel') for the queued task
(apiCancelQueued.task.taskId), don't only assert apiCancel.ok; call the
queue/status endpoint (e.g., api('task-queue-show', { taskId:
apiCancelQueued.task.taskId }) or api('task-overview', { taskId: ... })) and
assert the returned task state shows it is actually canceled (e.g.,
status/cancelled flag) so the test verifies the transition rather than just
request acceptance.

In `@tests/unit/session-usage-backend.test.mjs`:
- Around line 299-318: The test "listSessionUsage scans the full session file so
middle model names are not dropped" is too small to force a full-file scan
because SESSION_BROWSE_SUMMARY_READ_BYTES (the threshold used by
listSessionUsage) is much larger than the fixture; to fix, make the fixture
exceed SESSION_BROWSE_SUMMARY_READ_BYTES (or temporarily override that constant
for the test) so a head-only read would miss the middle record—e.g., insert a
large filler payload or repeat a filler line between the first and middle JSONL
entries so the file size grows past SESSION_BROWSE_SUMMARY_READ_BYTES and
listSessionUsage must scan the whole file to find "gpt-5.2-codex".

In `@web-ui/logic.sessions.mjs`:
- Around line 169-174: The current logic builds validDayStarts from
normalizedSessions and then spreads that array into Math.min/Math.max (when
computing firstDayStart/lastDayStart), which can blow up for large arrays;
change the computation to iterate once (e.g., use Array.prototype.reduce or a
simple loop over normalizedSessions calling toUtcDayStartMs) to compute min and
max without spreading, updating the derived variables firstDayStart and
lastDayStart accordingly while keeping the fallback to toUtcDayStartMs(now) when
no valid values exist.

In `@web-ui/modules/app.computed.session.mjs`:
- Around line 227-264: The coverage can exceed 100% because totalTokens only
sums session.totalTokens while estimatedTokens uses per-field fallbacks; to fix,
compute a per-session total fallback from the token breakdown and use it when
session.totalTokens is missing: in the loop around
resolveUsagePricingForSession(...) compute inputTokens, cachedInputTokens,
outputTokens, reasoningOutputTokens first, then set totalSessionTokens =
Number.isFinite(Number(session.totalTokens)) ? Math.max(0,
Math.floor(Number(session.totalTokens))) : Math.max(0, billableInputTokens +
cachedInputTokens + (outputTokens || 0) + reasoningOutputTokens) (ensure
billableInputTokens is computed as Math.max(0, (inputTokens||0) -
cachedInputTokens)), then add that totalSessionTokens to totalTokens and use the
same breakdown for estimatedTokens/estimatedSessions/totalCostUsd and
coveragePercent calculation; update references to totalSessionTokens,
estimatedTokens and totalTokens accordingly so denominator reflects the same
token basis as the numerator.
- Around line 163-170: The current gate drops models with a $0 rate because
hasKnownRate uses value > 0 and also picks up non-rate fields like
pricing.source; update the hasKnownRate computation (near
createUsagePricingEntry/pricing and hasKnownRate) to detect any numeric rate
(including 0) by checking entries where typeof value === 'number' (and/or key
!== 'source') and accepting value === 0 or > 0 (i.e., typeof value ===
'number'), so free/local models with a 0 rate are not filtered out and
non-numeric fields like pricing.source are ignored.

In `@web-ui/partials/index/panel-orchestration.html`:
- Line 47: The Workflow preset button's click handler doesn't clear previous
follow-up drafts, so update the inline click action tied to the "Workflow 批处理"
button to reset taskOrchestration.followUpsText (set to an empty string) along
with the existing assignments (target, selectedEngine, workflowIdsText, notes)
so stale follow-up lines are not carried over when switching presets.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 439d6f5b-6e94-4043-8328-27f510371c9d

📥 Commits

Reviewing files that changed from the base of the PR and between 0dc97ff and ba7697b.

📒 Files selected for processing (33)
  • cli.js
  • lib/cli-models-utils.js
  • lib/task-orchestrator.js
  • tests/e2e/run.js
  • tests/e2e/test-sessions.js
  • tests/e2e/test-setup.js
  • tests/e2e/test-task-orchestration.js
  • tests/unit/cli-models-utils.test.mjs
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/provider-share-command.test.mjs
  • tests/unit/run.mjs
  • tests/unit/session-usage-backend.test.mjs
  • tests/unit/session-usage.test.mjs
  • tests/unit/task-orchestrator.test.mjs
  • tests/unit/web-ui-behavior-parity.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/app.js
  • web-ui/index.html
  • web-ui/logic.sessions.mjs
  • web-ui/modules/app.computed.dashboard.mjs
  • web-ui/modules/app.computed.index.mjs
  • web-ui/modules/app.computed.main-tabs.mjs
  • web-ui/modules/app.computed.session.mjs
  • web-ui/modules/app.methods.index.mjs
  • web-ui/modules/app.methods.session-browser.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • web-ui/partials/index/layout-header.html
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/partials/index/panel-usage.html
  • web-ui/session-helpers.mjs
  • web-ui/styles.css
  • web-ui/styles/sessions-usage.css
  • web-ui/styles/task-orchestration.css
📜 Review details
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-04T01:36:26.964Z
Learning: In PR `#68` for SakuraByteCore/codexmate, github-actions[bot] again flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`, `#68`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-30T04:47:15.352Z
Learning: In PR `#55` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-31T12:37:51.023Z
Learning: In PR `#59` for SakuraByteCore/codexmate, github-actions[bot] repeatedly flagged that CodeRabbit was introducing breaking changes. The maintainer's absolute constraint is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
Learnt from: github-actions[bot]
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-04-03T10:42:22.368Z
Learning: In PR `#67` for SakuraByteCore/codexmate, github-actions[bot] flagged that CodeRabbit was again introducing breaking changes. The maintainer's absolute constraint (reinforced across PRs `#55`, `#59`, `#67`) is: no breaking changes in any review suggestion; limit all feedback to correctness, safety, and quality only within the existing design, without refactoring or altering APIs, interfaces, or behavior.
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 389-389: Special characters must be escaped : [ > ].

(spec-char-escape)

🔇 Additional comments (16)
web-ui/modules/app.methods.session-browser.mjs (1)

479-483: Good update for broader usage coverage.

Increasing the list-sessions-usage fetch limit here is consistent with the new “all-used models” visibility goal, and the surrounding loading/error handling remains intact.

tests/unit/provider-share-command.test.mjs (1)

1185-1200: Test coverage improvement is solid.

These assertions close an important gap by verifying models metadata is propagated in buildMcpProviderListPayload, not just provider editability flags.

Also applies to: 1219-1232

web-ui/styles.css (1)

10-10: Looks good.

The new stylesheet import cleanly wires orchestration styles into the existing bundle.

tests/unit/run.mjs (1)

17-17: LGTM.

Registering the orchestration unit suite in the runner is correct and keeps coverage aligned with the new subsystem.

web-ui/index.html (1)

20-20: Good integration point.

Adding the orchestration panel partial here is consistent with the new main-tab layout.

tests/e2e/run.js (1)

26-26: E2E wiring is correct.

The orchestration test is properly registered and executed in the main E2E sequence.

Also applies to: 138-138

web-ui/modules/app.computed.index.mjs (1)

2-2: Clean computed integration.

Importing and composing createMainTabsComputed() here matches the module’s aggregation pattern.

Also applies to: 11-11

web-ui/modules/app.computed.dashboard.mjs (1)

55-57: Busy-state extension looks correct.

The orchestration loading flags are safely gated and appropriately reflected in inspectorBusyStatus().

tests/e2e/test-sessions.js (1)

44-48: Provider-specific model assertions are correctly validated.

These assertions improve correctness by ensuring both Codex and Claude usage entries carry the expected model values.

tests/e2e/test-setup.js (1)

91-102: Fixture enrichment is consistent and test-oriented.

Adding explicit model metadata in both Codex and Claude fixtures makes the session-usage model assertions deterministic.

Also applies to: 198-199

tests/unit/session-usage.test.mjs (1)

13-23: Duration and range: 'all' coverage looks solid.

These test additions strengthen correctness checks for buildUsageChartGroups without altering runtime behavior.

Also applies to: 80-93

web-ui/modules/app.methods.index.mjs (1)

23-23: Task orchestration methods are integrated cleanly.

The import and spread composition follow the current app-method assembly pattern.

Also applies to: 85-86

web-ui/session-helpers.mjs (1)

149-151: Polling teardown guard on tab exit looks correct.

Stopping orchestration polling when leaving the tab is correctly scoped and protects against stale polling loops.

tests/unit/cli-models-utils.test.mjs (1)

13-29: New unit coverage for supplemental catalog and merge semantics is useful.

This validates both endpoint-specific model supplementation and catalog merge order/dedup rules.

lib/cli-models-utils.js (1)

59-75: Catalog merge and export wiring look good.

The dedup merge implementation and module exports are coherent with the new unit tests.

Also applies to: 383-385

tests/unit/web-ui-behavior-parity.test.mjs (1)

332-350: Parity allowlist updates are consistent with the new orchestration surface.

The added data/method/computed keys appear aligned with the new task orchestration feature set.

Also applies to: 411-431, 472-482

@awsl233777
Collaborator Author

Added a local Claude model catalog fallback for external-credential configs without an API key, based on a competitor-style static catalog approach.

- keep built-in Anthropic aliases available locally
- append BigModel Claude-compatible extras for BigModel Anthropic endpoints
- stop showing the config as unlimited when we can provide a deterministic local catalog

Validation:

- npm run test:unit -- claude-settings-sync.test.mjs web-ui-logic.test.mjs
- git diff --check

@awsl233777
Collaborator Author

@coderabbitai re-review

@coderabbitai
Contributor

coderabbitai bot commented Apr 12, 2026

@awsl233777 Sure! Triggering a fresh full review of PR #86 now.


🧠 Learnings used
Learnt from: SurviveM
Repo: SakuraByteCore/codexmate PR: 0
File: :0-0
Timestamp: 2026-03-29T02:42:32.785Z
Learning: The repository maintainer SurviveM (GitHub: SurviveM, repo: SakuraByteCore/codexmate) prefers that CodeRabbit review suggestions do not introduce breaking changes. Reviews should focus on correctness, safety, and quality without recommending refactors or changes that would break existing APIs, interfaces, or behavior.

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (2)
web-ui/partials/index/panel-orchestration.html (1)

389-389: ⚠️ Potential issue | 🟡 Minor

Escape the inline => in the log template.

Line 389 still contains a raw > inside the mustache expression. HTMLHint is already flagging this as spec-char-escape, so this fragment can keep failing static analysis even though the rendering logic itself is fine.

🔧 Minimal fix
-                                            <pre class="task-log-block">{{ Array.isArray(node.logs) && node.logs.length ? node.logs.map((item) => `${item.at || ''} ${item.level || ''} ${item.message || ''}`.trim()).join('\n') : '(no logs)' }}</pre>
+                                            <pre class="task-log-block">{{ Array.isArray(node.logs) && node.logs.length ? node.logs.map((item) =&gt; `${item.at || ''} ${item.level || ''} ${item.message || ''}`.trim()).join('\n') : '(no logs)' }}</pre>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/partials/index/panel-orchestration.html` at line 389, The mustache
expression in the <pre class="task-log-block"> contains a raw '>' in the arrow
function token "node.logs.map((item) => ...)" which triggers HTML
spec-char-escape warnings; replace the arrow token's '>' with an escaped entity
so the expression reads "node.logs.map((item) =&gt; ...)" (keep the rest of the
expression unchanged) to satisfy HTMLHint while preserving the logic referencing
node.logs.map and the task-log-block template.
web-ui/logic.sessions.mjs (1)

169-174: ⚠️ Potential issue | 🟠 Major

Avoid spreading all bucket candidates into Math.min / Math.max.

For range === 'all', Lines 170-174 still materialize and spread one argument per session into Math.min / Math.max. Large histories can hit the engine’s call-argument limit and break the usage panel again; a single pass keeps the same behavior without the runtime risk.

🔧 Suggested fix
     if (range === 'all') {
-        const validDayStarts = normalizedSessions
-            .map((session) => toUtcDayStartMs(session.updatedAtMs))
-            .filter((value) => Number.isFinite(value));
-        const firstDayStart = validDayStarts.length ? Math.min(...validDayStarts) : toUtcDayStartMs(now);
-        const lastDayStart = validDayStarts.length ? Math.max(...validDayStarts) : toUtcDayStartMs(now);
+        let firstDayStart = Number.POSITIVE_INFINITY;
+        let lastDayStart = Number.NEGATIVE_INFINITY;
+        for (const session of normalizedSessions) {
+            const dayStart = toUtcDayStartMs(session.updatedAtMs);
+            if (!Number.isFinite(dayStart)) continue;
+            if (dayStart < firstDayStart) firstDayStart = dayStart;
+            if (dayStart > lastDayStart) lastDayStart = dayStart;
+        }
+        if (!Number.isFinite(firstDayStart) || !Number.isFinite(lastDayStart)) {
+            firstDayStart = toUtcDayStartMs(now);
+            lastDayStart = firstDayStart;
+        }
         for (let stamp = firstDayStart; stamp <= lastDayStart; stamp += dayMs) {
             const key = formatUtcDayKey(stamp);
             buckets.push({
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/logic.sessions.mjs` around lines 169 - 174, The current branch for
range === 'all' maps normalizedSessions to UTC day starts and then spreads the
array into Math.min/Math.max (via firstDayStart/lastDayStart), which risks
engine argument-count limits; replace that pipeline with a single-pass iteration
or reduce over normalizedSessions computing min and max of
toUtcDayStartMs(session.updatedAtMs) while skipping non-finite values
(Number.isFinite), and if no valid values found set firstDayStart and
lastDayStart to toUtcDayStartMs(now); update the logic that assigns
firstDayStart and lastDayStart to use these computed min/max values instead of
Math.min(...)/Math.max(...).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/task-orchestrator.js`:
- Around line 196-249: buildCodexNodePrompt currently drops request.notes so
Codex tasks lose the "说明" field; update buildCodexNodePrompt to include a
normalized notes block (e.g., const notes = normalizeText(context.notes, ...)
and a notesSection like `说明:\n${notes}`) and insert that notesSection into every
branch returned (kind === 'analysis', 'work', 'verify' and the default/follow-up
branch) similar to how dependencyBlock and followUp are used so the prompt
renders request.notes for all Codex prompt kinds.
- Around line 813-823: Race the current Promise.race over active tasks against
the abort signal so the loop can return immediately to the cancellation path
when cancellation wins: create (or reuse) an abortPromise from
registerAbort/AbortSignal and include it as an extra competitor in the
Promise.race that resolves to a sentinel like { aborted: true }; after awaiting
the race, check for that sentinel and if aborted, skip processing of the settled
node (don't delete or let wantsWrite clear writeLock for tasks that are still
active) and jump into the existing cancellation handling (the same path used by
registerAbort), otherwise continue processing the settled result as before;
ensure emitUpdate() is still called when appropriate and reference the existing
variables active, writeLock, registerAbort and function emitUpdate to locate
where to integrate the abort race.

In `@tests/unit/session-usage-backend.test.mjs`:
- Around line 231-297: The test hardcodes '/tmp/*.jsonl' paths that the injected
fs may probe before parser calls, making parser-call assertions flaky; update
the listSessionBrowse returned filePath values to use test-local nonexistent
paths (e.g. a unique temp/test subpath or a clearly non-existent path under the
repo) so listSessionUsage won't find them in the provided fs and will always
call parseCodexSessionSummary/parseClaudeSessionSummary; adjust the
'/tmp/codex-1.jsonl', '/tmp/claude-1.jsonl', and '/tmp/codex-2.jsonl'
occurrences in the test (and related assertions) accordingly to those new
test-local paths.

In `@web-ui/modules/app.computed.session.mjs`:
- Around line 478-482: The tooltip construction around estimatedCost.title
should distinguish three no-estimate cases: when totalSessions === 0, when
skippedUnsupportedSessions === totalSessions (only unsupported/Claude sessions),
and the real pricing-gap case; update the !estimatedCost.hasEstimate branch (the
ternary building the title string) to first check estimatedCost.totalSessions
=== 0 and return a message like "所选时间范围无会话记录" , then check
estimatedCost.skippedUnsupportedSessions === estimatedCost.totalSessions and
return a message like "所选范围仅包含目前被排除的 Claude 会话,暂不统计费用",otherwise keep the
existing fallback advising to fix models.cost or token splits; touch the same
expression that references estimatedCost.hasEstimate,
estimatedCost.totalSessions, and estimatedCost.skippedUnsupportedSessions to
implement these branches.
- Around line 260-287: The code currently double-counts reasoningOutputTokens
and allows pricing when either inputTokens or outputTokens is missing. Update
the logic so outputTokens and reasoningOutputTokens are disjoint: compute
nonReasoningOutput = Math.max(0, (outputTokens || 0) - reasoningOutputTokens)
and use nonReasoningOutput in fallbackSessionTokens and estimatedUsd instead of
(outputTokens || 0) where appropriate; keep reasoningOutputTokens billed via
(pricing.reasoningOutput ?? pricing.output). Also require a complete
input/output split before pricing by treating sessions with inputTokens === null
|| outputTokens === null as missingTokenSessions (increment missingTokenSessions
and continue) so partial splits are not priced; adjust billableInputTokens,
fallbackSessionTokens, totalSessionTokens, and estimatedUsd to use these
corrected values.

---

Duplicate comments:
In `@web-ui/logic.sessions.mjs`:
- Around line 169-174: The current branch for range === 'all' maps
normalizedSessions to UTC day starts and then spreads the array into
Math.min/Math.max (via firstDayStart/lastDayStart), which risks engine
argument-count limits; replace that pipeline with a single-pass iteration or
reduce over normalizedSessions computing min and max of
toUtcDayStartMs(session.updatedAtMs) while skipping non-finite values
(Number.isFinite), and if no valid values found set firstDayStart and
lastDayStart to toUtcDayStartMs(now); update the logic that assigns
firstDayStart and lastDayStart to use these computed min/max values instead of
Math.min(...)/Math.max(...).

In `@web-ui/partials/index/panel-orchestration.html`:
- Line 389: The mustache expression in the <pre class="task-log-block"> contains
a raw '>' in the arrow function token "node.logs.map((item) => ...)" which
triggers HTML spec-char-escape warnings; replace the arrow token's '>' with an
escaped entity so the expression reads "node.logs.map((item) =&gt; ...)" (keep
the rest of the expression unchanged) to satisfy HTMLHint while preserving the
logic referencing node.logs.map and the task-log-block template.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ebe1653c-3951-483b-aea8-fe97407f6e97

📥 Commits

Reviewing files that changed from the base of the PR and between 0266322 and 2bd785e.

📒 Files selected for processing (18)
  • README.md
  • cli.js
  • lib/cli-models-utils.js
  • lib/task-orchestrator.js
  • package.json
  • tests/e2e/test-task-orchestration.js
  • tests/unit/cli-models-utils.test.mjs
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/session-usage-backend.test.mjs
  • tests/unit/task-orchestrator.test.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/logic.sessions.mjs
  • web-ui/modules/app.computed.main-tabs.mjs
  • web-ui/modules/app.computed.session.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • web-ui/partials/index/panel-orchestration.html
  • web-ui/partials/index/panel-usage.html
  • web-ui/session-helpers.mjs
✅ Files skipped from review due to trivial changes (2)
  • package.json
  • tests/unit/cli-models-utils.test.mjs
🚧 Files skipped from review as they are similar to previous changes (7)
  • web-ui/session-helpers.mjs
  • lib/cli-models-utils.js
  • tests/unit/config-tabs-ui.test.mjs
  • tests/unit/task-orchestrator.test.mjs
  • web-ui/modules/app.methods.task-orchestration.mjs
  • tests/unit/web-ui-logic.test.mjs
  • web-ui/partials/index/panel-usage.html
📜 Review details
🧰 Additional context used
🪛 HTMLHint (1.9.2)
web-ui/partials/index/panel-orchestration.html

[error] 1-1: Doctype must be declared before any non-comment content.

(doctype-first)


[error] 389-389: Special characters must be escaped : [ > ].

(spec-char-escape)

🪛 LanguageTool
README.md

[uncategorized] ~242-~242: 您的意思是“"不"回归”?
Context: ...次、队列、运行记录、节点日志。 适合这类场景: - 修 PR review 并补回归测试 - 先排查根因,再决定是否写代码 - 复用已有 workflow 跑固...

(BU)

🔇 Additional comments (2)
web-ui/modules/app.computed.main-tabs.mjs (1)

166-170: Nice payload-shape fallback for run nodes.

Checking detail.nodes before detail.run.nodes keeps the detail pane working for both response shapes stored by loadTaskRunDetail().

tests/e2e/test-task-orchestration.js (1)

219-222: Good cancellation assertion follow-up.

Fetching task-queue-show after task-cancel closes the gap where a no-op cancel could still return { ok: true }.

Comment on lines +196 to +249
function buildCodexNodePrompt(kind, context = {}) {
const title = normalizeText(context.title, 160);
const target = normalizeText(context.target, 4000);
const item = normalizeText(context.item, 1200);
const followUp = normalizeText(context.followUp, 1200);
const dependencySummaries = uniqueStringList(context.dependencySummaries || []).slice(0, 6);
const allowWrite = context.allowWrite === true;
const dependencyBlock = dependencySummaries.length > 0
? `\n已完成前置摘要:\n${dependencySummaries.map((entry, index) => `${index + 1}. ${entry}`).join('\n')}`
: '';
const writeRule = allowWrite
? '允许直接修改本地工作区,但必须控制变更范围并完成最小必要验证。'
: '只允许只读调查,不要修改任何文件,不要执行写入型操作。';

if (kind === 'analysis') {
return [
`任务标题: ${title || '任务分析'}`,
`任务目标:\n${target}`,
writeRule,
'请先调查当前仓库与上下文,给出简洁的现状判断、主要风险、建议执行顺序,以及你认为最小可行的验证方案。',
'输出请保持简洁,聚焦可执行信息。'
].join('\n\n');
}

if (kind === 'work') {
return [
`任务标题: ${title || '执行子任务'}`,
`总目标:\n${target}`,
`当前子任务:\n${item || title || target}`,
dependencyBlock,
writeRule,
'请只处理当前子任务范围,避免越界修改。完成后说明实际改动、验证结果与剩余风险。'
].join('\n\n');
}

if (kind === 'verify') {
return [
`任务标题: ${title || '验证与总结'}`,
`总目标:\n${target}`,
dependencyBlock,
writeRule,
'请基于当前工作区状态执行最小必要验证,检查明显回归,并输出结果摘要、未完成项和建议下一步。'
].join('\n\n');
}

return [
`任务标题: ${title || '后续任务'}`,
`总目标:\n${target}`,
dependencyBlock,
writeRule,
`追加指令:\n${followUp || item || target}`,
'请严格围绕上述后续指令继续推进,并给出结果摘要。'
].join('\n\n');
}
Contributor


⚠️ Potential issue | 🟠 Major

Codex-mode plans currently discard notes.

buildTaskPlan() preserves request.notes, but the Codex prompt builder never receives or renders it. That makes the “说明” field a no-op for Codex tasks, so scope and validation constraints are silently dropped while workflow mode still preserves them via input.notes.

💡 Minimal fix
 function buildCodexNodePrompt(kind, context = {}) {
     const title = normalizeText(context.title, 160);
     const target = normalizeText(context.target, 4000);
+    const notes = normalizeText(context.notes, 2000);
     const item = normalizeText(context.item, 1200);
     const followUp = normalizeText(context.followUp, 1200);
     const dependencySummaries = uniqueStringList(context.dependencySummaries || []).slice(0, 6);
     const allowWrite = context.allowWrite === true;
@@
         return [
             `任务标题: ${title || '任务分析'}`,
             `任务目标:\n${target}`,
+            ...(notes ? [`补充约束:\n${notes}`] : []),
             writeRule,
             '请先调查当前仓库与上下文,给出简洁的现状判断、主要风险、建议执行顺序,以及你认为最小可行的验证方案。',
             '输出请保持简洁,聚焦可执行信息。'
         ].join('\n\n');
@@
             prompt: buildCodexNodePrompt('analysis', {
                 title,
                 target,
+                notes,
                 allowWrite: false
             }),
@@
                 prompt: buildCodexNodePrompt('work', {
                     title,
                     target,
+                    notes,
                     item,
                     allowWrite
                 }),
@@
             prompt: buildCodexNodePrompt('verify', {
                 title,
                 target,
+                notes,
                 allowWrite: false
             }),
@@
                 prompt: buildCodexNodePrompt('follow-up', {
                     title,
                     target,
+                    notes,
                     allowWrite,
                     followUp
                 }),

Apply the same notes block pattern to the other prompt branches (work, verify, and follow-up) as well.

Also applies to: 279-279, 317-380

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/task-orchestrator.js` around lines 196 - 249, buildCodexNodePrompt
currently drops request.notes so Codex tasks lose the "说明" field; update
buildCodexNodePrompt to include a normalized notes block (e.g., const notes =
normalizeText(context.notes, ...) and a notesSection like `说明:\n${notes}`) and
insert that notesSection into every branch returned (kind === 'analysis',
'work', 'verify' and the default/follow-up branch) similar to how
dependencyBlock and followUp are used so the prompt renders request.notes for
all Codex prompt kinds.

Comment on lines +813 to +823
if (active.size > 0) {
const settled = await Promise.race(Array.from(active.entries()).map(([nodeId, state]) => state.promise
.then((payload) => ({ nodeId, payload }))
.catch((error) => ({ nodeId, payload: { error: error && error.message ? error.message : String(error || 'task failed') } }))));
const settledState = active.get(settled.nodeId);
active.delete(settled.nodeId);
if (settledState && settledState.wantsWrite) {
writeLock = false;
}
await emitUpdate();
}
Contributor


⚠️ Potential issue | 🔴 Critical

Abort cannot interrupt the active wait.

Once execution reaches await Promise.race(...) at Lines 814-816, the loop will not get back to the cancellation block at Lines 693-713 until some node settles. Nodes that only stop through registerAbort() therefore keep running — and can keep writing — after cancellation has already been requested.

Please race the active wait itself against the abort signal and jump back to the cancellation path immediately when abort wins.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/task-orchestrator.js` around lines 813 - 823, Race the current
Promise.race over active tasks against the abort signal so the loop can return
immediately to the cancellation path when cancellation wins: create (or reuse)
an abortPromise from registerAbort/AbortSignal and include it as an extra
competitor in the Promise.race that resolves to a sentinel like { aborted: true
}; after awaiting the race, check for that sentinel and if aborted, skip
processing of the settled node (don't delete or let wantsWrite clear writeLock
for tasks that are still active) and jump into the existing cancellation
handling (the same path used by registerAbort), otherwise continue processing
the settled result as before; ensure emitUpdate() is still called when
appropriate and reference the existing variables active, writeLock,
registerAbort and function emitUpdate to locate where to integrate the abort
race.

Comment on lines +231 to +297
test('listSessionUsage backfills missing model metadata from parsed session summaries', async () => {
const codexParses = [];
const claudeParses = [];
const listSessionUsage = instantiateListSessionUsage({
fs,
MAX_SESSION_USAGE_LIST_SIZE: 2000,
SESSION_BROWSE_SUMMARY_READ_BYTES: 65536,
async listSessionBrowse() {
return [
{
source: 'codex',
sessionId: 'codex-1',
filePath: '/tmp/codex-1.jsonl',
provider: 'maxx'
},
{
source: 'claude',
sessionId: 'claude-1',
filePath: '/tmp/claude-1.jsonl'
},
{
source: 'codex',
sessionId: 'codex-2',
filePath: '/tmp/codex-2.jsonl',
model: 'already-there'
}
];
},
parseCodexSessionSummary(filePath, options) {
codexParses.push({ filePath, options });
return filePath === '/tmp/codex-1.jsonl'
? { model: 'gpt-5.3-codex', models: ['gpt-5.3-codex', 'gpt-5.2-codex'], provider: 'maxx' }
: null;
},
parseClaudeSessionSummary(filePath, options) {
claudeParses.push({ filePath, options });
return filePath === '/tmp/claude-1.jsonl'
? { model: 'claude-3-7-sonnet', models: ['claude-3-7-sonnet'], provider: 'claude' }
: null;
},
listAllSessionsData: async () => {
throw new Error('should not call listAllSessionsData');
}
});

const result = await listSessionUsage({ source: 'all', limit: 50, forceRefresh: true });

assert.strictEqual(result[0].model, 'gpt-5.3-codex');
assert.deepStrictEqual(result[0].models, ['gpt-5.3-codex', 'gpt-5.2-codex']);
assert.strictEqual(result[0].provider, 'maxx');
assert.strictEqual(result[1].model, 'claude-3-7-sonnet');
assert.deepStrictEqual(result[1].models, ['claude-3-7-sonnet']);
assert.strictEqual(result[1].provider, 'claude');
assert.strictEqual(result[2].model, 'already-there');
assert.deepStrictEqual(codexParses, [
{
filePath: '/tmp/codex-1.jsonl',
options: { summaryReadBytes: 65536, titleReadBytes: 65536 }
}
]);
assert.deepStrictEqual(claudeParses, [
{
filePath: '/tmp/claude-1.jsonl',
options: { summaryReadBytes: 65536, titleReadBytes: 65536 }
}
]);
});
Contributor


⚠️ Potential issue | 🟡 Minor

Use test-local nonexistent paths for this parser-backfill case.

Line 243 and Line 249 hardcode /tmp/*.jsonl while this test now passes the real fs into listSessionUsage. That function probes filePath before it calls the summary parsers, so an unrelated file at either path can change the control flow and make the parser-call assertions flaky.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/session-usage-backend.test.mjs` around lines 231 - 297, The test
hardcodes '/tmp/*.jsonl' paths that the injected fs may probe before parser
calls, making parser-call assertions flaky; update the listSessionBrowse
returned filePath values to use test-local nonexistent paths (e.g. a unique
temp/test subpath or a clearly non-existent path under the repo) so
listSessionUsage won't find them in the provided fs and will always call
parseCodexSessionSummary/parseClaudeSessionSummary; adjust the
'/tmp/codex-1.jsonl', '/tmp/claude-1.jsonl', and '/tmp/codex-2.jsonl'
occurrences in the test (and related assertions) accordingly to those new
test-local paths.

Comment on lines +260 to +287
const inputTokens = Number.isFinite(Number(session.inputTokens)) ? Math.max(0, Math.floor(Number(session.inputTokens))) : null;
const cachedInputTokens = Number.isFinite(Number(session.cachedInputTokens)) ? Math.max(0, Math.floor(Number(session.cachedInputTokens))) : 0;
const outputTokens = Number.isFinite(Number(session.outputTokens)) ? Math.max(0, Math.floor(Number(session.outputTokens))) : null;
const reasoningOutputTokens = Number.isFinite(Number(session.reasoningOutputTokens)) ? Math.max(0, Math.floor(Number(session.reasoningOutputTokens))) : 0;
const billableInputTokens = Math.max(0, (inputTokens || 0) - cachedInputTokens);
const fallbackSessionTokens = billableInputTokens + cachedInputTokens + (outputTokens || 0) + reasoningOutputTokens;
const totalSessionTokens = Number.isFinite(Number(session.totalTokens))
? Math.max(0, Math.floor(Number(session.totalTokens)))
: fallbackSessionTokens;
totalTokens += totalSessionTokens;
const pricing = resolveUsagePricingForSession(session, pricingIndex, currentProvider);
if (!pricing) {
missingPricingSessions += 1;
continue;
}
if (inputTokens === null && outputTokens === null && reasoningOutputTokens === 0) {
missingTokenSessions += 1;
continue;
}
const estimatedUsd = (
((pricing.input || 0) * billableInputTokens)
+ ((pricing.cacheRead || 0) * cachedInputTokens)
+ (((pricing.reasoningOutput ?? pricing.output) || 0) * reasoningOutputTokens)
+ ((pricing.output || 0) * (outputTokens || 0))
) / 1000000;
totalCostUsd += estimatedUsd;
estimatedSessions += 1;
estimatedTokens += totalSessionTokens;
⚠️ Potential issue | 🟠 Major

Keep reasoning tokens disjoint from outputTokens, and require a complete split before pricing.

In tests/unit/session-usage-backend.test.mjs, lines 494-500 already show total_tokens = input_tokens + output_tokens while reasoning_output_tokens is also present, so reasoning is already counted inside outputTokens. The current formula adds reasoningOutputTokens once in fallbackSessionTokens and a second time in estimatedUsd; on top of that, a session with only one side of the input/output split still gets priced with the missing side treated as 0. Both effects skew the dollar estimate and coveragePercent.

💸 Suggested fix
         const inputTokens = Number.isFinite(Number(session.inputTokens)) ? Math.max(0, Math.floor(Number(session.inputTokens))) : null;
         const cachedInputTokens = Number.isFinite(Number(session.cachedInputTokens)) ? Math.max(0, Math.floor(Number(session.cachedInputTokens))) : 0;
         const outputTokens = Number.isFinite(Number(session.outputTokens)) ? Math.max(0, Math.floor(Number(session.outputTokens))) : null;
         const reasoningOutputTokens = Number.isFinite(Number(session.reasoningOutputTokens)) ? Math.max(0, Math.floor(Number(session.reasoningOutputTokens))) : 0;
-        const billableInputTokens = Math.max(0, (inputTokens || 0) - cachedInputTokens);
-        const fallbackSessionTokens = billableInputTokens + cachedInputTokens + (outputTokens || 0) + reasoningOutputTokens;
+        const hasCompleteTokenSplit = inputTokens !== null && outputTokens !== null;
+        const billableInputTokens = hasCompleteTokenSplit ? Math.max(0, inputTokens - cachedInputTokens) : 0;
+        const reasoningTokens = hasCompleteTokenSplit ? Math.min(reasoningOutputTokens, outputTokens) : 0;
+        const nonReasoningOutputTokens = hasCompleteTokenSplit ? Math.max(0, outputTokens - reasoningTokens) : 0;
+        const fallbackSessionTokens = hasCompleteTokenSplit
+            ? billableInputTokens + cachedInputTokens + outputTokens
+            : 0;
         const totalSessionTokens = Number.isFinite(Number(session.totalTokens))
             ? Math.max(0, Math.floor(Number(session.totalTokens)))
             : fallbackSessionTokens;
         totalTokens += totalSessionTokens;
         const pricing = resolveUsagePricingForSession(session, pricingIndex, currentProvider);
         if (!pricing) {
             missingPricingSessions += 1;
             continue;
         }
-        if (inputTokens === null && outputTokens === null && reasoningOutputTokens === 0) {
+        if (!hasCompleteTokenSplit) {
             missingTokenSessions += 1;
             continue;
         }
         const estimatedUsd = (
             ((pricing.input || 0) * billableInputTokens)
             + ((pricing.cacheRead || 0) * cachedInputTokens)
-            + (((pricing.reasoningOutput ?? pricing.output) || 0) * reasoningOutputTokens)
-            + ((pricing.output || 0) * (outputTokens || 0))
+            + (((pricing.reasoningOutput ?? pricing.output) || 0) * reasoningTokens)
+            + ((pricing.output || 0) * nonReasoningOutputTokens)
         ) / 1000000;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.session.mjs` around lines 260 - 287, The code
currently double-counts reasoningOutputTokens and allows pricing when either
inputTokens or outputTokens is missing. Update the logic so outputTokens and
reasoningOutputTokens are disjoint: compute nonReasoningOutput = Math.max(0,
(outputTokens || 0) - reasoningOutputTokens) and use nonReasoningOutput in
fallbackSessionTokens and estimatedUsd instead of (outputTokens || 0) where
appropriate; keep reasoningOutputTokens billed via (pricing.reasoningOutput ??
pricing.output). Also require a complete input/output split before pricing by
treating sessions with inputTokens === null || outputTokens === null as
missingTokenSessions (increment missingTokenSessions and continue) so partial
splits are not priced; adjust billableInputTokens, fallbackSessionTokens,
totalSessionTokens, and estimatedUsd to use these corrected values.
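To make the corrected accounting concrete, here is a self-contained sketch of the token split and pricing under the review's assumption that reasoning tokens are a subset of outputTokens. The standalone functions splitSessionTokens and estimateUsd are hypothetical extractions, not the module's actual API; field names mirror the diff above.

```javascript
// Normalize a raw value to a non-negative integer, or null if absent.
const num = (v) => (Number.isFinite(Number(v)) ? Math.max(0, Math.floor(Number(v))) : null);

// Hypothetical helper: split a session's tokens into disjoint billing buckets,
// or return null when either side of the input/output split is missing.
function splitSessionTokens(session) {
    const inputTokens = num(session.inputTokens);
    const outputTokens = num(session.outputTokens);
    const cachedInputTokens = num(session.cachedInputTokens) ?? 0;
    const reasoningOutputTokens = num(session.reasoningOutputTokens) ?? 0;
    if (inputTokens === null || outputTokens === null) return null; // incomplete split: do not price
    const billableInputTokens = Math.max(0, inputTokens - cachedInputTokens);
    // Cap reasoning at outputTokens and carve it out of the plain-output
    // bucket so it is billed exactly once.
    const reasoningTokens = Math.min(reasoningOutputTokens, outputTokens);
    const nonReasoningOutputTokens = outputTokens - reasoningTokens;
    return { billableInputTokens, cachedInputTokens, reasoningTokens, nonReasoningOutputTokens };
}

// Price the disjoint buckets; pricing values are USD per million tokens.
function estimateUsd(split, pricing) {
    return (
        (pricing.input || 0) * split.billableInputTokens
        + (pricing.cacheRead || 0) * split.cachedInputTokens
        + ((pricing.reasoningOutput ?? pricing.output) || 0) * split.reasoningTokens
        + (pricing.output || 0) * split.nonReasoningOutputTokens
    ) / 1_000_000;
}
```

With this shape, a session returning null from splitSessionTokens would be counted toward missingTokenSessions instead of being priced with a zeroed-out side.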

Comment on lines +478 to +482
title: estimatedCost.hasEstimate
? `${estimatedCost.skippedUnsupportedSessions > 0 ? '暂不含 Claude,' : ''}${estimatedCost.catalogSessions > 0
? (estimatedCost.configuredSessions > 0 ? '按已配置单价 + 公开模型目录估算' : '按公开模型目录估算')
: '按已配置单价估算'},估算 ${formatUsageEstimatedCost(estimatedCost.totalCostUsd, { precise: true })},覆盖 ${estimatedCost.estimatedSessions}/${estimatedCost.totalSessions} 个会话,约 ${estimatedCost.coveragePercent}% token`
: `${estimatedCost.skippedUnsupportedSessions > 0 ? '暂不统计 Claude 费用。' : ''}缺少可匹配的模型单价或 token 拆分。请先补 models.cost,或确认会话已记录 input/output token。`
⚠️ Potential issue | 🟡 Minor

Differentiate empty / unsupported ranges from real pricing gaps.

Lines 478-482 use the same fallback title for every !hasEstimate case. When the selected range is empty, or it only contains currently excluded Claude sessions, the tooltip still tells users to fix models.cost or token splits even though nothing is wrong with their data.

📝 Suggested fix
                 title: estimatedCost.hasEstimate
                     ? `${estimatedCost.skippedUnsupportedSessions > 0 ? '暂不含 Claude,' : ''}${estimatedCost.catalogSessions > 0
                         ? (estimatedCost.configuredSessions > 0 ? '按已配置单价 + 公开模型目录估算' : '按公开模型目录估算')
                         : '按已配置单价估算'},估算 ${formatUsageEstimatedCost(estimatedCost.totalCostUsd, { precise: true })},覆盖 ${estimatedCost.estimatedSessions}/${estimatedCost.totalSessions} 个会话,约 ${estimatedCost.coveragePercent}% token`
-                    : `${estimatedCost.skippedUnsupportedSessions > 0 ? '暂不统计 Claude 费用。' : ''}缺少可匹配的模型单价或 token 拆分。请先补 models.cost,或确认会话已记录 input/output token。`
+                    : estimatedCost.totalSessions === 0 && estimatedCost.skippedUnsupportedSessions === 0
+                        ? '当前范围内暂无可估算会话。'
+                        : estimatedCost.totalSessions === 0 && estimatedCost.skippedUnsupportedSessions > 0
+                            ? '当前范围内仅包含暂不统计的 Claude 会话。'
+                            : `${estimatedCost.skippedUnsupportedSessions > 0 ? '暂不统计 Claude 费用。' : ''}缺少可匹配的模型单价或 token 拆分。请先补 models.cost,或确认会话已记录 input/output token。`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@web-ui/modules/app.computed.session.mjs` around lines 478 - 482, The tooltip
construction around estimatedCost.title should distinguish three no-estimate
cases: when totalSessions === 0, when skippedUnsupportedSessions ===
totalSessions (only unsupported/Claude sessions), and the real pricing-gap case;
update the !estimatedCost.hasEstimate branch (the ternary building the title
string) to first check estimatedCost.totalSessions === 0 and return a message
like "所选时间范围无会话记录" , then check estimatedCost.skippedUnsupportedSessions ===
estimatedCost.totalSessions and return a message like "所选范围仅包含目前被排除的 Claude
会话,暂不统计费用",otherwise keep the existing fallback advising to fix models.cost or
token splits; touch the same expression that references
estimatedCost.hasEstimate, estimatedCost.totalSessions, and
estimatedCost.skippedUnsupportedSessions to implement these branches.
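The three-way fallback the review asks for could be extracted into a small helper like the sketch below. buildNoEstimateTitle is a hypothetical name, and the message strings follow the suggested fix; the real code builds the title inline in the computed property.

```javascript
// Hypothetical extraction of the !hasEstimate tooltip branching:
// 1) empty range, 2) range with only excluded Claude sessions,
// 3) a genuine pricing / token-split gap.
function buildNoEstimateTitle(estimatedCost) {
    if (estimatedCost.totalSessions === 0 && estimatedCost.skippedUnsupportedSessions === 0) {
        return '当前范围内暂无可估算会话。';
    }
    if (estimatedCost.totalSessions === 0 && estimatedCost.skippedUnsupportedSessions > 0) {
        return '当前范围内仅包含暂不统计的 Claude 会话。';
    }
    // Only this branch tells the user to fix models.cost or token splits.
    return `${estimatedCost.skippedUnsupportedSessions > 0 ? '暂不统计 Claude 费用。' : ''}缺少可匹配的模型单价或 token 拆分。请先补 models.cost,或确认会话已记录 input/output token。`;
}
```

This keeps the "fix your configuration" advice out of the two cases where the user's data is actually fine.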
