fix: use latest assistant token count on resume instead of stale compression checkpoint #3109
tanzhenxin wants to merge 2 commits into main
…ression checkpoint

When resuming a session that had `/compress` followed by more messages, `getResumePromptTokenCount` would return the compression checkpoint's `newTokenCount` instead of the more recent assistant message's `totalTokenCount`. This caused the status line to show a stale context usage value until the first new API call.

Fixes #3107
E2E Test Report
📋 Review Summary
This PR fixes a subtle but important bug in …
🔍 General Feedback
🎯 Specific Feedback
🔵 Low
✅ Highlights
Code Coverage Summary
CLI Package - Full Text Report
Core Package - Full Text Report
For detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run.
…uard

Restructure to return early for both branches (assistant usage and compression checkpoint) instead of accumulating a fallback. Skip zero/placeholder assistant usage so it doesn't override a valid compression checkpoint. Add tests for the two key scenarios.
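The early-return structure this commit describes might look roughly like the sketch below. It is not the actual implementation: the `SessionRecord` shape, the `kind` discriminator, the `usage` field, and the name `resumeTokenCountSketch` are all assumptions made for illustration. Only `totalTokenCount` and `payload.info.newTokenCount` come from the PR text. The token counts are made up.

```typescript
// Hypothetical record shapes for illustration; the real session types may differ.
type SessionRecord =
  | { kind: 'user' }
  | { kind: 'assistant'; usage?: { totalTokenCount?: number } }
  | { kind: 'compression'; payload: { info: { newTokenCount: number } } };

function resumeTokenCountSketch(records: SessionRecord[]): number | undefined {
  // Walk from newest to oldest; the first usable record wins.
  for (const record of [...records].reverse()) {
    if (record.kind === 'assistant') {
      const total = record.usage?.totalTokenCount;
      // Skip missing or zero (placeholder) usage so it cannot shadow a checkpoint.
      if (total !== undefined && total > 0) {
        return total;
      }
    } else if (record.kind === 'compression') {
      // No usable assistant record was seen after this checkpoint.
      return record.payload.info.newTokenCount;
    }
  }
  return undefined;
}

// Scenario 1: a real assistant usage recorded after /compress takes precedence.
const withRealUsage: SessionRecord[] = [
  { kind: 'compression', payload: { info: { newTokenCount: 8000 } } },
  { kind: 'user' },
  { kind: 'assistant', usage: { totalTokenCount: 8200 } },
];

// Scenario 2: a zero placeholder after /compress is ignored in favor of the checkpoint.
const withPlaceholder: SessionRecord[] = [
  { kind: 'compression', payload: { info: { newTokenCount: 8000 } } },
  { kind: 'user' },
  { kind: 'assistant', usage: { totalTokenCount: 0 } },
];

console.log(resumeTokenCountSketch(withRealUsage));   // 8200
console.log(resumeTokenCountSketch(withPlaceholder)); // 8000
```

Returning as soon as a usable record is found removes the need for an accumulated fallback variable, and the `> 0` guard keeps a placeholder record from hiding a valid checkpoint.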
TLDR
One-line fix: when resuming a session that had `/compress` followed by more messages, the status line showed a stale context usage value from the compression checkpoint instead of the actual last API call's token count.

`getResumePromptTokenCount` iterates backwards through messages and captures the most recent assistant record's token count as `fallback`, but then unconditionally returns the compression checkpoint's `newTokenCount` when it finds one, discarding the correct value. The fix is `return fallback ?? payload.info.newTokenCount`, so a more recent assistant record takes precedence.

Screenshots / Video Demo
N/A: the status line percentage difference is small (e.g., 8.0% vs 8.2%) and hard to capture meaningfully in a screenshot. See E2E verification below.
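To make the one-line change from the TLDR concrete, here is a minimal sketch contrasting the old and new return expressions. The `Rec` type, the `scanSketch` helper, and its `applyFix` flag are hypothetical and exist only so both behaviors can be compared side by side. Only `fallback` and `payload.info.newTokenCount` are taken from the description above, and the token counts are invented.

```typescript
// Hypothetical, simplified record shape; not the project's real types.
type Rec =
  | { kind: 'user' }
  | { kind: 'assistant'; totalTokenCount?: number }
  | { kind: 'compression'; payload: { info: { newTokenCount: number } } };

function scanSketch(records: Rec[], applyFix: boolean): number | undefined {
  let fallback: number | undefined;
  // Iterate from the newest record to the oldest.
  for (const record of [...records].reverse()) {
    if (record.kind === 'assistant' && fallback === undefined) {
      // Remember the most recent assistant token count.
      fallback = record.totalTokenCount;
    }
    if (record.kind === 'compression') {
      const { payload } = record;
      return applyFix
        ? (fallback ?? payload.info.newTokenCount) // fixed: newer assistant count wins
        : payload.info.newTokenCount;              // old: checkpoint always wins
    }
  }
  return fallback;
}

// A session with /compress followed by one more exchange (made-up counts).
const history: Rec[] = [
  { kind: 'compression', payload: { info: { newTokenCount: 8000 } } },
  { kind: 'user' },
  { kind: 'assistant', totalTokenCount: 8200 },
];

console.log(scanSketch(history, false)); // 8000 (stale checkpoint value)
console.log(scanSketch(history, true));  // 8200 (latest assistant value)
```

Note that `??` falls through only on `null` or `undefined`, so in this sketch a `fallback` of `0` would still be returned as-is.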
Dive Deeper
The backwards loop in `getResumePromptTokenCount` was written with the assumption that the compression checkpoint is always the most authoritative source. But after compression, subsequent API calls produce assistant records with their own `totalTokenCount` that reflects the actual context size, which may differ from the compression checkpoint.

Reviewer Test Plan
1. `qwen --approval-mode yolo`
2. `/compress`, wait for it to finish
3. `/exit`, note the session ID
4. `qwen --resume <id>`: the status line context usage should match the last API call's token count, not the compression checkpoint

Testing Matrix
Linked issues / bugs
Fixes #3107