fix(#1863): preserve user messages in trimMessages to prevent session derailment by aheritier · Pull Request #1864 · docker/cagent

aheritier · 2026-02-26T22:25:41Z

trimMessages() removed the oldest conversation messages first without protecting user messages. In single-turn agentic loops (one user message followed by many tool calls), the user's original request was the first message trimmed, causing the model to lose context and stop working.

User messages are now marked as protected and skipped during trimming. Only assistant and tool messages are eligible for removal.

Fixes #1863

Assisted-By: cagent

…ession derailment trimMessages() removed the oldest conversation messages first without protecting user messages. In single-turn agentic loops (one user message followed by many tool calls), the user's original request was the first message trimmed, causing the model to lose context and stop working. User messages are now marked as protected and skipped during trimming. Only assistant and tool messages are eligible for removal. Fixes docker#1863 Assisted-By: cagent

aheritier · 2026-02-26T22:25:57Z

/review

docker-agent

Review Summary

This PR successfully fixes issue #1863 by protecting user messages from being trimmed during message history management. The implementation is well-designed and thoroughly tested.

Key Changes:

User messages are now marked as protected and excluded from trimming
The algorithm correctly prioritizes user message preservation over strict message count limits
Comprehensive test coverage validates the protection logic across various scenarios

Analysis:
✅ The logic correctly identifies and protects user messages using an index-based map
✅ Tool call consistency is maintained when assistant messages are removed
✅ Edge cases are well-covered, including agentic loops with many tool calls
✅ Tests explicitly validate that user messages are preserved even when maxItems would be exceeded
✅ This is intentional behavior to prevent context loss - user messages take priority

No issues found. This change will prevent the model from losing track of user requests in long agentic loops, which was the root cause of issue #1863.

docker-agent

Review Summary

The implementation looks solid! The fix correctly addresses the issue where user messages were being trimmed in agentic loops. The protection mechanism is well-implemented:

✅ User messages are properly identified and marked as protected
✅ The removal algorithm correctly skips protected messages while maintaining oldest-first trimming for non-protected messages
✅ Tool call consistency logic remains intact (orphaned tool results are properly cleaned up)
✅ Comprehensive test coverage added for the new behavior

The code correctly handles edge cases where protected messages dominate the conversation history, and the final result reconstruction properly preserves message ordering and tool call relationships.

No critical issues found. This PR is ready to merge!

Copilot

Pull request overview

This PR updates session history trimming to avoid dropping user messages in long tool-heavy agentic loops (fixing #1863), so the model retains the original user request and doesn’t derail mid-session.

Changes:

Updated trimMessages() to treat user messages as protected during trimming and to cascade-remove tool results for removed assistant tool calls.
Revised existing trimming/history-limit tests to reflect the new “user messages preserved” behavior.
Added new tests that simulate long single-turn agentic loops and validate tool-call/result consistency while preserving user requests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
`pkg/session/session.go`	Adjusts trimming logic to skip user messages and avoid orphaned tool results.
`pkg/session/session_test.go`	Updates trimming test expectations around protected user messages/tool-call consistency.
`pkg/session/session_history_test.go`	Reworks history-limit expectations and adds coverage for long agentic-loop scenarios.

Comments suppressed due to low confidence (2)

pkg/session/session_test.go:71

The updated test comment says the result “includes [both user messages] plus the most recent assistant/tool pair”, but with maxItems := 3 and the current trimming logic, it’s possible for the assistant message of that most recent pair to be trimmed (since user messages still contribute to toRemove). This test also no longer asserts the trimmed message set/length, so it may pass even if the most recent tool interaction is lost. Recommend adding assertions that the expected assistant/tool pair is actually present (and that older tool interactions are removed as intended) so the test detects regressions.

	// Both user messages are protected, so result includes them plus the most recent assistant/tool pair
	toolCalls := make(map[string]bool)
	for _, msg := range result {
		if msg.Role == chat.MessageRoleAssistant {
			for _, tool := range msg.ToolCalls {
				toolCalls[tool.ID] = true
			}
		}
		if msg.Role == chat.MessageRoleTool {
			assert.True(t, toolCalls[msg.ToolCallID], "tool result should have corresponding assistant message")
		}

pkg/session/session.go:762

trimMessages currently removes the oldest non-protected messages until len(removed) == toRemove, but if toRemove is derived from total conversation length (including protected user messages), the function can end up removing recent assistant messages even when the remaining overage is only due to protected users. This can produce a history containing only user messages (no assistant/tool context) and undermines num_history_items as an effective cap. A more robust approach is to cap only assistant/tool messages (or otherwise ensure the kept set always includes the most recent tool interaction(s) while preserving user messages).

	// Calculate how many conversation messages we need to remove
	toRemove := len(conversationMessages) - maxItems

	// Mark the oldest non-protected messages for removal
	removed := make(map[int]bool)
	for i := 0; i < len(conversationMessages) && len(removed) < toRemove; i++ {
		if protected[i] {
			continue
		}
		removed[i] = true
		if conversationMessages[i].Role == chat.MessageRoleAssistant {
			for _, toolCall := range conversationMessages[i].ToolCalls {
				toolCallsToRemove[toolCall.ID] = true
			}
		}
	}

	// Combine system messages with trimmed conversation messages
	result := make([]chat.Message, 0, len(systemMessages)+maxItems)

	// Add all system messages first
	result = append(result, systemMessages...)

	// Add protected and non-removed conversation messages
	for i, msg := range conversationMessages {
		if removed[i] {
			continue
		}

		// Skip orphaned tool results whose assistant message was removed
		if msg.Role == chat.MessageRoleTool && toolCallsToRemove[msg.ToolCallID] {
			continue
		}

		result = append(result, msg)
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-26T22:29:27Z

pkg/session/session_history_test.go

+			name:            "limit to 3 conversation messages — user messages protected",
+			numHistoryItems: 3,
+			messageCount:    10,
+			// 10 user (all protected) + 10 assistant. Need to remove 17, but only 10 removable.
+			// Result: 10 users + 0 assistants = 10
+			expectedConversationMsgs: 10,
 		},
 		{
-			name:                     "limit to 5 conversation messages",
-			numHistoryItems:          5,
-			messageCount:             8,
-			expectedConversationMsgs: 5, // Limited to 5 out of 16 total messages
+			name:            "limit to 5 conversation messages — user messages protected",
+			numHistoryItems: 5,
+			messageCount:    8,
+			// 8 user (all protected) + 8 assistant. Need to remove 11, but only 8 removable.
+			// Result: 8 users + 0 assistants = 8
+			expectedConversationMsgs: 8,


These expectations effectively encode that num_history_items is no longer a hard limit on conversation history (e.g., numHistoryItems: 3 yields expectedConversationMsgs: 10), which can significantly increase the prompt size and cost for multi-turn chats. If the intended behavior is “preserve user messages but still keep at most N assistant/tool messages”, the tests should reflect that (e.g., expected conversation count would be userCount + numHistoryItems, subject to tool-call consistency). As written, the tests also allow trimming away all assistant context while keeping many user messages, which is likely to confuse the model.

Copilot · 2026-02-26T22:29:27Z

pkg/session/session_history_test.go

+	// 8 conversation messages: 4 user + 4 assistant
+	// User messages are always protected, so only assistant messages can be trimmed.
 	testCases := []struct {
 		limit                int
-		expectedTotal        int
-		expectedConversation int
 		expectedSystem       int
+		expectedUser         int
+		expectedConversation int // total non-system
 	}{
-		{limit: 2, expectedTotal: 3, expectedConversation: 2, expectedSystem: 1},
-		{limit: 4, expectedTotal: 5, expectedConversation: 4, expectedSystem: 1},
-		{limit: 8, expectedTotal: 9, expectedConversation: 8, expectedSystem: 1},
-		{limit: 100, expectedTotal: 9, expectedConversation: 8, expectedSystem: 1},
+		// limit=2: need to remove 6 of 8, but 4 are protected users → only 4 assistants removable → remove 4
+		{limit: 2, expectedSystem: 1, expectedUser: 4, expectedConversation: 4},
+		// limit=4: need to remove 4 of 8, 4 are protected → remove all 4 assistants
+		{limit: 4, expectedSystem: 1, expectedUser: 4, expectedConversation: 4},
+		// limit=8: no trimming needed (8 <= 8)
+		{limit: 8, expectedSystem: 1, expectedUser: 4, expectedConversation: 8},
+		// limit=100: no trimming needed
+		{limit: 100, expectedSystem: 1, expectedUser: 4, expectedConversation: 8},
 	}


TestTrimMessagesConversationLimit currently expects that for small limits (e.g. 2 or 4) all assistant messages are removed and only the protected user messages remain, which means the returned history can contain multiple user turns with no assistant replies. If the goal is to prevent losing the user request while still sending a coherent recent conversation window, consider adjusting the trimming strategy/tests to retain some recent assistant/tool context (and treat the limit as applying to non-user messages), rather than allowing “user-only” histories.

Copilot · 2026-02-26T22:29:28Z

pkg/session/session.go

@@ -710,15 +712,27 @@ func trimMessages(messages []chat.Message, maxItems int) []chat.Message {
 		return messages
 	}

+	// Identify user message indices — these are protected from trimming
+	protected := make(map[int]bool)
+	for i, msg := range conversationMessages {
+		if msg.Role == chat.MessageRoleUser {
+			protected[i] = true
+		}
+	}
+
 	// Keep track of tool call IDs that need to be removed
 	toolCallsToRemove := make(map[string]bool)

 	// Calculate how many conversation messages we need to remove
 	toRemove := len(conversationMessages) - maxItems

-	// Start from the beginning (oldest messages)
-	for i := range toRemove {
-		// If this is an assistant message with tool calls, mark them for removal
+	// Mark the oldest non-protected messages for removal
+	removed := make(map[int]bool)
+	for i := 0; i < len(conversationMessages) && len(removed) < toRemove; i++ {
+		if protected[i] {
+			continue
+		}
+		removed[i] = true
 		if conversationMessages[i].Role == chat.MessageRoleAssistant {
 			for _, toolCall := range conversationMessages[i].ToolCalls {


The doc comment says user messages are “not counted against the limit”, but the implementation still computes toRemove := len(conversationMessages) - maxItems, which implicitly counts user messages toward the limit. This leads to trimming assistant/tool messages more aggressively than intended when there are protected user messages (and can even drop the most recent assistant/tool interaction just to satisfy the count). Consider defining the limit in terms of removable messages (assistant/tool) and computing toRemove from that count, or update the comment to match the actual semantics.

aheritier · 2026-02-26T22:36:28Z

To challenge with #1831

docker-agent

Review Summary

This PR successfully addresses issue #1863 by protecting user messages from being trimmed in agentic loops. The implementation is sound:

✅ Core Logic: User messages are correctly identified and marked as protected before the trimming loop begins
✅ Trimming Algorithm: The removal loop properly skips protected messages and continues until the required number of non-protected messages are removed
✅ Tool Call Consistency: Tool results from removed assistant messages are correctly orphaned and skipped
✅ Comprehensive Tests: Three new test cases validate the user message protection behavior in various scenarios

The changed code correctly implements the intended behavior and includes thorough test coverage. No bugs found.

aheritier · 2026-02-27T10:59:03Z

I am not convinced by all copilot comments. The last one could make sense, others ... not sure

Copilot AI review requested due to automatic review settings February 26, 2026 22:25

aheritier requested a review from a team as a code owner February 26, 2026 22:25

Copilot started reviewing on behalf of aheritier February 26, 2026 22:26 View session

docker-agent bot approved these changes Feb 26, 2026

View reviewed changes

rumpl approved these changes Feb 26, 2026

View reviewed changes

Copilot AI reviewed Feb 26, 2026

View reviewed changes

aheritier mentioned this pull request Feb 26, 2026

trimMessages drops user messages in long agentic loops, causing session derailment #1863

Closed

aheritier marked this pull request as draft February 26, 2026 22:36

aheritier self-assigned this Feb 27, 2026

aheritier added kind/bug Something isn't working area/agent For work that has to do with the general agent loop/agentic features of the app labels Feb 27, 2026

aheritier marked this pull request as ready for review February 27, 2026 10:43

docker-agent bot approved these changes Feb 27, 2026

View reviewed changes

dgageot merged commit 923ba05 into docker:main Feb 27, 2026
15 checks passed

aheritier mentioned this pull request Feb 28, 2026

fix(#1644): enforce Anthropic tool sequencing and filter sub-agent messages #1880

Draft

BrewTestBot mentioned this pull request Mar 3, 2026

cagent 1.28.0 Homebrew/homebrew-core#270329

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(#1863): preserve user messages in trimMessages to prevent session derailment#1864

fix(#1863): preserve user messages in trimMessages to prevent session derailment#1864
dgageot merged 1 commit intodocker:mainfrom
aheritier:fix/trim-messages-preserve-user-messages

aheritier commented Feb 26, 2026

Uh oh!

aheritier commented Feb 26, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

docker-agent bot left a comment

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

Copilot AI Feb 26, 2026

Uh oh!

aheritier commented Feb 26, 2026

Uh oh!

docker-agent bot left a comment

Uh oh!

aheritier commented Feb 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

aheritier commented Feb 26, 2026

Uh oh!

aheritier commented Feb 26, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

Uh oh!

aheritier commented Feb 26, 2026

Uh oh!

docker-agent bot left a comment

Choose a reason for hiding this comment

Review Summary

Uh oh!

aheritier commented Feb 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants