Skip to content

fix(#1863): preserve user messages in trimMessages to prevent session derailment#1864

Merged
dgageot merged 1 commit intodocker:mainfrom
aheritier:fix/trim-messages-preserve-user-messages
Feb 27, 2026
Merged

fix(#1863): preserve user messages in trimMessages to prevent session derailment#1864
dgageot merged 1 commit intodocker:mainfrom
aheritier:fix/trim-messages-preserve-user-messages

Conversation

@aheritier
Copy link
Contributor

trimMessages() removed the oldest conversation messages first without protecting user messages. In single-turn agentic loops (one user message followed by many tool calls), the user's original request was the first message trimmed, causing the model to lose context and stop working.

User messages are now marked as protected and skipped during trimming. Only assistant and tool messages are eligible for removal.

Fixes #1863

Assisted-By: cagent

…ession derailment

trimMessages() removed the oldest conversation messages first without
protecting user messages. In single-turn agentic loops (one user message
followed by many tool calls), the user's original request was the first
message trimmed, causing the model to lose context and stop working.

User messages are now marked as protected and skipped during trimming.
Only assistant and tool messages are eligible for removal.

Fixes docker#1863

Assisted-By: cagent
Copilot AI review requested due to automatic review settings February 26, 2026 22:25
@aheritier aheritier requested a review from a team as a code owner February 26, 2026 22:25
@aheritier
Copy link
Contributor Author

/review

Copy link

@docker-agent docker-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR successfully fixes issue #1863 by protecting user messages from being trimmed during message history management. The implementation is well-designed and thoroughly tested.

Key Changes:

  • User messages are now marked as protected and excluded from trimming
  • The algorithm correctly prioritizes user message preservation over strict message count limits
  • Comprehensive test coverage validates the protection logic across various scenarios

Analysis:
✅ The logic correctly identifies and protects user messages using an index-based map
✅ Tool call consistency is maintained when assistant messages are removed
✅ Edge cases are well-covered, including agentic loops with many tool calls
✅ Tests explicitly validate that user messages are preserved even when maxItems would be exceeded
✅ This is intentional behavior to prevent context loss - user messages take priority

No issues found. This change will prevent the model from losing track of user requests in long agentic loops, which was the root cause of issue #1863.

Copy link

@docker-agent docker-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

The implementation looks solid! The fix correctly addresses the issue where user messages were being trimmed in agentic loops. The protection mechanism is well-implemented:

✅ User messages are properly identified and marked as protected
✅ The removal algorithm correctly skips protected messages while maintaining oldest-first trimming for non-protected messages
✅ Tool call consistency logic remains intact (orphaned tool results are properly cleaned up)
✅ Comprehensive test coverage added for the new behavior

The code correctly handles edge cases where protected messages dominate the conversation history, and the final result reconstruction properly preserves message ordering and tool call relationships.

No critical issues found. This PR is ready to merge!

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates session history trimming to avoid dropping user messages in long tool-heavy agentic loops (fixing #1863), so the model retains the original user request and doesn’t derail mid-session.

Changes:

  • Updated trimMessages() to treat user messages as protected during trimming and to cascade-remove tool results for removed assistant tool calls.
  • Revised existing trimming/history-limit tests to reflect the new “user messages preserved” behavior.
  • Added new tests that simulate long single-turn agentic loops and validate tool-call/result consistency while preserving user requests.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
pkg/session/session.go Adjusts trimming logic to skip user messages and avoid orphaned tool results.
pkg/session/session_test.go Updates trimming test expectations around protected user messages/tool-call consistency.
pkg/session/session_history_test.go Reworks history-limit expectations and adds coverage for long agentic-loop scenarios.
Comments suppressed due to low confidence (2)

pkg/session/session_test.go:71

  • The updated test comment says the result “includes [both user messages] plus the most recent assistant/tool pair”, but with maxItems := 3 and the current trimming logic, it’s possible for the assistant message of that most recent pair to be trimmed (since user messages still contribute to toRemove). This test also no longer asserts the trimmed message set/length, so it may pass even if the most recent tool interaction is lost. Recommend adding assertions that the expected assistant/tool pair is actually present (and that older tool interactions are removed as intended) so the test detects regressions.
	// Both user messages are protected, so result includes them plus the most recent assistant/tool pair
	toolCalls := make(map[string]bool)
	for _, msg := range result {
		if msg.Role == chat.MessageRoleAssistant {
			for _, tool := range msg.ToolCalls {
				toolCalls[tool.ID] = true
			}
		}
		if msg.Role == chat.MessageRoleTool {
			assert.True(t, toolCalls[msg.ToolCallID], "tool result should have corresponding assistant message")
		}

pkg/session/session.go:762

  • trimMessages currently removes the oldest non-protected messages until len(removed) == toRemove, but if toRemove is derived from total conversation length (including protected user messages), the function can end up removing recent assistant messages even when the remaining overage is only due to protected users. This can produce a history containing only user messages (no assistant/tool context) and undermines num_history_items as an effective cap. A more robust approach is to cap only assistant/tool messages (or otherwise ensure the kept set always includes the most recent tool interaction(s) while preserving user messages).
	// Calculate how many conversation messages we need to remove
	toRemove := len(conversationMessages) - maxItems

	// Mark the oldest non-protected messages for removal
	removed := make(map[int]bool)
	for i := 0; i < len(conversationMessages) && len(removed) < toRemove; i++ {
		if protected[i] {
			continue
		}
		removed[i] = true
		if conversationMessages[i].Role == chat.MessageRoleAssistant {
			for _, toolCall := range conversationMessages[i].ToolCalls {
				toolCallsToRemove[toolCall.ID] = true
			}
		}
	}

	// Combine system messages with trimmed conversation messages
	result := make([]chat.Message, 0, len(systemMessages)+maxItems)

	// Add all system messages first
	result = append(result, systemMessages...)

	// Add protected and non-removed conversation messages
	for i, msg := range conversationMessages {
		if removed[i] {
			continue
		}

		// Skip orphaned tool results whose assistant message was removed
		if msg.Role == chat.MessageRoleTool && toolCallsToRemove[msg.ToolCallID] {
			continue
		}

		result = append(result, msg)
	}


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +35
name: "limit to 3 conversation messages — user messages protected",
numHistoryItems: 3,
messageCount: 10,
// 10 user (all protected) + 10 assistant. Need to remove 17, but only 10 removable.
// Result: 10 users + 0 assistants = 10
expectedConversationMsgs: 10,
},
{
name: "limit to 5 conversation messages",
numHistoryItems: 5,
messageCount: 8,
expectedConversationMsgs: 5, // Limited to 5 out of 16 total messages
name: "limit to 5 conversation messages — user messages protected",
numHistoryItems: 5,
messageCount: 8,
// 8 user (all protected) + 8 assistant. Need to remove 11, but only 8 removable.
// Result: 8 users + 0 assistants = 8
expectedConversationMsgs: 8,
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These expectations effectively encode that num_history_items is no longer a hard limit on conversation history (e.g., numHistoryItems: 3 yields expectedConversationMsgs: 10), which can significantly increase the prompt size and cost for multi-turn chats. If the intended behavior is “preserve user messages but still keep at most N assistant/tool messages”, the tests should reflect that (e.g., expected conversation count would be userCount + numHistoryItems, subject to tool-call consistency). As written, the tests also allow trimming away all assistant context while keeping many user messages, which is likely to confuse the model.

Copilot uses AI. Check for mistakes.
Comment on lines +130 to 146
// 8 conversation messages: 4 user + 4 assistant
// User messages are always protected, so only assistant messages can be trimmed.
testCases := []struct {
limit int
expectedTotal int
expectedConversation int
expectedSystem int
expectedUser int
expectedConversation int // total non-system
}{
{limit: 2, expectedTotal: 3, expectedConversation: 2, expectedSystem: 1},
{limit: 4, expectedTotal: 5, expectedConversation: 4, expectedSystem: 1},
{limit: 8, expectedTotal: 9, expectedConversation: 8, expectedSystem: 1},
{limit: 100, expectedTotal: 9, expectedConversation: 8, expectedSystem: 1},
// limit=2: need to remove 6 of 8, but 4 are protected users → only 4 assistants removable → remove 4
{limit: 2, expectedSystem: 1, expectedUser: 4, expectedConversation: 4},
// limit=4: need to remove 4 of 8, 4 are protected → remove all 4 assistants
{limit: 4, expectedSystem: 1, expectedUser: 4, expectedConversation: 4},
// limit=8: no trimming needed (8 <= 8)
{limit: 8, expectedSystem: 1, expectedUser: 4, expectedConversation: 8},
// limit=100: no trimming needed
{limit: 100, expectedSystem: 1, expectedUser: 4, expectedConversation: 8},
}
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestTrimMessagesConversationLimit currently expects that for small limits (e.g. 2 or 4) all assistant messages are removed and only the protected user messages remain, which means the returned history can contain multiple user turns with no assistant replies. If the goal is to prevent losing the user request while still sending a coherent recent conversation window, consider adjusting the trimming strategy/tests to retain some recent assistant/tool context (and treat the limit as applying to non-user messages), rather than allowing “user-only” histories.

Copilot uses AI. Check for mistakes.
Comment on lines 694 to 737
@@ -710,15 +712,27 @@ func trimMessages(messages []chat.Message, maxItems int) []chat.Message {
return messages
}

// Identify user message indices — these are protected from trimming
protected := make(map[int]bool)
for i, msg := range conversationMessages {
if msg.Role == chat.MessageRoleUser {
protected[i] = true
}
}

// Keep track of tool call IDs that need to be removed
toolCallsToRemove := make(map[string]bool)

// Calculate how many conversation messages we need to remove
toRemove := len(conversationMessages) - maxItems

// Start from the beginning (oldest messages)
for i := range toRemove {
// If this is an assistant message with tool calls, mark them for removal
// Mark the oldest non-protected messages for removal
removed := make(map[int]bool)
for i := 0; i < len(conversationMessages) && len(removed) < toRemove; i++ {
if protected[i] {
continue
}
removed[i] = true
if conversationMessages[i].Role == chat.MessageRoleAssistant {
for _, toolCall := range conversationMessages[i].ToolCalls {
Copy link

Copilot AI Feb 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The doc comment says user messages are “not counted against the limit”, but the implementation still computes toRemove := len(conversationMessages) - maxItems, which implicitly counts user messages toward the limit. This leads to trimming assistant/tool messages more aggressively than intended when there are protected user messages (and can even drop the most recent assistant/tool interaction just to satisfy the count). Consider defining the limit in terms of removable messages (assistant/tool) and computing toRemove from that count, or update the comment to match the actual semantics.

Copilot uses AI. Check for mistakes.
@aheritier
Copy link
Contributor Author

To challenge with #1831

@aheritier aheritier marked this pull request as draft February 26, 2026 22:36
@aheritier aheritier self-assigned this Feb 27, 2026
@aheritier aheritier added kind/bug Something isn't working area/agent For work that has to do with the general agent loop/agentic features of the app labels Feb 27, 2026
@aheritier aheritier marked this pull request as ready for review February 27, 2026 10:43
Copy link

@docker-agent docker-agent bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR successfully addresses issue #1863 by protecting user messages from being trimmed in agentic loops. The implementation is sound:

Core Logic: User messages are correctly identified and marked as protected before the trimming loop begins
Trimming Algorithm: The removal loop properly skips protected messages and continues until the required number of non-protected messages are removed
Tool Call Consistency: Tool results from removed assistant messages are correctly orphaned and skipped
Comprehensive Tests: Three new test cases validate the user message protection behavior in various scenarios

The changed code correctly implements the intended behavior and includes thorough test coverage. No bugs found.

@aheritier
Copy link
Contributor Author

I am not convinced by all copilot comments. The last one could make sense, others ... not sure

@dgageot dgageot merged commit 923ba05 into docker:main Feb 27, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/agent For work that has to do with the general agent loop/agentic features of the app kind/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

trimMessages drops user messages in long agentic loops, causing session derailment

4 participants