LCORE-1616 /v1/responses support for MCP tool merging (goose agent support)#1435

Open
dprince wants to merge 4 commits into lightspeed-core:main from dprince:merge_server_tools
Conversation


@dprince dprince commented Mar 30, 2026

Description

Add server-tool merging and filtering for Responses API

Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.
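
A stdlib-only sketch of the conflict check described above (the actual `_merge_tools` raises FastAPI's `HTTPException` with status 409; the dict-shaped tools and exact signature here are assumptions for illustration):

```python
class ToolConflictError(Exception):
    """Raised when client and server tool configurations collide."""


def merge_tools_sketch(client_tools: list[dict], server_tools: list[dict]) -> list[dict]:
    """Merge client and server tool lists, client tools first.

    Rejects duplicate MCP server_label values and duplicate file_search
    tools; the real endpoint surfaces these as HTTP 409 responses.
    """
    client_labels = {
        t["server_label"] for t in client_tools if t.get("type") == "mcp"
    }
    for tool in server_tools:
        if tool.get("type") == "mcp" and tool.get("server_label") in client_labels:
            raise ToolConflictError(
                f"duplicate MCP server_label: {tool['server_label']}"
            )
    if any(t.get("type") == "file_search" for t in client_tools) and any(
        t.get("type") == "file_search" for t in server_tools
    ):
        raise ToolConflictError("both client and server configure file_search")
    # Client tools first, then server tools (matches the merge order in the PR)
    return list(client_tools) + list(server_tools)
```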

Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.
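
The stripping step can be sketched as a filter over the serialized output array (the helper name and dict item shapes are assumptions; the real code checks server_label against the configured MCP servers):

```python
# Item types emitted by server-deployed MCP tools that OpenAI-compatible
# clients such as Goose do not understand.
SERVER_MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}


def strip_server_mcp_items(output: list[dict], configured_labels: set[str]) -> list[dict]:
    """Drop server-deployed MCP items from a serialized response.output array.

    An item is removed only when its type is an MCP item type AND its
    server_label matches a server-configured MCP server; messages and
    client-tool items (e.g. function_call) pass through untouched.
    """
    return [
        item
        for item in output
        if not (
            item.get("type") in SERVER_MCP_ITEM_TYPES
            and item.get("server_label") in configured_labels
        )
    ]
```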

Includes a doc/diagram of the Goose (local agent) use case we are prototyping.

Type of change

  • Refactor
  • [x] New feature
  • Bug fix
  • CVE fix
  • Optimization
  • [x] Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

  • Assisted-by: Claude

Checklist before requesting a review

  • [x] I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Added unit tests
  • Tested locally against a working lightspeed-stack installation, using the /v1/responses API with Goose. Verified working MCP tools at both the local agent (Goose) level and via the lightspeed-stack (server) level. With no extra header configured in Goose, only the client tools were active (the previous behavior).

Summary by CodeRabbit

  • New Features

    • Optional merging of server-configured and client-provided tools via request header; conflicts return HTTP 409.
    • Streaming responses filter and hide server-deployed tool outputs from clients, preserving consistent output field names.
    • Stored turn summaries/metrics now include only server-deployed tool outputs; client tool outputs are returned to callers.
  • Documentation

    • New guide describing the hybrid tool execution model, merging header, conflict semantics, and streaming behavior.
  • Tests

    • Expanded coverage for merging, conflict handling, filtering, and server/client output classification.

dprince added 2 commits March 30, 2026 14:33
Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.

Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.
Document the Goose agent integration pattern where server-configured tools
(RAG, MCP knowledge services) are merged with client-provided tools
(oc/kubectl). Covers the X-LCS-Merge-Server-Tools header behavior,
conflict detection, server-tool filtering in streamed responses, and
an end-to-end conversation flow example.

coderabbitai bot commented Mar 30, 2026

Walkthrough

Adds a header-driven hybrid tool merging model for /v1/responses, conflict detection returning HTTP 409, SSE filtering to hide server-deployed MCP events from clients, helper extraction for response finalization, and updates to tool resolution and turn-summary logic to distinguish server-deployed vs client-provided outputs.

Changes

Cohort / File(s) Summary
Documentation
docs/goose-example-tool-merging.md
New doc describing header-driven merging (X-LCS-Merge-Server-Tools: true), conflict rules for MCP/file_search, SSE filtering of server-deployed MCP events, and end-to-end flows.
Tool resolution & summaries
src/utils/responses.py
Added is_server_deployed_output() and internal _merge_tools(); extended resolve_tool_choice() to optionally merge server-configured tools (honors merge header) and return HTTP 409 on conflicts; updated build_turn_summary() to exclude client-provided outputs from stored summaries/metrics.
Response streaming & finalization
src/app/endpoints/responses.py
SSE serialization now uses model_dump(..., by_alias=True); added server-MCP detection and chunk-filtering logic that suppresses server-deployed MCP substream events and strips server MCP items from streamed response.output; extracted end-of-stream logic into _finalize_response().
Integration tests
tests/integration/endpoints/test_query_integration.py
Adjusted mocked MCP server_label and updated assertions to reflect that only server-deployed tool calls appear in stored response.tool_calls.
Unit tests
tests/unit/app/endpoints/test_responses.py, tests/unit/utils/test_responses.py
Added tests for server-MCP output detection and chunk filtering, _merge_tools() semantics including 409 conflict cases, is_server_deployed_output(), and resolve_tool_choice() merge behaviors and error propagation.

Sequence Diagrams

sequenceDiagram
    participant Client
    participant Server as LCS Server
    participant ServerMCP as Server MCP
    participant ClientMCP as Client MCP

    Client->>Server: POST /v1/responses\ntools=[client_tools]\nX-LCS-Merge-Server-Tools: true
    Server->>ServerMCP: prepare_tools() (RAG + configured MCP)
    ServerMCP-->>Server: server_tools
    Note over Server: _merge_tools() checks conflicts:\n• MCP server_label collisions\n• file_search overlap
    alt Conflict Detected
        Server-->>Client: HTTP 409 Conflict
    else No Conflict
        Server->>Server: merged_tools = client_tools + server_tools
        Server-->>Client: Begin SSE streaming (merged tools available)
    end
sequenceDiagram
    participant Client
    participant Server as LCS Server
    participant MCP as Server-Deployed MCP
    participant Storage as Turn Summary & Storage

    Client->>Server: Request using merged tools
    Server->>MCP: Execute server-deployed tool calls
    MCP-->>Server: mcp_call / results
    Server->>Server: response_generator() streaming SSE
    Note over Server: Filter & hide server-deployed MCP item types\n(mcp_call, mcp_list_tools, mcp_approval_request)
    Server->>Client: SSE stream with server MCP events suppressed
    Server->>Server: _finalize_response()
    Note over Server: build_turn_summary() includes only server-deployed outputs
    Server->>Storage: Store summary/metrics with server tool outputs
    Server-->>Client: Complete

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: adding MCP tool merging support to the /v1/responses API with reference to the Goose agent use case.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


@dprince dprince changed the title /v1/responses support for MCP tool merging (goose agent support) LCORE-1616 /v1/responses support for MCP tool merging (goose agent support) Mar 30, 2026
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/goose-example-tool-merging.md`:
- Around line 95-142: The fenced code block containing the conversation flow
diagram (the block starting with "Goose                     LCS /v1/responses   
Llama Stack        MCP/RAG" and the opening triple backticks) is missing a
language specifier; update the opening fence from just ``` to ```text (or
```plaintext) so the diagram is treated as plain text in rendering and
syntax-highlighting. Locate the triple-backtick that begins the diagram and add
the specifier, leaving the rest of the block unchanged.
- Around line 14-50: The fenced ASCII diagram block in
docs/goose-example-tool-merging.md is missing a language specifier which linters
flag; update the opening fence for the diagram (the triple-backtick that
currently starts the block) to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the ASCII diagram is annotated for
static analysis and rendering; ensure only the opening fence is changed and the
rest of the Goose Agent / OpenShift Cluster diagram content (the ASCII art)
remains untouched.

In `@src/utils/responses.py`:
- Around line 1126-1158: The is_server_deployed_output function currently treats
an MCPApprovalResponse (item_type "mcp_approval_response") as server-side
because it lacks a server_label; update is_server_deployed_output to explicitly
handle "mcp_approval_response" (e.g., return False to mark it as
client-provided) or implement lookup/propagation of server_label, and add a unit
test in TestIsServerDeployedOutput to cover the approval response case;
reference is_server_deployed_output and TestIsServerDeployedOutput when making
the change.

In `@tests/unit/utils/test_responses.py`:
- Around line 2773-2804: The test test_client_tools_with_merge_header only
asserts behavior when the header key uses uppercase "X-LCS-Merge-Server-Tools";
update the test (and the other affected tests around the same blocks) to also
call resolve_tool_choice with a lowercase header key "x-lcs-merge-server-tools"
and assert the same merged-tools behavior (same tools, tool_choice, etc.) so
header normalization won't cause regressions; locate the calls to
resolve_tool_choice and duplicate or parametrize them to include both header key
variants to validate lowercase handling.
- Around line 2626-2635: Add a complementary positive test that verifies
mcp_approval_request is classified server-side when the server label matches:
create a new test (e.g., test_mcp_approval_request_server_side) that mocks
configuration (mocker.patch("utils.responses.configuration")) with
mock_config.mcp_servers containing the target label (e.g., "client-mcp"), create
an item with item.type = "mcp_approval_request" and item.server_label =
"client-mcp", and assert is_server_deployed_output(item) is True; reference the
existing test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f5204ab3-7d8c-490d-8111-8a64f71f83a1

📥 Commits

Reviewing files that changed from the base of the PR and between 1b68d36 and 20809d0.

📒 Files selected for processing (6)
  • docs/goose-example-tool-merging.md
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • src/app/endpoints/responses.py
  • src/utils/responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/integration/endpoints/test_query_integration.py
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
  • src/utils/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.

Applied to files:

  • src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.

Applied to files:

  • src/utils/responses.py
🪛 markdownlint-cli2 (0.22.0)
docs/goose-example-tool-merging.md

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🔇 Additional comments (13)
src/utils/responses.py (3)

1193-1195: LGTM! Server-deployed output filtering in turn summary.

The filter correctly excludes client-provided tool outputs from the turn summary, ensuring only server-deployed tool calls and results are stored in metrics and conversation history.


1363-1411: LGTM! Tool merge logic with conflict detection.

The implementation correctly:

  • Detects MCP conflicts by comparing server_label values
  • Detects file_search conflicts when both client and server provide one
  • Returns merged tools with client tools first
  • Uses appropriate HTTP 409 status code for conflicts

Minor observation: Line 1411 wraps inputs in list() unnecessarily since client_tools and server_tools are already typed as list[InputTool], but this has no functional impact.


1449-1476: LGTM! Header-driven merge behavior.

The merge logic correctly:

  • Checks for X-LCS-Merge-Server-Tools: true header (case-insensitive)
  • Only merges when the header is explicitly set
  • Preserves original behavior when header is absent
  • Reuses the same vector_store_ids resolved from client tools when preparing server tools
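
A minimal sketch of the case-insensitive header check described above (the helper name and dict-based headers are assumptions; FastAPI/Starlette's `request.headers` already performs case-insensitive lookup):

```python
MERGE_HEADER = "x-lcs-merge-server-tools"


def should_merge_server_tools(headers: dict[str, str]) -> bool:
    """Return True only when the merge header is explicitly set to 'true'.

    HTTP header names are case-insensitive, so the name is normalized
    before comparison; absence of the header preserves prior behavior.
    """
    for name, value in headers.items():
        if name.lower() == MERGE_HEADER:
            return value.strip().lower() == "true"
    return False
```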
tests/integration/endpoints/test_query_integration.py (2)

731-736: LGTM! Test expectations updated for server-deployed tool filtering.

The assertions correctly reflect the new behavior where:

  • Only server-deployed tool calls are included in response.tool_calls
  • function_call items (client-provided) are excluded
  • The expected count is reduced from 2 to 1

The comment clearly explains the reasoning.


635-635: No issues found: server1 is a valid MCP server name configured in test fixtures.

Verification confirms the server_label change to "server1" aligns with the MCP server configuration in the test fixtures (tests/integration/test_configuration.py lines 85-93 define server1 as a valid MCP server). The test setup correctly pairs this label with the mcp_list_tools item, and the assertions properly validate server-deployed tool classification.

tests/unit/app/endpoints/test_responses.py (1)

1378-1427: LGTM! Comprehensive tests for _is_server_mcp_output_item.

The test class provides good coverage including:

  • MCP types with matching and non-matching server labels
  • All three MCP event types (mcp_call, mcp_list_tools, mcp_approval_request)
  • Missing server_label handling
  • Non-MCP types returning False
  • Empty configured labels edge case
src/app/endpoints/responses.py (5)

405-407: LGTM! Added by_alias=True for SSE serialization.

Using by_alias=True ensures field names are serialized according to their aliases (e.g., created_at vs createdAt), maintaining consistency with the OpenAI Responses specification for SSE events.

Also applies to: 423-425, 435-435, 451-453


465-480: LGTM! Helper for filtering serialized MCP output items.

The function correctly identifies server-deployed MCP items from the serialized response.output array by checking:

  1. Item type is one of the MCP types
  2. server_label matches a configured MCP server

This complements is_server_deployed_output from utils/responses.py which operates on typed objects rather than dicts.


483-527: LGTM! Streaming chunk filter with output index tracking.

The implementation correctly:

  • Tracks output_index values when server MCP items are added
  • Filters intermediate events (response.mcp_call.*, response.mcp_list_tools.*) by tracked indices
  • Cleans up tracking when output_item.done is received using discard() (safe for missing keys)
  • Mutates server_mcp_output_indices as a side effect, which is documented via the function's purpose
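
The index-tracking behavior described above can be sketched as follows (event names follow the review; the chunk shapes and helper name are assumptions for illustration):

```python
MCP_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}


def should_forward_chunk(chunk: dict, tracked: set[int], configured_labels: set[str]) -> bool:
    """Decide whether to forward an SSE chunk; mutates `tracked` in place.

    When a server-deployed MCP item is added, its output_index is tracked;
    intermediate events targeting a tracked index are suppressed; tracking
    is cleaned up when the matching output_item.done arrives.
    """
    event = chunk.get("type", "")
    index = chunk.get("output_index")
    item = chunk.get("item", {})
    if event == "response.output_item.added":
        if item.get("type") in MCP_TYPES and item.get("server_label") in configured_labels:
            tracked.add(index)
            return False  # hide the server MCP item from the client
        return True
    if event == "response.output_item.done":
        if index in tracked:
            tracked.discard(index)  # safe even if the key is missing
            return False
        return True
    # Intermediate events (response.mcp_call.*, response.mcp_list_tools.*, ...)
    return index not in tracked
```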

530-575: LGTM! Extracted finalization logic into helper.

The _finalize_response helper:

  • Consolidates post-stream metadata extraction
  • Uses is_server_deployed_output to filter tool summaries (consistent with non-streaming path)
  • Handles conversation persistence with append_turn_items_to_conversation

This improves readability of response_generator by separating streaming from finalization concerns.


636-644: LGTM! Filtering server MCP items from response output.

The inline list comprehension correctly removes server-deployed MCP items from the output array in the serialized response, ensuring clients only receive item types they understand (message, function_call, etc.).

docs/goose-example-tool-merging.md (1)

1-92: LGTM! Clear documentation for hybrid tool architecture.

The documentation provides excellent coverage of:

  • The separation of knowledge (server) vs action (client) tools
  • Header-driven merge behavior with clear behavior table
  • Conflict detection semantics
  • SSE streaming filter mechanics with step-by-step explanation
  • Turn summary behavior for server vs client outputs

The architecture and flow diagrams effectively illustrate the concepts.

tests/unit/utils/test_responses.py (1)

2576-2738: Good coverage on classification and conflict behavior.

The new TestIsServerDeployedOutput and TestMergeTools cases are well-targeted and align with the merge/conflict contract (including 409 behavior).

Comment on lines +14 to +50
```
OpenShift Cluster
┌────────────────────────────────────────────────────────────────────────────┐
│ │
│ namespace: foo-staging │
│ ┌─────────────────────────────────────┐ │
│ │ Goose Agent (Pod) │ │
│ │ │ │
│ │ ServiceAccount: goose-readonly │ │
│ │ (read-only, limited k8s/OCP access)│ │
│ │ │ HTTPS POST /v1/responses │
│ │ LLM reasoning loop │ │
│ │ ├─ LCS /v1/responses ──────────────┼─────────────────────┐ │
│ │ │ (knowledge queries) │ │ │
│ │ │ │ ▼ │
│ │ ├─ Local MCP tools │ namespace: lightspeed-stack │
│ │ │ (cluster read-only operations) │ ┌─────────────────────────┐ │
│ │ │ ├─ oc get pods │ │ Lightspeed Stack (LCS) │ │
│ │ │ ├─ oc logs │ │ │ │
│ │ │ └─ oc describe │ │ Llama Stack Engine │ │
│ │ └──────────────────────────────────┘ │ ├─ file_search (RAG) │ │
│ └──────────┬─────────────────────────── │ ├─ mcp: OKP/Solr │ │
│ │ ServiceAccount token │ ├─ mcp: errata │ │
│ │ (read-only RBAC) │ └─ mcp: bugzilla │ │
│ │ └───────────┬─────────────┘ │
│ │ ┌─────┴──────┐ │
│ │ │FAISS / Solr│ │
│ │ └────────────┘ │
│ │ │
│ │ namespace: foo-staging │
│ │ ┌───────────────────────────┐ │
│ └─▶│ pods, deployments, ... │ │
│ │ (read-only RBAC enforced)│ │
│ └───────────────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────┘
```

🧹 Nitpick | 🔵 Trivial

Add language specifier to fenced code block.

Static analysis flagged this code block as missing a language specifier. For ASCII diagrams, use text or plaintext to satisfy linters and improve rendering hints.

-```
+```text
 OpenShift Cluster
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 14-14: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 14 - 50, The fenced ASCII
diagram block in docs/goose-example-tool-merging.md is missing a language
specifier which linters flag; update the opening fence for the diagram (the
triple-backtick that currently starts the block) to include a language token
such as text or plaintext (e.g., change ``` to ```text) so the ASCII diagram is
annotated for static analysis and rendering; ensure only the opening fence is
changed and the rest of the Goose Agent / OpenShift Cluster diagram content (the
ASCII art) remains untouched.

Comment on lines +95 to +142
```
Goose LCS /v1/responses Llama Stack MCP/RAG
│ │ │ │
│ "pod api-7f8b9 is crash- │ │ │
│ looping in foo-staging, │ │ │
│ help me diagnose it" │ │ │
│ │ │ │
│ POST {input, tools, │ │ │
│ X-LCS-Merge-Server-Tools} │ │ │
│────────────────────────────▶│ │ │
│ │ merge client + │ │
│ │ server tools │ │
│ │ │ │
│ │ responses.create() │ │
│ │──────────────────────▶│ │
│ │ │ LLM: search RAG│
│ │ │───────────────▶│
│ │ │◀── RAG chunks: │
│ │ │ known OOM fix │
│ │ │ for api image │
│ │ │ │
│ │ │ LLM: query OKP │
│ │ │───────────────▶│
│ │ │◀── KB article: │
│ │ │ memory limits │
│ │ │ best practice │
│ │ │ │
│ │◀──────────────────────│ answer with │
│ │ │ RAG + KB ctx │
│ SSE: response.completed │ │ │
│ (server MCP items filtered)│ │ │
│◀────────────────────────────│ │ │
│ │ │ │
│ Goose combines LCS knowledge with live cluster data: │
│ │ │ │
│──▶ LOCAL MCP: oc get pods -n foo-staging │
│◀── pod/api-7f8b9 CrashLoopBackOff │
│──▶ LOCAL MCP: oc logs pod/api-7f8b9 │
│◀── OOMKilled │
│──▶ LOCAL MCP: oc describe pod/api-7f8b9 │
│◀── events, resource limits │
│ │
│ Goose correlates live cluster state with RAG/OKP │
│ knowledge to produce a diagnosis: │
│ "Pod is OOMKilled. RAG docs confirm a known fix │
│ for this image -- raise memory limit to 512Mi. │
│ See KB article for memory tuning best practices." │
```

🧹 Nitpick | 🔵 Trivial

Add language specifier to fenced code block.

Same as above, add text or plaintext language specifier for the conversation flow diagram.

-```
+```text
 Goose                     LCS /v1/responses        Llama Stack        MCP/RAG
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 95-95: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/goose-example-tool-merging.md` around lines 95 - 142, The fenced code
block containing the conversation flow diagram (the block starting with "Goose  
LCS /v1/responses        Llama Stack        MCP/RAG" and the opening triple
backticks) is missing a language specifier; update the opening fence from just
``` to ```text (or ```plaintext) so the diagram is treated as plain text in
rendering and syntax-highlighting. Locate the triple-backtick that begins the
diagram and add the specifier, leaving the rest of the block unchanged.

Comment on lines +1126 to +1158
def is_server_deployed_output(output_item: ResponseOutput) -> bool:
"""Check if a response output item belongs to a tool deployed by LCS.

In the hybrid architecture clients may provide their own tools (function
tools or MCP servers running locally) alongside server-configured tools.
This function identifies items that belong to LCS-deployed tools so that
only those are included in server-side processing (turn summary, metrics,
storage). Client tool output items are still returned in the response
to the caller but are not processed internally.

Args:
output_item: A ResponseOutput item from the response.

Returns:
True if the item should be processed by LCS, False for client tools.
"""
item_type = getattr(output_item, "type", None)

# function_call items are always from client-provided tools;
# LCS only configures file_search and mcp tools.
if item_type == "function_call":
return False

# MCP items: check server_label against configured servers
if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
server_label = getattr(output_item, "server_label", None)
if server_label is not None:
configured_labels = {s.name for s in configuration.mcp_servers}
return server_label in configured_labels

# file_search_call, web_search_call, message, and unknown types
# are treated as server-side.
return True

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if mcp_approval_response has server_label attribute in the llama_stack_api types
rg -n "mcp_approval_response" --type=py -C5

Repository: lightspeed-core/lightspeed-stack

Length of output: 7880


🏁 Script executed:

#!/bin/bash
# Find MCPApprovalResponse definition and where approval responses are created
rg -n "class MCPApprovalResponse|MCPApprovalResponse.*=" --type=py -A3 | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Search for where mcp_approval_response items originate/are constructed
rg -n "MCPApprovalResponse\(" --type=py -B2 -A2

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check if MCPApprovalResponse has server_label attribute
rg -n "approval_request_id|server_label" src/utils/responses.py | grep -A2 -B2 "820\|821\|822"

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Find MCPApprovalResponse import and usage
rg -n "MCPApprovalResponse" --type=py | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 732


🏁 Script executed:

#!/bin/bash
# Look at the build_tool_call_summary function more closely to understand MCPApprovalResponse structure
sed -n '815,835p' src/utils/responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 774


🏁 Script executed:

#!/bin/bash
# Check all response output types handled in is_server_deployed_output context
rg -n "type.*=.*\"mcp" --type=py | head -30

Repository: lightspeed-core/lightspeed-stack

Length of output: 2390


🏁 Script executed:

#!/bin/bash
# Check imports at top of responses.py to understand MCPApprovalResponse
head -50 src/utils/responses.py | grep -A5 -B5 "MCPApprovalResponse"

Repository: lightspeed-core/lightspeed-stack

Length of output: 439


🏁 Script executed:

#!/bin/bash
# Search for any mentions of server_label in approval context
rg -n "server_label" src/utils/responses.py | grep -B3 -A3 "approval"

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check test for is_server_deployed_output to see if mcp_approval_response is tested
rg -n "is_server_deployed_output" --type=py -B2 -A10

Repository: lightspeed-core/lightspeed-stack

Length of output: 12041


🏁 Script executed:

#!/bin/bash
# Look for the full test class to see if there's a test for mcp_approval_response
sed -n '2576,2661p' tests/unit/utils/test_responses.py

Repository: lightspeed-core/lightspeed-stack

Length of output: 3657


🏁 Script executed:

#!/bin/bash
# Check the MCPApprovalResponse type to see what attributes it has
rg -n "class.*MCPApprovalResponse|@dataclass.*approval" --type=py -A15 | head -50

Repository: lightspeed-core/lightspeed-stack

Length of output: 58


🏁 Script executed:

#!/bin/bash
# Check if approval_request_id can be used to track origin of approval responses
rg -n "approval_request_id" src/utils/responses.py -B3 -A3

Repository: lightspeed-core/lightspeed-stack

Length of output: 410


🏁 Script executed:

#!/bin/bash
# Check if there's a way to track which approval_request originated from which server
rg -n "approval_request_id" src/utils/ -A2 -B2 | head -80

Repository: lightspeed-core/lightspeed-stack

Length of output: 856


🏁 Script executed:

#!/bin/bash
# Look for where mcp_approval_response items are created or generated to see if there's origin tracking
rg -n "mcp_approval_response" src/ --type=py | grep -v test | grep -v ".pyc"

Repository: lightspeed-core/lightspeed-stack

Length of output: 373


🏁 Script executed:

#!/bin/bash
# Check if there's any validation or origin tracking for approval requests/responses
rg -n "MCPApprovalRequest|MCPApprovalResponse" src/ --type=py -A5 | grep -E "server_label|origin|source" | head -20

Repository: lightspeed-core/lightspeed-stack

Length of output: 124


Add explicit handling for mcp_approval_response type or document design assumption.

The function checks mcp_approval_request but not mcp_approval_response. Since MCPApprovalResponse (imported from llama_stack_api.openai_responses) lacks a server_label attribute and only carries approval_request_id, this type falls through to the default return True, classifying all approval responses as server-deployed regardless of origin.

If client-provided MCP servers can emit approval responses, they would be incorrectly included in server-side processing (turn summary, storage) due to the missing origin check. Either:

  • Add server_label tracking to approval responses so they can be classified by origin, or
  • Document that approval responses are always treated as server-side by design (with rationale), or
  • Add test coverage for this scenario to TestIsServerDeployedOutput
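The first option (explicit classification) could look roughly like the sketch below. This is an illustrative stand-in, not the repository's actual `is_server_deployed_output` implementation; the function and variable names are hypothetical.

```python
from types import SimpleNamespace


def classify_output_item(item: object, configured_mcp_labels: set[str]) -> bool:
    """Return True when the output item should count as server-deployed."""
    item_type = getattr(item, "type", None)
    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        # Origin is readable directly from server_label.
        return getattr(item, "server_label", None) in configured_mcp_labels
    if item_type == "mcp_approval_response":
        # MCPApprovalResponse carries only approval_request_id, so origin
        # cannot be read off the item itself; classify it explicitly instead
        # of falling through to the server-side default.
        return False
    # Messages and other item types keep the server-side default.
    return True


# Example item shaped like an approval response (illustrative only).
approval = SimpleNamespace(type="mcp_approval_response", approval_request_id="req-1")
```

With this shape, approval responses are deliberately treated as client-provided unless origin tracking is added later.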
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/responses.py` around lines 1126 - 1158, The
is_server_deployed_output function currently treats an MCPApprovalResponse
(item_type "mcp_approval_response") as server-side because it lacks a
server_label; update is_server_deployed_output to explicitly handle
"mcp_approval_response" (e.g., return False to mark it as client-provided) or
implement lookup/propagation of server_label, and add a unit test in
TestIsServerDeployedOutput to cover the approval response case; reference
is_server_deployed_output and TestIsServerDeployedOutput when making the change.

Comment on lines +2626 to +2635
def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
    """Test mcp_approval_request with unmatched label is client-side."""
    mock_config = mocker.Mock()
    mock_config.mcp_servers = []
    mocker.patch("utils.responses.configuration", mock_config)

    item = mocker.Mock()
    item.type = "mcp_approval_request"
    item.server_label = "client-mcp"
    assert is_server_deployed_output(item) is False

🧹 Nitpick | 🔵 Trivial

Add a positive mcp_approval_request server-side classification test.

Line 2626 only validates the unmatched-label/client-side path. Please add the matching-label path too, since this type is part of the server-event filtering contract.

Proposed test addition
 class TestIsServerDeployedOutput:
@@
     def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
         """Test mcp_approval_request with unmatched label is client-side."""
@@
         assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2626 - 2635, Add a
complementary positive test that verifies mcp_approval_request is classified
server-side when the server label matches: create a new test (e.g.,
test_mcp_approval_request_server_side) that mocks configuration
(mocker.patch("utils.responses.configuration")) with mock_config.mcp_servers
containing the target label (e.g., "client-mcp"), create an item with item.type
= "mcp_approval_request" and item.server_label = "client-mcp", and assert
is_server_deployed_output(item) is True; reference the existing
test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.

Comment on lines +2773 to +2804
async def test_client_tools_with_merge_header(
    self, mocker: MockerFixture
) -> None:
    """Test client tools merged with server tools when header is set."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_config = mocker.Mock()
    mock_config.configuration.byok_rag = []
    mocker.patch("utils.responses.configuration", mock_config)

    server_mcp = InputToolMCP(
        server_label="server-tool", server_url="http://server"
    )
    mocker.patch(
        "utils.responses.prepare_tools",
        new=mocker.AsyncMock(return_value=[server_mcp]),
    )

    client_tool = InputToolMCP(
        server_label="client-tool", server_url="http://client"
    )
    tools, tool_choice, _ = await resolve_tool_choice(
        tools=[client_tool],
        tool_choice=None,
        token="tok",
        request_headers={"X-LCS-Merge-Server-Tools": "true"},
    )

⚠️ Potential issue | 🟡 Minor

Header-merge tests should also validate lowercase header keys.

Current assertions only use X-LCS-Merge-Server-Tools casing (Line 2803/2841/2899). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.

Proposed hardening for header-key coverage
 class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
         self, mocker: MockerFixture
+        , merge_header_key: str
     ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Also applies to: 2836-2842, 2895-2900

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2773 - 2804, The test
test_client_tools_with_merge_header only asserts behavior when the header key
uses uppercase "X-LCS-Merge-Server-Tools"; update the test (and the other
affected tests around the same blocks) to also call resolve_tool_choice with a
lowercase header key "x-lcs-merge-server-tools" and assert the same merged-tools
behavior (same tools, tool_choice, etc.) so header normalization won't cause
regressions; locate the calls to resolve_tool_choice and duplicate or
parametrize them to include both header key variants to validate lowercase
handling.
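The underlying concern is that a plain `dict.get` on the header name is case-sensitive. A minimal sketch of a case-insensitive flag check (helper name and exact semantics are assumptions, not the repository's code):

```python
from typing import Mapping, Optional


def merge_header_enabled(headers: Optional[Mapping[str, str]]) -> bool:
    """Return True when the merge-server-tools header is set to "true".

    Lowercasing the keys makes "X-LCS-Merge-Server-Tools" and
    "x-lcs-merge-server-tools" behave identically, regardless of how a
    proxy or framework normalizes header names.
    """
    if not headers:
        return False
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-lcs-merge-server-tools", "").strip().lower() == "true"
```

Parametrizing the tests over both casings, as suggested above, guards this behavior either way.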

@tisnik

tisnik commented Mar 30, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

ToolChoice none ignored

The no_tools flag (when tool_choice is none) is computed only in the branch where client tools are provided, but not passed to prepare_tools in the else branch. As a result, requests with no tools and tool_choice: "none" will still load server tools instead of disabling them.

    mcp_headers: Optional[McpHeaders] = None,
    request_headers: Optional[Mapping[str, str]] = None,
) -> tuple[Optional[list[InputTool]], Optional[ToolChoice], Optional[list[str]]]:
    """Resolve tools and tool_choice for the Responses API.

    When the request includes tools and the ``X-LCS-Merge-Server-Tools: true``
    header is set, client-provided tools are merged with server-configured
    tools (RAG, MCP).  Conflicts (e.g. a client MCP tool with the same
    server_label as a server-configured one, or duplicate file_search tools)
    are rejected with a 409 error.
    Without the header, client-provided tools are used as-is (original
    behavior).  When no tools are provided in the request, all
    LCORE-configured tools are loaded via prepare_tools.  The tool_choice
    "none" flag disables all tools.

    Args:
        tools: Tools from the request, or None to use LCORE-configured tools.
        tool_choice: Requested tool choice (e.g. auto, required, none) or None.
        token: User token for MCP/auth.
        mcp_headers: Optional MCP headers to propagate.
        request_headers: Optional request headers for tool resolution.

    Returns:
        A tuple of (prepared_tools, prepared_tool_choice, vector_store_ids):
        prepared_tools is the list of tools to use, or None if none configured;
        prepared_tool_choice is the resolved tool choice, or None when there
        are no tools; vector_store_ids is extracted from tools (in user-facing format)
        when provided, otherwise None.
    """
    prepared_tools: Optional[list[InputTool]] = None
    client = AsyncLlamaStackClientHolder().get_client()
    no_tools = (
        isinstance(tool_choice, ToolChoiceMode) and tool_choice == ToolChoiceMode.none
    )

    merge_server_tools = (
        request_headers is not None
        and request_headers.get("X-LCS-Merge-Server-Tools", "").lower() == "true"
    )

    if tools:  # explicitly specified in request
        # Per-request override of vector stores (user-facing rag_ids)
        vector_store_ids = extract_vector_store_ids_from_tools(tools)
        # Translate user-facing rag_ids to llama-stack vector_store_ids in each file_search tool
        byok_rags = configuration.configuration.byok_rag
        prepared_tools = translate_tools_vector_store_ids(tools, byok_rags)

        # Optionally merge server-configured tools (RAG, MCP) with client tools
        if merge_server_tools:
            server_tools = await prepare_tools(
                client=client,
                vector_store_ids=vector_store_ids,
                no_tools=no_tools,
                token=token,
                mcp_headers=mcp_headers,
                request_headers=request_headers,
            )
            if server_tools:
                prepared_tools = _merge_tools(prepared_tools, server_tools)
Incomplete MCP filtering

The _should_filter_mcp_chunk helper drops events for mcp_call and mcp_list_tools substreams but does not filter response.mcp_approval_request.* events. This may leak server-side approval request chunks to clients that cannot handle them.

def _should_filter_mcp_chunk(
    chunk: OpenAIResponseObjectStream,
    event_type: Optional[str],
    configured_mcp_labels: set[str],
    server_mcp_output_indices: set[int],
) -> bool:
    """Check if a streaming chunk is a server-deployed MCP event that should be filtered.

    Args:
        chunk: The streaming chunk to check.
        event_type: The event type of the chunk.
        configured_mcp_labels: Set of server_label names configured in LCS.
        server_mcp_output_indices: Tracked output indices of server-deployed MCP calls.

    Returns:
        True if the chunk should be filtered out from the client stream.
    """
    if event_type == "response.output_item.added":
        item_added_chunk = cast(OutputItemAddedChunk, chunk)
        item = item_added_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            server_label = getattr(item, "server_label", None)
            if server_label in configured_mcp_labels:
                server_mcp_output_indices.add(item_added_chunk.output_index)
                return True

    if event_type and (
        event_type.startswith("response.mcp_call.")
        or event_type.startswith("response.mcp_list_tools.")
    ):
        output_index = getattr(chunk, "output_index", None)
        if output_index in server_mcp_output_indices:
            return True

    if event_type == "response.output_item.done":
        item_done_chunk = cast(OutputItemDoneChunk, chunk)
        item = item_done_chunk.item
        item_type = getattr(item, "type", None)
        if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
            if item_done_chunk.output_index in server_mcp_output_indices:
                server_mcp_output_indices.discard(item_done_chunk.output_index)
                return True

    return False
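The substream gap flagged here can be closed by widening the prefix tuple, as the follow-up commit on this PR describes. A self-contained sketch of that check (names are illustrative, not the exact repository helpers):

```python
from typing import Optional

# Prefixes of server-deployed MCP progress events to drop from the
# client-facing SSE stream; the third entry is the one that was missing.
SERVER_MCP_EVENT_PREFIXES = (
    "response.mcp_call.",
    "response.mcp_list_tools.",
    "response.mcp_approval_request.",
)


def should_filter_substream_event(
    event_type: Optional[str],
    output_index: Optional[int],
    server_mcp_output_indices: set[int],
) -> bool:
    """Drop progress events for tracked server-deployed MCP output items."""
    if not event_type or not event_type.startswith(SERVER_MCP_EVENT_PREFIXES):
        return False
    return output_index in server_mcp_output_indices
```

`str.startswith` accepts a tuple, so one membership-style check covers all three event families.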

Cover the new functions introduced in the server-tool merging feature:
- is_server_deployed_output: classify output items as server vs client
- _merge_tools: merge client and server tools with conflict detection
- resolve_tool_choice: X-LCS-Merge-Server-Tools header integration
- _is_server_mcp_output_item: filter serialized MCP items in streaming

29 new tests across utils/test_responses.py and endpoints/test_responses.py
validating correct behavior for MCP label matching, 409 conflict rejection,
file_search deduplication, and pass-through of non-MCP item types.
@dprince dprince force-pushed the merge_server_tools branch from 20809d0 to f02648d on March 30, 2026 21:48

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 1378-1427: Add a regression test ensuring approval-request
substream events are filtered from client streams: in TestIsServerMcpOutputItem
add a new test (e.g., test_mcp_approval_request_stream_filtering) that simulates
a streaming chunk key starting with "response.mcp_approval_request." and asserts
the streaming filter logic rejects/filters that chunk so approval-request events
cannot leak to clients; reference the mcp approval path by using the
"response.mcp_approval_request.*" key and the existing helper
_is_server_mcp_output_item (and whatever streaming chunk filter function is used
in the responses code) to validate the chunk is filtered when appropriate.

In `@tests/unit/utils/test_responses.py`:
- Around line 2840-2860: Add a new unit test that mirrors
test_no_tools_uses_prepare_tools but calls resolve_tool_choice with
tool_choice="none" (and tools=None) and asserts that the patched prepare_tools
(utils.responses.prepare_tools) is invoked with no_tools=True; locate the test
around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d826784-c522-4d69-a050-222bc472dc2e

📥 Commits

Reviewing files that changed from the base of the PR and between 20809d0 and f02648d.

📒 Files selected for processing (2)
  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: server mode / ci
  • GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
  • tests/unit/utils/test_responses.py
🔇 Additional comments (3)
tests/unit/app/endpoints/test_responses.py (1)

15-20: Import addition is clean and consistent.

The new _is_server_mcp_output_item import (Line 16) fits the existing absolute internal import pattern and keeps the test module scope focused.

tests/unit/utils/test_responses.py (2)

2624-2633: Add a positive mcp_approval_request server-side classification test.

Line 2626 only validates the unmatched-label/client-side path. A matching-label test is needed since mcp_approval_request is part of the server-event filtering contract (lines 2585-2596 test both paths for mcp_call, lines 2611-2622 for mcp_list_tools).

Proposed test addition
 class TestIsServerDeployedOutput:
@@
     def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
         """Test mcp_approval_request with unmatched label is client-side."""
@@
         assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True

2769-2802: Header-merge tests should validate lowercase header keys.

Current assertions only use X-LCS-Merge-Server-Tools casing (lines 2797, 2835, 2889). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.

Proposed hardening for header-key coverage
 class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
         self, mocker: MockerFixture
+        , merge_header_key: str
     ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Apply similar parametrization to test_merge_header_conflict_raises_409 and test_merge_header_no_server_tools_returns_client_only.

Comment on lines +2840 to +2860
async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)

    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()

🛠️ Refactor suggestion | 🟠 Major

Add test coverage for tool_choice="none" scenario.

The test only validates tool_choice=None (line 2855). Per the PR objectives, there's a known issue where tool_choice="none" may not correctly disable tools when tools=None. Add a test asserting that prepare_tools is called with no_tools=True when tool_choice="none".

Proposed test addition
    @pytest.mark.asyncio
     async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
         """Test that no client tools falls through to prepare_tools."""
@@
         assert len(tools) == 1
         mock_prepare.assert_called_once()
+
+    @pytest.mark.asyncio
+    async def test_tool_choice_none_passes_no_tools_flag(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
+        mock_client = mocker.AsyncMock()
+        mock_holder = mocker.Mock()
+        mock_holder.get_client.return_value = mock_client
+        mocker.patch(
+            "utils.responses.AsyncLlamaStackClientHolder",
+            return_value=mock_holder,
+        )
+        mock_prepare = mocker.AsyncMock(return_value=None)
+        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
+
+        tools, _, _ = await resolve_tool_choice(
+            tools=None,
+            tool_choice="none",
+            token="tok",
+        )
+        assert tools is None
+        mock_prepare.assert_called_once()
+        # Verify no_tools=True was passed
+        call_kwargs = mock_prepare.call_args.kwargs
+        assert call_kwargs.get("no_tools") is True
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()

async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
    """Test that no client tools falls through to prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
    mock_prepare = mocker.AsyncMock(return_value=[server_tool])
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice=None,
        token="tok",
    )
    assert tools is not None
    assert len(tools) == 1
    mock_prepare.assert_called_once()

@pytest.mark.asyncio
async def test_tool_choice_none_passes_no_tools_flag(
    self, mocker: MockerFixture
) -> None:
    """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_prepare = mocker.AsyncMock(return_value=None)
    mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
    tools, _, _ = await resolve_tool_choice(
        tools=None,
        tool_choice="none",
        token="tok",
    )
    assert tools is None
    mock_prepare.assert_called_once()
    # Verify no_tools=True was passed
    call_kwargs = mock_prepare.call_args.kwargs
    assert call_kwargs.get("no_tools") is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/utils/test_responses.py` around lines 2840 - 2860, Add a new unit
test that mirrors test_no_tools_uses_prepare_tools but calls resolve_tool_choice
with tool_choice="none" (and tools=None) and asserts that the patched
prepare_tools (utils.responses.prepare_tools) is invoked with no_tools=True;
locate the test around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.

…lter_mcp_chunk

The helper already handled mcp_approval_request items in the output_item.added
and output_item.done branches, but the substream event prefix check only matched
response.mcp_call.* and response.mcp_list_tools.*, allowing server-side approval
request progress events to leak to clients that cannot handle them.

Add the missing response.mcp_approval_request.* prefix check and add unit tests
for _should_filter_mcp_chunk covering all three substream event types.
@dprince
Author

dprince commented Mar 30, 2026

@tisnik thanks for the review. I added a commit to address the 'incomplete mcp filtering'. For the 'no_tools' comment, I looked and the no_tools variable is scoped to the entire function and is passed into both prepare_tools calls. Let me know if I am missing something.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 556-576: The persisted turn still contains client-side tool items
because append_turn_items_to_conversation is called with
latest_response_object.output; before calling append_turn_items_to_conversation
(after building turn_summary and rag_chunks) filter
latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.
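The suggested fix amounts to reusing the same predicate for storage that is already used for the turn summary. A runnable miniature of the idea, with a stub predicate and `SimpleNamespace` items standing in for the real types:

```python
from types import SimpleNamespace

# Illustrative stand-in for the configured MCP labels and for
# is_server_deployed_output; the real predicate lives in utils/responses.py.
CONFIGURED_LABELS = {"server-mcp"}


def is_server_deployed_stub(item: object) -> bool:
    """Stand-in classifier: match server_label against configured labels."""
    if getattr(item, "type", None) in ("mcp_call", "mcp_list_tools"):
        return getattr(item, "server_label", None) in CONFIGURED_LABELS
    return True  # messages and other items are kept


output = [
    SimpleNamespace(type="message"),
    SimpleNamespace(type="mcp_call", server_label="server-mcp"),
    SimpleNamespace(type="mcp_call", server_label="client-mcp"),
]
# Only server-deployed items would be handed to
# append_turn_items_to_conversation for persistence.
persisted = [item for item in output if is_server_deployed_stub(item)]
```

Passing `persisted` instead of the raw `output` keeps client-only tool items out of stored conversations while still returning them in the response.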

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7a365396-e85c-42bc-8ec0-e4067d6ace16

📥 Commits

Reviewing files that changed from the base of the PR and between f02648d and 2d6fea6.

📒 Files selected for processing (2)
  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
  • GitHub Check: E2E: library mode / ci
  • GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., from authentication import get_auth_dependency)

Files:

  • src/app/endpoints/responses.py
  • tests/unit/app/endpoints/test_responses.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use FastAPI imports from the fastapi module: from fastapi import APIRouter, HTTPException, Request, status, Depends
Use from llama_stack_client import AsyncLlamaStackClient for Llama Stack client imports
Check constants.py for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use logger = get_logger(__name__) from log.py for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax str | int instead of Union[str, int] for type annotations
Use Optional[Type] for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use async def for functions performing I/O operations and external API calls
Handle APIConnectionError from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with @abstractmethod decorators for interface implementations
Use complete type annotations for all class attributes; avoid using Any type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed

Files:

  • src/app/endpoints/responses.py
src/app/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use FastAPI HTTPException with appropriate status codes for API endpoint error handling

Files:

  • src/app/endpoints/responses.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests

Files:

  • tests/unit/app/endpoints/test_responses.py
tests/unit/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest-mock for AsyncMock objects in unit tests

Files:

  • tests/unit/app/endpoints/test_responses.py
🔇 Additional comments (2)
tests/unit/app/endpoints/test_responses.py (1)

1466-1498: Good regression coverage for approval-request substreams.

This closes the response.mcp_approval_request.* hole that previously let unknown MCP events leak to OpenAI-compatible clients.

src/app/endpoints/responses.py (1)

531-537: The review comment is incorrect. Both code paths in update_azure_token() properly update the AsyncLlamaStackClientHolder's internal client:

  1. Library client mode (update_azure_token line 17): Calls reload_library_client(), which sets self._lsc = client (client.py:119)
  2. Service client mode (update_azure_token line 33): Calls update_provider_data(), which does self._lsc = self._lsc.copy(set_default_headers=updated_headers) (client.py:54)

Since AsyncLlamaStackClientHolder is a Singleton, when _finalize_response() calls get_client() at line 572, it retrieves the same holder instance that was just updated by update_azure_token(). The client will have fresh credentials—there is no staleness issue.
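The singleton behavior this comment relies on can be sketched generically. This is a hypothetical stand-in, not the actual AsyncLlamaStackClientHolder; the real holder swaps in a llama-stack client rather than a dict:

```python
class ClientHolder:
    """Minimal singleton holder: every caller sees the same client object."""

    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._client = {"headers": {}}
        return cls._instance

    def update_headers(self, headers: dict) -> None:
        # Analogous to update_provider_data(): replace the internal client
        # with one carrying fresh credential headers.
        self._client = {"headers": dict(headers)}

    def get_client(self) -> dict:
        return self._client
```

Because ClientHolder() always returns the same instance, a token refresh performed through one reference is immediately visible to any later get_client() call — the staleness concern the comment rules out.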

Comment on lines +556 to +576
    for item in latest_response_object.output:
        if not is_server_deployed_output(item):
            continue
        tool_call, tool_result = build_tool_call_summary(item)
        if tool_call:
            turn_summary.tool_calls.append(tool_call)
        if tool_result:
            turn_summary.tool_results.append(tool_result)

    tool_rag_chunks = parse_rag_chunks(
        latest_response_object,
        vector_store_ids,
        configuration.rag_id_mapping,
    )
    turn_summary.rag_chunks = inline_rag_context.rag_chunks + tool_rag_chunks

    client = AsyncLlamaStackClientHolder().get_client()
    if api_params.store and api_params.previous_response_id and latest_response_object:
        await append_turn_items_to_conversation(
            client, api_params.conversation, user_input, latest_response_object.output
        )

⚠️ Potential issue | 🟠 Major

Filter client tool items before persisting the streamed turn.

Line 557 already gates internal processing with is_server_deployed_output(), but append_turn_items_to_conversation() on Lines 573-576 still receives the full latest_response_object.output. src/utils/conversations.py:454-497 stores every llm_output item it gets, so client-local MCP/function items are still persisted on the streaming path.

Proposed fix
     client = AsyncLlamaStackClientHolder().get_client()
     if api_params.store and api_params.previous_response_id and latest_response_object:
+        stored_output = [
+            item
+            for item in latest_response_object.output
+            if is_server_deployed_output(item)
+        ]
         await append_turn_items_to_conversation(
-            client, api_params.conversation, user_input, latest_response_object.output
+            client, api_params.conversation, user_input, stored_output
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app/endpoints/responses.py` around lines 556-576: the persisted turn
still contains client-side tool items because append_turn_items_to_conversation
is called with latest_response_object.output; before calling
append_turn_items_to_conversation (after building turn_summary and rag_chunks)
filter latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.
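The proposed filter can also be exercised in isolation. Here is a hedged sketch with a stand-in predicate — the real is_server_deployed_output() inspects the item types the PR defines, not an "origin" field:

```python
def is_server_deployed_output(item: dict) -> bool:
    """Stand-in predicate; the real check distinguishes LCS-deployed tools."""
    return item.get("origin") == "server"


def filter_stored_output(output: list[dict]) -> list[dict]:
    """Keep only server-deployed items before persisting a streamed turn."""
    return [item for item in output if is_server_deployed_output(item)]
```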
