LCORE-1616 /v1/responses support for MCP tool merging (goose agent support)#1435
dprince wants to merge 4 commits into lightspeed-core:main
Conversation
Introduce X-LCS-Merge-Server-Tools header that allows client-provided tools to be merged with server-configured tools (RAG, MCP) instead of replacing them. Tool conflicts (duplicate MCP server_label or duplicate file_search) are rejected with a 409 error.

Add is_server_deployed_output() to distinguish LCS-deployed tools from client tools so only server tools are included in turn summaries, metrics, and storage. Client tool output items are still returned in the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools, mcp_approval_request) from the SSE stream so OpenAI-compatible clients like Goose don't fail on unknown item types. Strip these items from the response.completed output array as well.
Document the Goose agent integration pattern where server-configured tools (RAG, MCP knowledge services) are merged with client-provided tools (oc/kubectl). Covers the X-LCS-Merge-Server-Tools header behavior, conflict detection, server-tool filtering in streamed responses, and an end-to-end conversation flow example.
Walkthrough

Adds a header-driven hybrid tool merging model for /v1/responses, conflict detection returning HTTP 409, SSE filtering to hide server-deployed MCP events from clients, helper extraction for response finalization, and updates to tool resolution and turn-summary logic to distinguish server-deployed vs client-provided outputs.

Changes
Sequence Diagrams

sequenceDiagram
participant Client
participant Server as LCS Server
participant ServerMCP as Server MCP
participant ClientMCP as Client MCP
Client->>Server: POST /v1/responses\ntools=[client_tools]\nX-LCS-Merge-Server-Tools: true
Server->>ServerMCP: prepare_tools() (RAG + configured MCP)
ServerMCP-->>Server: server_tools
Note over Server: _merge_tools() checks conflicts:\n• MCP server_label collisions\n• file_search overlap
alt Conflict Detected
Server-->>Client: HTTP 409 Conflict
else No Conflict
Server->>Server: merged_tools = client_tools + server_tools
Server-->>Client: Begin SSE streaming (merged tools available)
end
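In HTTP terms, the first leg of this flow is a single POST carrying the opt-in header. A minimal sketch of how a client might assemble it (the payload shape follows the OpenAI Responses API; the model id and the tool entry are illustrative, not taken from the PR):

```python
def build_merge_request(client_tools: list[dict]) -> tuple[dict, dict]:
    """Assemble headers and payload for /v1/responses with tool merging enabled."""
    headers = {
        "Content-Type": "application/json",
        # Opt in to merging client tools with server-configured RAG/MCP tools.
        "X-LCS-Merge-Server-Tools": "true",
    }
    payload = {
        "model": "example-model",  # hypothetical model id
        "input": "pod api-7f8b9 is crash-looping, help me diagnose it",
        "tools": client_tools,  # client MCP tools survive the merge
        "stream": True,
    }
    return headers, payload


# Illustrative client-side MCP tool entry (label and URL are made up).
headers, payload = build_merge_request(
    [{"type": "mcp", "server_label": "oc-local", "server_url": "http://localhost:9000"}]
)
print(headers["X-LCS-Merge-Server-Tools"])  # -> true
```

Without the header, the same payload would follow the original replace-not-merge behavior.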
sequenceDiagram
participant Client
participant Server as LCS Server
participant MCP as Server-Deployed MCP
participant Storage as Turn Summary & Storage
Client->>Server: Request using merged tools
Server->>MCP: Execute server-deployed tool calls
MCP-->>Server: mcp_call / results
Server->>Server: response_generator() streaming SSE
Note over Server: Filter & hide server-deployed MCP item types\n(mcp_call, mcp_list_tools, mcp_approval_request)
Server->>Client: SSE stream with server MCP events suppressed
Server->>Server: _finalize_response()
Note over Server: build_turn_summary() includes only server-deployed outputs
Server->>Storage: Store summary/metrics with server tool outputs
Server-->>Client: Complete
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks: ✅ 3 passed
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/goose-example-tool-merging.md`:
- Around line 95-142: The fenced code block containing the conversation flow
diagram (the block starting with "Goose LCS /v1/responses
Llama Stack MCP/RAG" and the opening triple backticks) is missing a
language specifier; update the opening fence from just ``` to ```text (or
```plaintext) so the diagram is treated as plain text in rendering and
syntax-highlighting. Locate the triple-backtick that begins the diagram and add
the specifier, leaving the rest of the block unchanged.
- Around line 14-50: The fenced ASCII diagram block in
docs/goose-example-tool-merging.md is missing a language specifier which linters
flag; update the opening fence for the diagram (the triple-backtick that
currently starts the block) to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the ASCII diagram is annotated for
static analysis and rendering; ensure only the opening fence is changed and the
rest of the Goose Agent / OpenShift Cluster diagram content (the ASCII art)
remains untouched.
In `@src/utils/responses.py`:
- Around line 1126-1158: The is_server_deployed_output function currently treats
an MCPApprovalResponse (item_type "mcp_approval_response") as server-side
because it lacks a server_label; update is_server_deployed_output to explicitly
handle "mcp_approval_response" (e.g., return False to mark it as
client-provided) or implement lookup/propagation of server_label, and add a unit
test in TestIsServerDeployedOutput to cover the approval response case;
reference is_server_deployed_output and TestIsServerDeployedOutput when making
the change.
In `@tests/unit/utils/test_responses.py`:
- Around line 2773-2804: The test test_client_tools_with_merge_header only
asserts behavior when the header key uses uppercase "X-LCS-Merge-Server-Tools";
update the test (and the other affected tests around the same blocks) to also
call resolve_tool_choice with a lowercase header key "x-lcs-merge-server-tools"
and assert the same merged-tools behavior (same tools, tool_choice, etc.) so
header normalization won't cause regressions; locate the calls to
resolve_tool_choice and duplicate or parametrize them to include both header key
variants to validate lowercase handling.
- Around line 2626-2635: Add a complementary positive test that verifies
mcp_approval_request is classified server-side when the server label matches:
create a new test (e.g., test_mcp_approval_request_server_side) that mocks
configuration (mocker.patch("utils.responses.configuration")) with
mock_config.mcp_servers containing the target label (e.g., "client-mcp"), create
an item with item.type = "mcp_approval_request" and item.server_label =
"client-mcp", and assert is_server_deployed_output(item) is True; reference the
existing test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f5204ab3-7d8c-490d-8111-8a64f71f83a1
📒 Files selected for processing (6)
docs/goose-example-tool-merging.md
src/app/endpoints/responses.py
src/utils/responses.py
tests/integration/endpoints/test_query_integration.py
tests/unit/app/endpoints/test_responses.py
tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., `from authentication import get_auth_dependency`)
Files:
tests/integration/endpoints/test_query_integration.py
tests/unit/app/endpoints/test_responses.py
src/app/endpoints/responses.py
src/utils/responses.py
tests/unit/utils/test_responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/integration/endpoints/test_query_integration.py
tests/unit/app/endpoints/test_responses.py
tests/unit/utils/test_responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/app/endpoints/test_responses.py
tests/unit/utils/test_responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use FastAPI imports from the fastapi module: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack client imports
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax `str | int` instead of `Union[str, int]` for type annotations
Use `Optional[Type]` for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for functions performing I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with `@abstractmethod` decorators for interface implementations
Use complete type annotations for all class attributes; avoid using `Any` type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed
Files:
src/app/endpoints/responses.py
src/utils/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
🪛 markdownlint-cli2 (0.22.0)
docs/goose-example-tool-merging.md
[warning] 14-14: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 95-95: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (13)
src/utils/responses.py (3)
1193-1195: LGTM! Server-deployed output filtering in turn summary.

The filter correctly excludes client-provided tool outputs from the turn summary, ensuring only server-deployed tool calls and results are stored in metrics and conversation history.
1363-1411: LGTM! Tool merge logic with conflict detection.

The implementation correctly:
- Detects MCP conflicts by comparing `server_label` values
- Detects `file_search` conflicts when both client and server provide one
- Returns merged tools with client tools first
- Uses appropriate HTTP 409 status code for conflicts

Minor observation: Line 1411 wraps inputs in `list()` unnecessarily since `client_tools` and `server_tools` are already typed as `list[InputTool]`, but this has no functional impact.
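The conflict rules summarized in this comment can be sketched as a standalone function. This is a simplified stand-in using plain dicts: the real `_merge_tools` operates on typed `InputTool` objects and raises a FastAPI `HTTPException` with status 409, which `ToolConflictError` stands in for here:

```python
class ToolConflictError(Exception):
    """Stand-in for the HTTP 409 the endpoint raises on tool conflicts."""


def merge_tools_sketch(
    client_tools: list[dict], server_tools: list[dict]
) -> list[dict]:
    """Merge client and server tools, rejecting duplicates as conflicts."""
    client_labels = {t["server_label"] for t in client_tools if t["type"] == "mcp"}
    server_labels = {t["server_label"] for t in server_tools if t["type"] == "mcp"}
    clash = client_labels & server_labels
    if clash:
        raise ToolConflictError(f"duplicate MCP server_label(s): {sorted(clash)}")

    if any(t["type"] == "file_search" for t in client_tools) and any(
        t["type"] == "file_search" for t in server_tools
    ):
        raise ToolConflictError("both client and server provide file_search")

    # Client tools first, matching the ordering noted in the review.
    return client_tools + server_tools
```

In the endpoint, `ToolConflictError` would correspond to `HTTPException(status_code=409, ...)`.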
1449-1476: LGTM! Header-driven merge behavior.

The merge logic correctly:
- Checks for the `X-LCS-Merge-Server-Tools: true` header (case-insensitive)
- Only merges when the header is explicitly set
- Preserves original behavior when the header is absent
- Reuses the same `vector_store_ids` resolved from client tools when preparing server tools

tests/integration/endpoints/test_query_integration.py (2)
731-736: LGTM! Test expectations updated for server-deployed tool filtering.

The assertions correctly reflect the new behavior where:
- Only server-deployed tool calls are included in `response.tool_calls`
- `function_call` items (client-provided) are excluded
- The expected count is reduced from 2 to 1

The comment clearly explains the reasoning.

635-635: No issues found: `server1` is a valid MCP server name configured in test fixtures.

Verification confirms the `server_label` change to `"server1"` aligns with the MCP server configuration in the test fixtures (tests/integration/test_configuration.py lines 85-93 define `server1` as a valid MCP server). The test setup correctly pairs this label with the `mcp_list_tools` item, and the assertions properly validate server-deployed tool classification.

tests/unit/app/endpoints/test_responses.py (1)
1378-1427: LGTM! Comprehensive tests for `_is_server_mcp_output_item`.

The test class provides good coverage including:
- MCP types with matching and non-matching server labels
- All three MCP event types (`mcp_call`, `mcp_list_tools`, `mcp_approval_request`)
- Missing `server_label` handling
- Non-MCP types returning `False`
- Empty configured labels edge case
src/app/endpoints/responses.py (5)
405-407: LGTM! Added `by_alias=True` for SSE serialization.

Using `by_alias=True` ensures field names are serialized according to their aliases (e.g., `created_at` vs `createdAt`), maintaining consistency with the OpenAI Responses specification for SSE events.

Also applies to: 423-425, 435-435, 451-453
465-480: LGTM! Helper for filtering serialized MCP output items.

The function correctly identifies server-deployed MCP items from the serialized `response.output` array by checking:
- Item type is one of the MCP types
- `server_label` matches a configured MCP server

This complements `is_server_deployed_output` from `utils/responses.py`, which operates on typed objects rather than dicts.
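The dict-level check described above boils down to one comprehension over the serialized output array. A hedged sketch (the function name and exact dict shapes are illustrative, not the actual helper):

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}


def strip_server_mcp_items(output: list[dict], server_labels: set[str]) -> list[dict]:
    """Drop server-deployed MCP items; keep client items such as function_call."""
    return [
        item
        for item in output
        if not (
            item.get("type") in MCP_ITEM_TYPES
            and item.get("server_label") in server_labels
        )
    ]
```

Items with no `server_label` (e.g., `message`) fall through the `and` and are kept.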
483-527: LGTM! Streaming chunk filter with output index tracking.

The implementation correctly:
- Tracks `output_index` values when server MCP items are added
- Filters intermediate events (`response.mcp_call.*`, `response.mcp_list_tools.*`) by tracked indices
- Cleans up tracking when `output_item.done` is received, using `discard()` (safe for missing keys)
- Mutates `server_mcp_output_indices` as a side effect, which is documented via the function's purpose
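The index-tracking scheme can be illustrated with a self-contained sketch. Event shapes here are simplified dicts; the real filter operates on typed SSE chunks and mutates a shared index set:

```python
def filter_stream_sketch(events: list[dict], server_labels: set[str]) -> list[dict]:
    """Drop SSE events that belong to server-deployed MCP output items."""
    mcp_types = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}
    tracked: set[int] = set()  # output_index values of server MCP items
    kept: list[dict] = []
    for ev in events:
        item = ev.get("item", {})
        is_server_mcp_item = (
            item.get("type") in mcp_types
            and item.get("server_label") in server_labels
        )
        if ev["type"] == "response.output_item.added" and is_server_mcp_item:
            tracked.add(ev["output_index"])  # start hiding this output slot
            continue
        if ev["type"].startswith(("response.mcp_call.", "response.mcp_list_tools.")):
            if ev.get("output_index") in tracked:
                continue  # hide intermediate events for tracked items
        if ev["type"] == "response.output_item.done" and ev.get("output_index") in tracked:
            tracked.discard(ev["output_index"])  # discard() is safe if already gone
            continue
        kept.append(ev)
    return kept
```

Client-provided items at other indices pass through untouched, which is why Goose never sees the unknown MCP item types.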
530-575: LGTM! Extracted finalization logic into helper.

The `_finalize_response` helper:
- Consolidates post-stream metadata extraction
- Uses `is_server_deployed_output` to filter tool summaries (consistent with the non-streaming path)
- Handles conversation persistence with `append_turn_items_to_conversation`

This improves readability of `response_generator` by separating streaming from finalization concerns.
636-644: LGTM! Filtering server MCP items from response output.

The inline list comprehension correctly removes server-deployed MCP items from the `output` array in the serialized response, ensuring clients only receive item types they understand (`message`, `function_call`, etc.).

docs/goose-example-tool-merging.md (1)
1-92: LGTM! Clear documentation for hybrid tool architecture.

The documentation provides excellent coverage of:
- The separation of knowledge (server) vs action (client) tools
- Header-driven merge behavior with clear behavior table
- Conflict detection semantics
- SSE streaming filter mechanics with step-by-step explanation
- Turn summary behavior for server vs client outputs
The architecture and flow diagrams effectively illustrate the concepts.
tests/unit/utils/test_responses.py (1)
2576-2738: Good coverage on classification and conflict behavior.

The new `TestIsServerDeployedOutput` and `TestMergeTools` cases are well-targeted and align with the merge/conflict contract (including 409 behavior).
```
OpenShift Cluster
┌────────────────────────────────────────────────────────────────────────────┐
│                                                                            │
│  namespace: foo-staging                                                    │
│  ┌─────────────────────────────────────┐                                   │
│  │ Goose Agent (Pod)                   │                                   │
│  │                                     │                                   │
│  │ ServiceAccount: goose-readonly      │                                   │
│  │ (read-only, limited k8s/OCP access) │                                   │
│  │                                     │  HTTPS POST /v1/responses         │
│  │ LLM reasoning loop                  │                                   │
│  │  ├─ LCS /v1/responses ──────────────┼─────────────────────┐             │
│  │  │   (knowledge queries)            │                     │             │
│  │  │                                  │                     ▼             │
│  │  ├─ Local MCP tools                 │  namespace: lightspeed-stack      │
│  │  │   (cluster read-only operations) │  ┌─────────────────────────┐      │
│  │  │    ├─ oc get pods                │  │ Lightspeed Stack (LCS)  │      │
│  │  │    ├─ oc logs                    │  │                         │      │
│  │  │    └─ oc describe                │  │ Llama Stack Engine      │      │
│  │  └──────────────────────────────────┘  │  ├─ file_search (RAG)   │      │
│  └──────────┬──────────────────────────   │  ├─ mcp: OKP/Solr       │      │
│             │ ServiceAccount token        │  ├─ mcp: errata         │      │
│             │ (read-only RBAC)            │  └─ mcp: bugzilla       │      │
│             │                             └───────────┬─────────────┘      │
│             │                                   ┌─────┴──────┐             │
│             │                                   │FAISS / Solr│             │
│             │                                   └────────────┘             │
│             │                                                              │
│             │  namespace: foo-staging                                      │
│             │  ┌───────────────────────────┐                               │
│             └─▶│ pods, deployments, ...    │                               │
│                │ (read-only RBAC enforced) │                               │
│                └───────────────────────────┘                               │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
```
🧹 Nitpick | 🔵 Trivial
Add language specifier to fenced code block.
Static analysis flagged this code block as missing a language specifier. For ASCII diagrams, use text or plaintext to satisfy linters and improve rendering hints.
-```
+```text
 OpenShift Cluster

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 14-14: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/goose-example-tool-merging.md` around lines 14 - 50, The fenced ASCII
diagram block in docs/goose-example-tool-merging.md is missing a language
specifier which linters flag; update the opening fence for the diagram (the
triple-backtick that currently starts the block) to include a language token
such as text or plaintext (e.g., change ``` to ```text) so the ASCII diagram is
annotated for static analysis and rendering; ensure only the opening fence is
changed and the rest of the Goose Agent / OpenShift Cluster diagram content (the
ASCII art) remains untouched.
```
Goose                         LCS /v1/responses      Llama Stack      MCP/RAG
  │                             │                      │                │
  │ "pod api-7f8b9 is crash-    │                      │                │
  │  looping in foo-staging,    │                      │                │
  │  help me diagnose it"       │                      │                │
  │                             │                      │                │
  │ POST {input, tools,         │                      │                │
  │  X-LCS-Merge-Server-Tools}  │                      │                │
  │────────────────────────────▶│                      │                │
  │                             │ merge client +       │                │
  │                             │ server tools         │                │
  │                             │                      │                │
  │                             │ responses.create()   │                │
  │                             │─────────────────────▶│                │
  │                             │                      │ LLM: search RAG│
  │                             │                      │───────────────▶│
  │                             │                      │◀── RAG chunks: │
  │                             │                      │  known OOM fix │
  │                             │                      │  for api image │
  │                             │                      │                │
  │                             │                      │ LLM: query OKP │
  │                             │                      │───────────────▶│
  │                             │                      │◀── KB article: │
  │                             │                      │  memory limits │
  │                             │                      │  best practice │
  │                             │                      │                │
  │                             │◀─────────────────────│ answer with    │
  │                             │                      │  RAG + KB ctx  │
  │ SSE: response.completed     │                      │                │
  │ (server MCP items filtered) │                      │                │
  │◀────────────────────────────│                      │                │
  │                             │                      │                │
  │ Goose combines LCS knowledge with live cluster data:                │
  │                                                                     │
  │──▶ LOCAL MCP: oc get pods -n foo-staging                            │
  │◀── pod/api-7f8b9 CrashLoopBackOff                                   │
  │──▶ LOCAL MCP: oc logs pod/api-7f8b9                                 │
  │◀── OOMKilled                                                        │
  │──▶ LOCAL MCP: oc describe pod/api-7f8b9                             │
  │◀── events, resource limits                                          │
  │                                                                     │
  │ Goose correlates live cluster state with RAG/OKP                    │
  │ knowledge to produce a diagnosis:                                   │
  │  "Pod is OOMKilled. RAG docs confirm a known fix                    │
  │   for this image -- raise memory limit to 512Mi.                    │
  │   See KB article for memory tuning best practices."                 │
```
🧹 Nitpick | 🔵 Trivial
Add language specifier to fenced code block.
Same as above, add text or plaintext language specifier for the conversation flow diagram.
-```
+```text
 Goose                         LCS /v1/responses      Llama Stack      MCP/RAG

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 95-95: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/goose-example-tool-merging.md` around lines 95 - 142, The fenced code
block containing the conversation flow diagram (the block starting with "Goose
LCS /v1/responses Llama Stack MCP/RAG" and the opening triple
backticks) is missing a language specifier; update the opening fence from just
``` to ```text (or ```plaintext) so the diagram is treated as plain text in
rendering and syntax-highlighting. Locate the triple-backtick that begins the
diagram and add the specifier, leaving the rest of the block unchanged.
```python
def is_server_deployed_output(output_item: ResponseOutput) -> bool:
    """Check if a response output item belongs to a tool deployed by LCS.

    In the hybrid architecture clients may provide their own tools (function
    tools or MCP servers running locally) alongside server-configured tools.
    This function identifies items that belong to LCS-deployed tools so that
    only those are included in server-side processing (turn summary, metrics,
    storage). Client tool output items are still returned in the response
    to the caller but are not processed internally.

    Args:
        output_item: A ResponseOutput item from the response.

    Returns:
        True if the item should be processed by LCS, False for client tools.
    """
    item_type = getattr(output_item, "type", None)

    # function_call items are always from client-provided tools;
    # LCS only configures file_search and mcp tools.
    if item_type == "function_call":
        return False

    # MCP items: check server_label against configured servers
    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        server_label = getattr(output_item, "server_label", None)
        if server_label is not None:
            configured_labels = {s.name for s in configuration.mcp_servers}
            return server_label in configured_labels

    # file_search_call, web_search_call, message, and unknown types
    # are treated as server-side.
    return True
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if mcp_approval_response has server_label attribute in the llama_stack_api types
rg -n "mcp_approval_response" --type=py -C5

Repository: lightspeed-core/lightspeed-stack
Length of output: 7880
🏁 Script executed:
#!/bin/bash
# Find MCPApprovalResponse definition and where approval responses are created
rg -n "class MCPApprovalResponse|MCPApprovalResponse.*=" --type=py -A3 | head -50

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Search for where mcp_approval_response items originate/are constructed
rg -n "MCPApprovalResponse\(" --type=py -B2 -A2

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Check if MCPApprovalResponse has server_label attribute
rg -n "approval_request_id|server_label" src/utils/responses.py | grep -A2 -B2 "820\|821\|822"

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Find MCPApprovalResponse import and usage
rg -n "MCPApprovalResponse" --type=py | head -20

Repository: lightspeed-core/lightspeed-stack
Length of output: 732
🏁 Script executed:
#!/bin/bash
# Look at the build_tool_call_summary function more closely to understand MCPApprovalResponse structure
sed -n '815,835p' src/utils/responses.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 774
🏁 Script executed:
#!/bin/bash
# Check all response output types handled in is_server_deployed_output context
rg -n "type.*=.*\"mcp" --type=py | head -30

Repository: lightspeed-core/lightspeed-stack
Length of output: 2390
🏁 Script executed:
#!/bin/bash
# Check imports at top of responses.py to understand MCPApprovalResponse
head -50 src/utils/responses.py | grep -A5 -B5 "MCPApprovalResponse"

Repository: lightspeed-core/lightspeed-stack
Length of output: 439
🏁 Script executed:
#!/bin/bash
# Search for any mentions of server_label in approval context
rg -n "server_label" src/utils/responses.py | grep -B3 -A3 "approval"

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Check test for is_server_deployed_output to see if mcp_approval_response is tested
rg -n "is_server_deployed_output" --type=py -B2 -A10

Repository: lightspeed-core/lightspeed-stack
Length of output: 12041
🏁 Script executed:
#!/bin/bash
# Look for the full test class to see if there's a test for mcp_approval_response
sed -n '2576,2661p' tests/unit/utils/test_responses.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 3657
🏁 Script executed:
#!/bin/bash
# Check the MCPApprovalResponse type to see what attributes it has
rg -n "class.*MCPApprovalResponse|@dataclass.*approval" --type=py -A15 | head -50

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Check if approval_request_id can be used to track origin of approval responses
rg -n "approval_request_id" src/utils/responses.py -B3 -A3

Repository: lightspeed-core/lightspeed-stack
Length of output: 410
🏁 Script executed:
#!/bin/bash
# Check if there's a way to track which approval_request originated from which server
rg -n "approval_request_id" src/utils/ -A2 -B2 | head -80

Repository: lightspeed-core/lightspeed-stack
Length of output: 856
🏁 Script executed:
#!/bin/bash
# Look for where mcp_approval_response items are created or generated to see if there's origin tracking
rg -n "mcp_approval_response" src/ --type=py | grep -v test | grep -v ".pyc"

Repository: lightspeed-core/lightspeed-stack
Length of output: 373
🏁 Script executed:
#!/bin/bash
# Check if there's any validation or origin tracking for approval requests/responses
rg -n "MCPApprovalRequest|MCPApprovalResponse" src/ --type=py -A5 | grep -E "server_label|origin|source" | head -20

Repository: lightspeed-core/lightspeed-stack
Length of output: 124
Add explicit handling for mcp_approval_response type or document design assumption.
The function checks `mcp_approval_request` but not `mcp_approval_response`. Since `MCPApprovalResponse` (imported from `llama_stack_api.openai_responses`) lacks a `server_label` attribute and only carries `approval_request_id`, this type falls through to the default `return True`, classifying all approval responses as server-deployed regardless of origin.
If client-provided MCP servers can emit approval responses, they would be incorrectly included in server-side processing (turn summary, storage) due to the missing origin check. Either:
- Add `server_label` tracking to approval responses so they can be classified by origin, or
- Document that approval responses are always treated as server-side by design (with rationale), or
- Add test coverage for this scenario to `TestIsServerDeployedOutput`
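Taking the first-listed resolution direction to its simplest form (treat approval responses as client-provided), the explicit branch might look like the sketch below. This is not the reviewed code: `configured_labels` is passed in rather than read from module-level configuration, and returning `False` here is one of several options the review leaves open:

```python
from types import SimpleNamespace


def is_server_deployed_output_sketch(output_item, configured_labels: set[str]) -> bool:
    """Variant of is_server_deployed_output with explicit approval-response handling."""
    item_type = getattr(output_item, "type", None)

    # function_call items are always client-provided.
    if item_type == "function_call":
        return False

    # Approval responses carry no server_label; without origin tracking we
    # conservatively treat them as client-provided (one of the review's options).
    if item_type == "mcp_approval_response":
        return False

    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        server_label = getattr(output_item, "server_label", None)
        if server_label is not None:
            return server_label in configured_labels

    # Everything else (file_search_call, message, ...) stays server-side.
    return True


resp = SimpleNamespace(type="mcp_approval_response")
print(is_server_deployed_output_sketch(resp, {"okp"}))  # -> False
```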
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 1126 - 1158, The
is_server_deployed_output function currently treats an MCPApprovalResponse
(item_type "mcp_approval_response") as server-side because it lacks a
server_label; update is_server_deployed_output to explicitly handle
"mcp_approval_response" (e.g., return False to mark it as client-provided) or
implement lookup/propagation of server_label, and add a unit test in
TestIsServerDeployedOutput to cover the approval response case; reference
is_server_deployed_output and TestIsServerDeployedOutput when making the change.
```python
def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
    """Test mcp_approval_request with unmatched label is client-side."""
    mock_config = mocker.Mock()
    mock_config.mcp_servers = []
    mocker.patch("utils.responses.configuration", mock_config)

    item = mocker.Mock()
    item.type = "mcp_approval_request"
    item.server_label = "client-mcp"
    assert is_server_deployed_output(item) is False
```
🧹 Nitpick | 🔵 Trivial
Add a positive mcp_approval_request server-side classification test.
Line 2626 only validates the unmatched-label/client-side path. Please add the matching-label path too, since this type is part of the server-event filtering contract.
Proposed test addition
class TestIsServerDeployedOutput:
@@
def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
"""Test mcp_approval_request with unmatched label is client-side."""
@@
assert is_server_deployed_output(item) is False
+
+ def test_mcp_approval_request_server_side_when_label_matches(
+ self, mocker: MockerFixture
+ ) -> None:
+ """Test mcp_approval_request with matching label is server-deployed."""
+ mock_config = mocker.Mock()
+ mock_server = mocker.Mock()
+ mock_server.name = "server-mcp"
+ mock_config.mcp_servers = [mock_server]
+ mocker.patch("utils.responses.configuration", mock_config)
+
+ item = mocker.Mock()
+ item.type = "mcp_approval_request"
+ item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_responses.py` around lines 2626 - 2635, Add a
complementary positive test that verifies mcp_approval_request is classified
server-side when the server label matches: create a new test (e.g.,
test_mcp_approval_request_server_side) that mocks configuration
(mocker.patch("utils.responses.configuration")) with mock_config.mcp_servers
containing the target label (e.g., "client-mcp"), create an item with item.type
= "mcp_approval_request" and item.server_label = "client-mcp", and assert
is_server_deployed_output(item) is True; reference the existing
test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.
tests/unit/utils/test_responses.py
```python
async def test_client_tools_with_merge_header(
    self, mocker: MockerFixture
) -> None:
    """Test client tools merged with server tools when header is set."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_config = mocker.Mock()
    mock_config.configuration.byok_rag = []
    mocker.patch("utils.responses.configuration", mock_config)

    server_mcp = InputToolMCP(
        server_label="server-tool", server_url="http://server"
    )
    mocker.patch(
        "utils.responses.prepare_tools",
        new=mocker.AsyncMock(return_value=[server_mcp]),
    )

    client_tool = InputToolMCP(
        server_label="client-tool", server_url="http://client"
    )
    tools, tool_choice, _ = await resolve_tool_choice(
        tools=[client_tool],
        tool_choice=None,
        token="tok",
        request_headers={"X-LCS-Merge-Server-Tools": "true"},
    )
```
Header-merge tests should also validate lowercase header keys.
Current assertions only use X-LCS-Merge-Server-Tools casing (Line 2803/2841/2899). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.
Proposed hardening for header-key coverage
class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
+        self, mocker: MockerFixture, merge_header_key: str
     ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Also applies to: 2836-2842, 2895-2900
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_responses.py` around lines 2773 - 2804, The test
test_client_tools_with_merge_header only asserts behavior when the header key
uses uppercase "X-LCS-Merge-Server-Tools"; update the test (and the other
affected tests around the same blocks) to also call resolve_tool_choice with a
lowercase header key "x-lcs-merge-server-tools" and assert the same merged-tools
behavior (same tools, tool_choice, etc.) so header normalization won't cause
regressions; locate the calls to resolve_tool_choice and duplicate or
parametrize them to include both header key variants to validate lowercase
handling.
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
Cover the new functions introduced in the server-tool merging feature:
- is_server_deployed_output: classify output items as server vs client
- _merge_tools: merge client and server tools with conflict detection
- resolve_tool_choice: X-LCS-Merge-Server-Tools header integration
- _is_server_mcp_output_item: filter serialized MCP items in streaming

29 new tests across utils/test_responses.py and endpoints/test_responses.py validating correct behavior for MCP label matching, 409 conflict rejection, file_search deduplication, and pass-through of non-MCP item types.
20809d0 to f02648d
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 1378-1427: Add a regression test ensuring approval-request
substream events are filtered from client streams: in TestIsServerMcpOutputItem
add a new test (e.g., test_mcp_approval_request_stream_filtering) that simulates
a streaming chunk key starting with "response.mcp_approval_request." and asserts
the streaming filter logic rejects/filters that chunk so approval-request events
cannot leak to clients; reference the mcp approval path by using the
"response.mcp_approval_request.*" key and the existing helper
_is_server_mcp_output_item (and whatever streaming chunk filter function is used
in the responses code) to validate the chunk is filtered when appropriate.
In `@tests/unit/utils/test_responses.py`:
- Around line 2840-2860: Add a new unit test that mirrors
test_no_tools_uses_prepare_tools but calls resolve_tool_choice with
tool_choice="none" (and tools=None) and asserts that the patched prepare_tools
(utils.responses.prepare_tools) is invoked with no_tools=True; locate the test
around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1d826784-c522-4d69-a050-222bc472dc2e
📒 Files selected for processing (2)
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g.,
from authentication import get_auth_dependency)
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
🔇 Additional comments (3)
tests/unit/app/endpoints/test_responses.py (1)
15-20: Import addition is clean and consistent. The new `_is_server_mcp_output_item` import (Line 16) fits the existing absolute internal import pattern and keeps the test module scope focused.

tests/unit/utils/test_responses.py (2)
2624-2633: Add a positive `mcp_approval_request` server-side classification test. Line 2626 only validates the unmatched-label/client-side path. A matching-label test is needed since `mcp_approval_request` is part of the server-event filtering contract (lines 2585-2596 test both paths for `mcp_call`, lines 2611-2622 for `mcp_list_tools`).

Proposed test addition
class TestIsServerDeployedOutput:
@@
    def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
        """Test mcp_approval_request with unmatched label is client-side."""
@@
        assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True
2769-2802: Header-merge tests should validate lowercase header keys. Current assertions only use `X-LCS-Merge-Server-Tools` casing (lines 2797, 2835, 2889). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching `resolve_tool_choice`.

Proposed hardening for header-key coverage
class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
+        self, mocker: MockerFixture, merge_header_key: str
     ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Apply similar parametrization to `test_merge_header_conflict_raises_409` and `test_merge_header_no_server_tools_returns_client_only`.
    async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
        """Test that no client tools falls through to prepare_tools."""
        mock_client = mocker.AsyncMock()
        mock_holder = mocker.Mock()
        mock_holder.get_client.return_value = mock_client
        mocker.patch(
            "utils.responses.AsyncLlamaStackClientHolder",
            return_value=mock_holder,
        )
        server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
        mock_prepare = mocker.AsyncMock(return_value=[server_tool])
        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)

        tools, _, _ = await resolve_tool_choice(
            tools=None,
            tool_choice=None,
            token="tok",
        )
        assert tools is not None
        assert len(tools) == 1
        mock_prepare.assert_called_once()
🛠️ Refactor suggestion | 🟠 Major
Add test coverage for tool_choice="none" scenario.
The test only validates tool_choice=None (line 2855). Per the PR objectives, there's a known issue where tool_choice="none" may not correctly disable tools when tools=None. Add a test asserting that prepare_tools is called with no_tools=True when tool_choice="none".
Proposed test addition
 @pytest.mark.asyncio
async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
"""Test that no client tools falls through to prepare_tools."""
@@
assert len(tools) == 1
mock_prepare.assert_called_once()
+
+ @pytest.mark.asyncio
+ async def test_tool_choice_none_passes_no_tools_flag(
+ self, mocker: MockerFixture
+ ) -> None:
+ """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
+ mock_client = mocker.AsyncMock()
+ mock_holder = mocker.Mock()
+ mock_holder.get_client.return_value = mock_client
+ mocker.patch(
+ "utils.responses.AsyncLlamaStackClientHolder",
+ return_value=mock_holder,
+ )
+ mock_prepare = mocker.AsyncMock(return_value=None)
+ mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
+
+ tools, _, _ = await resolve_tool_choice(
+ tools=None,
+ tool_choice="none",
+ token="tok",
+ )
+ assert tools is None
+ mock_prepare.assert_called_once()
+ # Verify no_tools=True was passed
+ call_kwargs = mock_prepare.call_args.kwargs
+ assert call_kwargs.get("no_tools") is True

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
        """Test that no client tools falls through to prepare_tools."""
        mock_client = mocker.AsyncMock()
        mock_holder = mocker.Mock()
        mock_holder.get_client.return_value = mock_client
        mocker.patch(
            "utils.responses.AsyncLlamaStackClientHolder",
            return_value=mock_holder,
        )
        server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
        mock_prepare = mocker.AsyncMock(return_value=[server_tool])
        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
        tools, _, _ = await resolve_tool_choice(
            tools=None,
            tool_choice=None,
            token="tok",
        )
        assert tools is not None
        assert len(tools) == 1
        mock_prepare.assert_called_once()

    @pytest.mark.asyncio
    async def test_tool_choice_none_passes_no_tools_flag(
        self, mocker: MockerFixture
    ) -> None:
        """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
        mock_client = mocker.AsyncMock()
        mock_holder = mocker.Mock()
        mock_holder.get_client.return_value = mock_client
        mocker.patch(
            "utils.responses.AsyncLlamaStackClientHolder",
            return_value=mock_holder,
        )
        mock_prepare = mocker.AsyncMock(return_value=None)
        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
        tools, _, _ = await resolve_tool_choice(
            tools=None,
            tool_choice="none",
            token="tok",
        )
        assert tools is None
        mock_prepare.assert_called_once()
        # Verify no_tools=True was passed
        call_kwargs = mock_prepare.call_args.kwargs
        assert call_kwargs.get("no_tools") is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_responses.py` around lines 2840 - 2860, Add a new unit
test that mirrors test_no_tools_uses_prepare_tools but calls resolve_tool_choice
with tool_choice="none" (and tools=None) and asserts that the patched
prepare_tools (utils.responses.prepare_tools) is invoked with no_tools=True;
locate the test around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.
…lter_mcp_chunk The helper already handled mcp_approval_request items in the output_item.added and output_item.done branches, but the substream event prefix check only matched response.mcp_call.* and response.mcp_list_tools.*, allowing server-side approval request progress events to leak to clients that cannot handle them. Add the missing response.mcp_approval_request.* prefix check and add unit tests for _should_filter_mcp_chunk covering all three substream event types.
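The prefix check this commit describes can be sketched as follows; the constant and function names here are assumptions based on the commit message, not the exact source.

```python
# Sketch of the substream prefix check; names are hypothetical.
SERVER_MCP_EVENT_PREFIXES = (
    "response.mcp_call.",
    "response.mcp_list_tools.",
    "response.mcp_approval_request.",  # the prefix this commit adds
)


def should_filter_mcp_chunk(event_type: str) -> bool:
    """Return True for streaming events emitted by server-deployed MCP tools."""
    # str.startswith accepts a tuple, so one call covers all three prefixes.
    return event_type.startswith(SERVER_MCP_EVENT_PREFIXES)
```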
@tisnik thanks for the review. I added a commit to address the 'incomplete mcp filtering'. For the 'no_tools' comment I looked and think the no_tools variable is scoped to the entire function and is passed into both prepare_tools calls. let me know if I am missing something |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 556-576: The persisted turn still contains client-side tool items
because append_turn_items_to_conversation is called with
latest_response_object.output; before calling append_turn_items_to_conversation
(after building turn_summary and rag_chunks) filter
latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7a365396-e85c-42bc-8ec0-e4067d6ace16
📒 Files selected for processing (2)
src/app/endpoints/responses.pytests/unit/app/endpoints/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g.,
from authentication import get_auth_dependency)
Files:
src/app/endpoints/responses.pytests/unit/app/endpoints/test_responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use FastAPI imports from the fastapi module: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack client imports
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax `str | int` instead of `Union[str, int]` for type annotations
Use `Optional[Type]` for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for functions performing I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with `@abstractmethod` decorators for interface implementations
Use complete type annotations for all class attributes; avoid using `Any` type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed
Files:
src/app/endpoints/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/unit/app/endpoints/test_responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/app/endpoints/test_responses.py
🔇 Additional comments (2)
tests/unit/app/endpoints/test_responses.py (1)
1466-1498: Good regression coverage for approval-request substreams. This closes the `response.mcp_approval_request.*` hole that previously let unknown MCP events leak to OpenAI-compatible clients.

src/app/endpoints/responses.py (1)
531-537: The review comment is incorrect. Both code paths in `update_azure_token()` properly update the `AsyncLlamaStackClientHolder`'s internal client:

- Library client mode (`update_azure_token` line 17): Calls `reload_library_client()`, which sets `self._lsc = client` (client.py:119)
- Service client mode (`update_azure_token` line 33): Calls `update_provider_data()`, which does `self._lsc = self._lsc.copy(set_default_headers=updated_headers)` (client.py:54)

Since `AsyncLlamaStackClientHolder` is a Singleton, when `_finalize_response()` calls `get_client()` at line 572, it retrieves the same holder instance that was just updated by `update_azure_token()`. The client will have fresh credentials; there is no staleness issue.
    for item in latest_response_object.output:
        if not is_server_deployed_output(item):
            continue
        tool_call, tool_result = build_tool_call_summary(item)
        if tool_call:
            turn_summary.tool_calls.append(tool_call)
        if tool_result:
            turn_summary.tool_results.append(tool_result)

    tool_rag_chunks = parse_rag_chunks(
        latest_response_object,
        vector_store_ids,
        configuration.rag_id_mapping,
    )
    turn_summary.rag_chunks = inline_rag_context.rag_chunks + tool_rag_chunks

    client = AsyncLlamaStackClientHolder().get_client()
    if api_params.store and api_params.previous_response_id and latest_response_object:
        await append_turn_items_to_conversation(
            client, api_params.conversation, user_input, latest_response_object.output
        )
Filter client tool items before persisting the streamed turn.
Line 557 already gates internal processing with is_server_deployed_output(), but append_turn_items_to_conversation() on Lines 573-576 still receives the full latest_response_object.output. src/utils/conversations.py:454-497 stores every llm_output item it gets, so client-local MCP/function items are still persisted on the streaming path.
Proposed fix
client = AsyncLlamaStackClientHolder().get_client()
if api_params.store and api_params.previous_response_id and latest_response_object:
+ stored_output = [
+ item
+ for item in latest_response_object.output
+ if is_server_deployed_output(item)
+ ]
await append_turn_items_to_conversation(
- client, api_params.conversation, user_input, latest_response_object.output
+ client, api_params.conversation, user_input, stored_output
     )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/responses.py` around lines 556 - 576, The persisted turn
still contains client-side tool items because append_turn_items_to_conversation
is called with latest_response_object.output; before calling
append_turn_items_to_conversation (after building turn_summary and rag_chunks)
filter latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.
Description
Add server-tool merging and filtering for Responses API
Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.
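The merge-with-conflict-detection behavior described above can be sketched roughly as follows. This is a simplified stand-in: the real `_merge_tools` helper raises an HTTP 409 via FastAPI rather than a `ValueError`, and the dict-based tool shape here is an assumption for illustration.

```python
# Simplified sketch; the real helper raises HTTPException(status_code=409).
def merge_tools(client_tools: list[dict], server_tools: list[dict]) -> list[dict]:
    """Merge client tools into server tools, rejecting duplicates."""
    server_labels = {
        t["server_label"] for t in server_tools if t.get("type") == "mcp"
    }
    server_has_file_search = any(t.get("type") == "file_search" for t in server_tools)
    for tool in client_tools:
        # Duplicate MCP server_label is a conflict.
        if tool.get("type") == "mcp" and tool.get("server_label") in server_labels:
            raise ValueError(f"409: duplicate MCP server_label {tool['server_label']!r}")
        # A second file_search tool is also a conflict.
        if tool.get("type") == "file_search" and server_has_file_search:
            raise ValueError("409: duplicate file_search tool")
    return server_tools + client_tools
```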
Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.
Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.
Includes doc/diagram of goose (local agent) use case we are prototyping.
Type of change
Tools used to create PR
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Tests