LCORE-1616 /v1/responses support for MCP tool merging (goose agent support)#1435
dprince wants to merge 4 commits into lightspeed-core:main
Conversation
Introduce X-LCS-Merge-Server-Tools header that allows client-provided tools to be merged with server-configured tools (RAG, MCP) instead of replacing them. Tool conflicts (duplicate MCP server_label or duplicate file_search) are rejected with a 409 error.

Add is_server_deployed_output() to distinguish LCS-deployed tools from client tools so only server tools are included in turn summaries, metrics, and storage. Client tool output items are still returned in the response.

Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools, mcp_approval_request) from the SSE stream so OpenAI-compatible clients like Goose don't fail on unknown item types. Strip these items from the response.completed output array as well.
Document the Goose agent integration pattern where server-configured tools (RAG, MCP knowledge services) are merged with client-provided tools (oc/kubectl). Covers the X-LCS-Merge-Server-Tools header behavior, conflict detection, server-tool filtering in streamed responses, and an end-to-end conversation flow example.
Walkthrough

Adds a header-driven hybrid tool merging model for /v1/responses, conflict detection returning HTTP 409, SSE filtering to hide server-deployed MCP events from clients, helper extraction for response finalization, and updates to tool resolution and turn-summary logic to distinguish server-deployed vs client-provided outputs.

Changes
Sequence Diagrams

sequenceDiagram
participant Client
participant Server as LCS Server
participant ServerMCP as Server MCP
participant ClientMCP as Client MCP
Client->>Server: POST /v1/responses\ntools=[client_tools]\nX-LCS-Merge-Server-Tools: true
Server->>ServerMCP: prepare_tools() (RAG + configured MCP)
ServerMCP-->>Server: server_tools
Note over Server: _merge_tools() checks conflicts:\n• MCP server_label collisions\n• file_search overlap
alt Conflict Detected
Server-->>Client: HTTP 409 Conflict
else No Conflict
Server->>Server: merged_tools = client_tools + server_tools
Server-->>Client: Begin SSE streaming (merged tools available)
end
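In HTTP terms, the first leg of this flow is a single POST carrying the opt-in header. A minimal sketch of how a client might assemble it (the payload shape follows the OpenAI Responses API; the model id and the tool entry are illustrative, not taken from the PR):

```python
def build_merge_request(client_tools: list[dict]) -> tuple[dict, dict]:
    """Assemble headers and payload for /v1/responses with tool merging enabled."""
    headers = {
        "Content-Type": "application/json",
        # Opt in to merging client tools with server-configured RAG/MCP tools.
        "X-LCS-Merge-Server-Tools": "true",
    }
    payload = {
        "model": "example-model",  # hypothetical model id
        "input": "pod api-7f8b9 is crash-looping, help me diagnose it",
        "tools": client_tools,  # client MCP tools survive the merge
        "stream": True,
    }
    return headers, payload


# Illustrative client-side MCP tool entry (label and URL are made up).
headers, payload = build_merge_request(
    [{"type": "mcp", "server_label": "oc-local", "server_url": "http://localhost:9000"}]
)
print(headers["X-LCS-Merge-Server-Tools"])  # -> true
```

Without the header, the same payload would follow the original replace-not-merge behavior.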
sequenceDiagram
participant Client
participant Server as LCS Server
participant MCP as Server-Deployed MCP
participant Storage as Turn Summary & Storage
Client->>Server: Request using merged tools
Server->>MCP: Execute server-deployed tool calls
MCP-->>Server: mcp_call / results
Server->>Server: response_generator() streaming SSE
Note over Server: Filter & hide server-deployed MCP item types\n(mcp_call, mcp_list_tools, mcp_approval_request)
Server->>Client: SSE stream with server MCP events suppressed
Server->>Server: _finalize_response()
Note over Server: build_turn_summary() includes only server-deployed outputs
Server->>Storage: Store summary/metrics with server tool outputs
Server-->>Client: Complete
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Pre-merge checks: ✅ 3 passed
Actionable comments posted: 5
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/goose-example-tool-merging.md`:
- Around line 95-142: The fenced code block containing the conversation flow
diagram (the block starting with "Goose LCS /v1/responses
Llama Stack MCP/RAG" and the opening triple backticks) is missing a
language specifier; update the opening fence from just ``` to ```text (or
```plaintext) so the diagram is treated as plain text in rendering and
syntax-highlighting. Locate the triple-backtick that begins the diagram and add
the specifier, leaving the rest of the block unchanged.
- Around line 14-50: The fenced ASCII diagram block in
docs/goose-example-tool-merging.md is missing a language specifier which linters
flag; update the opening fence for the diagram (the triple-backtick that
currently starts the block) to include a language token such as text or
plaintext (e.g., change ``` to ```text) so the ASCII diagram is annotated for
static analysis and rendering; ensure only the opening fence is changed and the
rest of the Goose Agent / OpenShift Cluster diagram content (the ASCII art)
remains untouched.
In `@src/utils/responses.py`:
- Around line 1126-1158: The is_server_deployed_output function currently treats
an MCPApprovalResponse (item_type "mcp_approval_response") as server-side
because it lacks a server_label; update is_server_deployed_output to explicitly
handle "mcp_approval_response" (e.g., return False to mark it as
client-provided) or implement lookup/propagation of server_label, and add a unit
test in TestIsServerDeployedOutput to cover the approval response case;
reference is_server_deployed_output and TestIsServerDeployedOutput when making
the change.
In `@tests/unit/utils/test_responses.py`:
- Around line 2773-2804: The test test_client_tools_with_merge_header only
asserts behavior when the header key uses uppercase "X-LCS-Merge-Server-Tools";
update the test (and the other affected tests around the same blocks) to also
call resolve_tool_choice with a lowercase header key "x-lcs-merge-server-tools"
and assert the same merged-tools behavior (same tools, tool_choice, etc.) so
header normalization won't cause regressions; locate the calls to
resolve_tool_choice and duplicate or parametrize them to include both header key
variants to validate lowercase handling.
- Around line 2626-2635: Add a complementary positive test that verifies
mcp_approval_request is classified server-side when the server label matches:
create a new test (e.g., test_mcp_approval_request_server_side) that mocks
configuration (mocker.patch("utils.responses.configuration")) with
mock_config.mcp_servers containing the target label (e.g., "client-mcp"), create
an item with item.type = "mcp_approval_request" and item.server_label =
"client-mcp", and assert is_server_deployed_output(item) is True; reference the
existing test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f5204ab3-7d8c-490d-8111-8a64f71f83a1
📒 Files selected for processing (6)
docs/goose-example-tool-merging.md
src/app/endpoints/responses.py
src/utils/responses.py
tests/integration/endpoints/test_query_integration.py
tests/unit/app/endpoints/test_responses.py
tests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g., `from authentication import get_auth_dependency`)
Files:
tests/integration/endpoints/test_query_integration.py
tests/unit/app/endpoints/test_responses.py
src/app/endpoints/responses.py
src/utils/responses.py
tests/unit/utils/test_responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/integration/endpoints/test_query_integration.py
tests/unit/app/endpoints/test_responses.py
tests/unit/utils/test_responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/app/endpoints/test_responses.py
tests/unit/utils/test_responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use FastAPI imports from the fastapi module: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack client imports
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax `str | int` instead of `Union[str, int]` for type annotations
Use `Optional[Type]` for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for functions performing I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with `@abstractmethod` decorators for interface implementations
Use complete type annotations for all class attributes; avoid using `Any` type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed
Files:
src/app/endpoints/responses.py
src/utils/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/responses.py
🧠 Learnings (2)
📚 Learning: 2026-02-25T07:46:39.608Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1211
File: src/models/responses.py:8-16
Timestamp: 2026-02-25T07:46:39.608Z
Learning: In the lightspeed-stack codebase, src/models/requests.py uses OpenAIResponseInputTool as Tool while src/models/responses.py uses OpenAIResponseTool as Tool. This type difference is intentional - input tools and output/response tools have different schemas in llama-stack-api.
Applied to files:
src/utils/responses.py
📚 Learning: 2026-02-23T14:56:59.186Z
Learnt from: asimurka
Repo: lightspeed-core/lightspeed-stack PR: 1198
File: src/utils/responses.py:184-192
Timestamp: 2026-02-23T14:56:59.186Z
Learning: In the lightspeed-stack codebase (lightspeed-core/lightspeed-stack), do not enforce de-duplication of duplicate client.models.list() calls in model selection flows (e.g., in src/utils/responses.py prepare_responses_params). These calls are considered relatively cheap and removing duplicates could add unnecessary complexity to the flow. Apply this guideline specifically to this file/context unless similar performance characteristics and design decisions are documented elsewhere.
Applied to files:
src/utils/responses.py
🪛 markdownlint-cli2 (0.22.0)
docs/goose-example-tool-merging.md
[warning] 14-14: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 95-95: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (13)
src/utils/responses.py (3)
1193-1195: LGTM! Server-deployed output filtering in turn summary.

The filter correctly excludes client-provided tool outputs from the turn summary, ensuring only server-deployed tool calls and results are stored in metrics and conversation history.
1363-1411: LGTM! Tool merge logic with conflict detection.

The implementation correctly:
- Detects MCP conflicts by comparing `server_label` values
- Detects `file_search` conflicts when both client and server provide one
- Returns merged tools with client tools first
- Uses appropriate HTTP 409 status code for conflicts

Minor observation: Line 1411 wraps inputs in `list()` unnecessarily since `client_tools` and `server_tools` are already typed as `list[InputTool]`, but this has no functional impact.
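The conflict rules summarized in this comment can be sketched as a standalone function. This is a simplified stand-in using plain dicts: the real `_merge_tools` operates on typed `InputTool` objects and raises a FastAPI `HTTPException` with status 409, which `ToolConflictError` stands in for here:

```python
class ToolConflictError(Exception):
    """Stand-in for the HTTP 409 the endpoint raises on tool conflicts."""


def merge_tools_sketch(
    client_tools: list[dict], server_tools: list[dict]
) -> list[dict]:
    """Merge client and server tools, rejecting duplicates as conflicts."""
    client_labels = {t["server_label"] for t in client_tools if t["type"] == "mcp"}
    server_labels = {t["server_label"] for t in server_tools if t["type"] == "mcp"}
    clash = client_labels & server_labels
    if clash:
        raise ToolConflictError(f"duplicate MCP server_label(s): {sorted(clash)}")

    if any(t["type"] == "file_search" for t in client_tools) and any(
        t["type"] == "file_search" for t in server_tools
    ):
        raise ToolConflictError("both client and server provide file_search")

    # Client tools first, matching the ordering noted in the review.
    return client_tools + server_tools
```

In the endpoint, `ToolConflictError` would correspond to `HTTPException(status_code=409, ...)`.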
1449-1476: LGTM! Header-driven merge behavior.

The merge logic correctly:
- Checks for the `X-LCS-Merge-Server-Tools: true` header (case-insensitive)
- Only merges when the header is explicitly set
- Preserves original behavior when the header is absent
- Reuses the same `vector_store_ids` resolved from client tools when preparing server tools

tests/integration/endpoints/test_query_integration.py (2)
731-736: LGTM! Test expectations updated for server-deployed tool filtering.

The assertions correctly reflect the new behavior where:
- Only server-deployed tool calls are included in `response.tool_calls`
- `function_call` items (client-provided) are excluded
- The expected count is reduced from 2 to 1

The comment clearly explains the reasoning.

635-635: No issues found: `server1` is a valid MCP server name configured in test fixtures.

Verification confirms the `server_label` change to `"server1"` aligns with the MCP server configuration in the test fixtures (tests/integration/test_configuration.py lines 85-93 define `server1` as a valid MCP server). The test setup correctly pairs this label with the `mcp_list_tools` item, and the assertions properly validate server-deployed tool classification.

tests/unit/app/endpoints/test_responses.py (1)
1378-1427: LGTM! Comprehensive tests for `_is_server_mcp_output_item`.

The test class provides good coverage including:
- MCP types with matching and non-matching server labels
- All three MCP event types (`mcp_call`, `mcp_list_tools`, `mcp_approval_request`)
- Missing `server_label` handling
- Non-MCP types returning `False`
- Empty configured labels edge case
src/app/endpoints/responses.py (5)
405-407: LGTM! Added `by_alias=True` for SSE serialization.

Using `by_alias=True` ensures field names are serialized according to their aliases (e.g., `created_at` vs `createdAt`), maintaining consistency with the OpenAI Responses specification for SSE events.

Also applies to: 423-425, 435-435, 451-453
465-480: LGTM! Helper for filtering serialized MCP output items.

The function correctly identifies server-deployed MCP items from the serialized `response.output` array by checking:
- Item type is one of the MCP types
- `server_label` matches a configured MCP server

This complements `is_server_deployed_output` from `utils/responses.py`, which operates on typed objects rather than dicts.
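The dict-level check described above boils down to one comprehension over the serialized output array. A hedged sketch (the function name and exact dict shapes are illustrative, not the actual helper):

```python
MCP_ITEM_TYPES = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}


def strip_server_mcp_items(output: list[dict], server_labels: set[str]) -> list[dict]:
    """Drop server-deployed MCP items; keep client items such as function_call."""
    return [
        item
        for item in output
        if not (
            item.get("type") in MCP_ITEM_TYPES
            and item.get("server_label") in server_labels
        )
    ]
```

Items with no `server_label` (e.g., `message`) fall through the `and` and are kept.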
483-527: LGTM! Streaming chunk filter with output index tracking.

The implementation correctly:
- Tracks `output_index` values when server MCP items are added
- Filters intermediate events (`response.mcp_call.*`, `response.mcp_list_tools.*`) by tracked indices
- Cleans up tracking when `output_item.done` is received, using `discard()` (safe for missing keys)
- Mutates `server_mcp_output_indices` as a side effect, which is documented via the function's purpose
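The index-tracking scheme can be illustrated with a self-contained sketch. Event shapes here are simplified dicts; the real filter operates on typed SSE chunks and mutates a shared index set:

```python
def filter_stream_sketch(events: list[dict], server_labels: set[str]) -> list[dict]:
    """Drop SSE events that belong to server-deployed MCP output items."""
    mcp_types = {"mcp_call", "mcp_list_tools", "mcp_approval_request"}
    tracked: set[int] = set()  # output_index values of server MCP items
    kept: list[dict] = []
    for ev in events:
        item = ev.get("item", {})
        is_server_mcp_item = (
            item.get("type") in mcp_types
            and item.get("server_label") in server_labels
        )
        if ev["type"] == "response.output_item.added" and is_server_mcp_item:
            tracked.add(ev["output_index"])  # start hiding this output slot
            continue
        if ev["type"].startswith(("response.mcp_call.", "response.mcp_list_tools.")):
            if ev.get("output_index") in tracked:
                continue  # hide intermediate events for tracked items
        if ev["type"] == "response.output_item.done" and ev.get("output_index") in tracked:
            tracked.discard(ev["output_index"])  # discard() is safe if already gone
            continue
        kept.append(ev)
    return kept
```

Client-provided items at other indices pass through untouched, which is why Goose never sees the unknown MCP item types.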
530-575: LGTM! Extracted finalization logic into helper.

The `_finalize_response` helper:
- Consolidates post-stream metadata extraction
- Uses `is_server_deployed_output` to filter tool summaries (consistent with the non-streaming path)
- Handles conversation persistence with `append_turn_items_to_conversation`

This improves readability of `response_generator` by separating streaming from finalization concerns.
636-644: LGTM! Filtering server MCP items from response output.

The inline list comprehension correctly removes server-deployed MCP items from the `output` array in the serialized response, ensuring clients only receive item types they understand (`message`, `function_call`, etc.).

docs/goose-example-tool-merging.md (1)
1-92: LGTM! Clear documentation for hybrid tool architecture.

The documentation provides excellent coverage of:
- The separation of knowledge (server) vs action (client) tools
- Header-driven merge behavior with clear behavior table
- Conflict detection semantics
- SSE streaming filter mechanics with step-by-step explanation
- Turn summary behavior for server vs client outputs
The architecture and flow diagrams effectively illustrate the concepts.
tests/unit/utils/test_responses.py (1)
2576-2738: Good coverage on classification and conflict behavior.

The new `TestIsServerDeployedOutput` and `TestMergeTools` cases are well-targeted and align with the merge/conflict contract (including 409 behavior).
```
OpenShift Cluster
┌────────────────────────────────────────────────────────────────────────────┐
│                                                                            │
│  namespace: foo-staging                                                    │
│  ┌─────────────────────────────────────┐                                   │
│  │ Goose Agent (Pod)                   │                                   │
│  │                                     │                                   │
│  │ ServiceAccount: goose-readonly      │                                   │
│  │ (read-only, limited k8s/OCP access) │                                   │
│  │                                     │  HTTPS POST /v1/responses         │
│  │ LLM reasoning loop                  │                                   │
│  │  ├─ LCS /v1/responses ──────────────┼─────────────────────┐             │
│  │  │   (knowledge queries)            │                     │             │
│  │  │                                  │                     ▼             │
│  │  ├─ Local MCP tools                 │  namespace: lightspeed-stack      │
│  │  │   (cluster read-only operations) │  ┌─────────────────────────┐      │
│  │  │    ├─ oc get pods                │  │ Lightspeed Stack (LCS)  │      │
│  │  │    ├─ oc logs                    │  │                         │      │
│  │  │    └─ oc describe                │  │ Llama Stack Engine      │      │
│  │  └──────────────────────────────────┘  │  ├─ file_search (RAG)   │      │
│  └──────────┬──────────────────────────   │  ├─ mcp: OKP/Solr       │      │
│             │ ServiceAccount token        │  ├─ mcp: errata         │      │
│             │ (read-only RBAC)            │  └─ mcp: bugzilla       │      │
│             │                             └───────────┬─────────────┘      │
│             │                                   ┌─────┴──────┐             │
│             │                                   │FAISS / Solr│             │
│             │                                   └────────────┘             │
│             │                                                              │
│             │  namespace: foo-staging                                      │
│             │  ┌───────────────────────────┐                               │
│             └─▶│ pods, deployments, ...    │                               │
│                │ (read-only RBAC enforced) │                               │
│                └───────────────────────────┘                               │
│                                                                            │
└────────────────────────────────────────────────────────────────────────────┘
```
🧹 Nitpick | 🔵 Trivial
Add language specifier to fenced code block.
Static analysis flagged this code block as missing a language specifier. For ASCII diagrams, use text or plaintext to satisfy linters and improve rendering hints.
-```
+```text
 OpenShift Cluster

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 14-14: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/goose-example-tool-merging.md` around lines 14 - 50, The fenced ASCII
diagram block in docs/goose-example-tool-merging.md is missing a language
specifier which linters flag; update the opening fence for the diagram (the
triple-backtick that currently starts the block) to include a language token
such as text or plaintext (e.g., change ``` to ```text) so the ASCII diagram is
annotated for static analysis and rendering; ensure only the opening fence is
changed and the rest of the Goose Agent / OpenShift Cluster diagram content (the
ASCII art) remains untouched.
```
Goose                         LCS /v1/responses      Llama Stack      MCP/RAG
  │                             │                      │                │
  │ "pod api-7f8b9 is crash-    │                      │                │
  │  looping in foo-staging,    │                      │                │
  │  help me diagnose it"       │                      │                │
  │                             │                      │                │
  │ POST {input, tools,         │                      │                │
  │  X-LCS-Merge-Server-Tools}  │                      │                │
  │────────────────────────────▶│                      │                │
  │                             │ merge client +       │                │
  │                             │ server tools         │                │
  │                             │                      │                │
  │                             │ responses.create()   │                │
  │                             │─────────────────────▶│                │
  │                             │                      │ LLM: search RAG│
  │                             │                      │───────────────▶│
  │                             │                      │◀── RAG chunks: │
  │                             │                      │  known OOM fix │
  │                             │                      │  for api image │
  │                             │                      │                │
  │                             │                      │ LLM: query OKP │
  │                             │                      │───────────────▶│
  │                             │                      │◀── KB article: │
  │                             │                      │  memory limits │
  │                             │                      │  best practice │
  │                             │                      │                │
  │                             │◀─────────────────────│ answer with    │
  │                             │                      │  RAG + KB ctx  │
  │ SSE: response.completed     │                      │                │
  │ (server MCP items filtered) │                      │                │
  │◀────────────────────────────│                      │                │
  │                             │                      │                │
  │ Goose combines LCS knowledge with live cluster data:                │
  │                                                                     │
  │──▶ LOCAL MCP: oc get pods -n foo-staging                            │
  │◀── pod/api-7f8b9 CrashLoopBackOff                                   │
  │──▶ LOCAL MCP: oc logs pod/api-7f8b9                                 │
  │◀── OOMKilled                                                        │
  │──▶ LOCAL MCP: oc describe pod/api-7f8b9                             │
  │◀── events, resource limits                                          │
  │                                                                     │
  │ Goose correlates live cluster state with RAG/OKP                    │
  │ knowledge to produce a diagnosis:                                   │
  │  "Pod is OOMKilled. RAG docs confirm a known fix                    │
  │   for this image -- raise memory limit to 512Mi.                    │
  │   See KB article for memory tuning best practices."                 │
```
🧹 Nitpick | 🔵 Trivial
Add language specifier to fenced code block.
Same as above, add text or plaintext language specifier for the conversation flow diagram.
-```
+```text
 Goose                         LCS /v1/responses      Llama Stack      MCP/RAG

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 95-95: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/goose-example-tool-merging.md` around lines 95 - 142, The fenced code
block containing the conversation flow diagram (the block starting with "Goose
LCS /v1/responses Llama Stack MCP/RAG" and the opening triple
backticks) is missing a language specifier; update the opening fence from just
``` to ```text (or ```plaintext) so the diagram is treated as plain text in
rendering and syntax-highlighting. Locate the triple-backtick that begins the
diagram and add the specifier, leaving the rest of the block unchanged.
```python
def is_server_deployed_output(output_item: ResponseOutput) -> bool:
    """Check if a response output item belongs to a tool deployed by LCS.

    In the hybrid architecture clients may provide their own tools (function
    tools or MCP servers running locally) alongside server-configured tools.
    This function identifies items that belong to LCS-deployed tools so that
    only those are included in server-side processing (turn summary, metrics,
    storage). Client tool output items are still returned in the response
    to the caller but are not processed internally.

    Args:
        output_item: A ResponseOutput item from the response.

    Returns:
        True if the item should be processed by LCS, False for client tools.
    """
    item_type = getattr(output_item, "type", None)

    # function_call items are always from client-provided tools;
    # LCS only configures file_search and mcp tools.
    if item_type == "function_call":
        return False

    # MCP items: check server_label against configured servers
    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        server_label = getattr(output_item, "server_label", None)
        if server_label is not None:
            configured_labels = {s.name for s in configuration.mcp_servers}
            return server_label in configured_labels

    # file_search_call, web_search_call, message, and unknown types
    # are treated as server-side.
    return True
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if mcp_approval_response has server_label attribute in the llama_stack_api types
rg -n "mcp_approval_response" --type=py -C5

Repository: lightspeed-core/lightspeed-stack
Length of output: 7880
🏁 Script executed:
#!/bin/bash
# Find MCPApprovalResponse definition and where approval responses are created
rg -n "class MCPApprovalResponse|MCPApprovalResponse.*=" --type=py -A3 | head -50

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Search for where mcp_approval_response items originate/are constructed
rg -n "MCPApprovalResponse\(" --type=py -B2 -A2

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Check if MCPApprovalResponse has server_label attribute
rg -n "approval_request_id|server_label" src/utils/responses.py | grep -A2 -B2 "820\|821\|822"

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Find MCPApprovalResponse import and usage
rg -n "MCPApprovalResponse" --type=py | head -20

Repository: lightspeed-core/lightspeed-stack
Length of output: 732
🏁 Script executed:
#!/bin/bash
# Look at the build_tool_call_summary function more closely to understand MCPApprovalResponse structure
sed -n '815,835p' src/utils/responses.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 774
🏁 Script executed:
#!/bin/bash
# Check all response output types handled in is_server_deployed_output context
rg -n "type.*=.*\"mcp" --type=py | head -30

Repository: lightspeed-core/lightspeed-stack
Length of output: 2390
🏁 Script executed:
#!/bin/bash
# Check imports at top of responses.py to understand MCPApprovalResponse
head -50 src/utils/responses.py | grep -A5 -B5 "MCPApprovalResponse"

Repository: lightspeed-core/lightspeed-stack
Length of output: 439
🏁 Script executed:
#!/bin/bash
# Search for any mentions of server_label in approval context
rg -n "server_label" src/utils/responses.py | grep -B3 -A3 "approval"

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Check test for is_server_deployed_output to see if mcp_approval_response is tested
rg -n "is_server_deployed_output" --type=py -B2 -A10

Repository: lightspeed-core/lightspeed-stack
Length of output: 12041
🏁 Script executed:
#!/bin/bash
# Look for the full test class to see if there's a test for mcp_approval_response
sed -n '2576,2661p' tests/unit/utils/test_responses.py

Repository: lightspeed-core/lightspeed-stack
Length of output: 3657
🏁 Script executed:
#!/bin/bash
# Check the MCPApprovalResponse type to see what attributes it has
rg -n "class.*MCPApprovalResponse|@dataclass.*approval" --type=py -A15 | head -50

Repository: lightspeed-core/lightspeed-stack
Length of output: 58
🏁 Script executed:
#!/bin/bash
# Check if approval_request_id can be used to track origin of approval responses
rg -n "approval_request_id" src/utils/responses.py -B3 -A3

Repository: lightspeed-core/lightspeed-stack
Length of output: 410
🏁 Script executed:
#!/bin/bash
# Check if there's a way to track which approval_request originated from which server
rg -n "approval_request_id" src/utils/ -A2 -B2 | head -80

Repository: lightspeed-core/lightspeed-stack
Length of output: 856
🏁 Script executed:
#!/bin/bash
# Look for where mcp_approval_response items are created or generated to see if there's origin tracking
rg -n "mcp_approval_response" src/ --type=py | grep -v test | grep -v ".pyc"

Repository: lightspeed-core/lightspeed-stack
Length of output: 373
🏁 Script executed:
#!/bin/bash
# Check if there's any validation or origin tracking for approval requests/responses
rg -n "MCPApprovalRequest|MCPApprovalResponse" src/ --type=py -A5 | grep -E "server_label|origin|source" | head -20

Repository: lightspeed-core/lightspeed-stack
Length of output: 124
Add explicit handling for mcp_approval_response type or document design assumption.
The function checks `mcp_approval_request` but not `mcp_approval_response`. Since `MCPApprovalResponse` (imported from `llama_stack_api.openai_responses`) lacks a `server_label` attribute and only carries `approval_request_id`, this type falls through to the default `return True`, classifying all approval responses as server-deployed regardless of origin.
If client-provided MCP servers can emit approval responses, they would be incorrectly included in server-side processing (turn summary, storage) due to the missing origin check. Either:
- Add `server_label` tracking to approval responses so they can be classified by origin, or
- Document that approval responses are always treated as server-side by design (with rationale), or
- Add test coverage for this scenario to `TestIsServerDeployedOutput`
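Taking the first-listed resolution direction to its simplest form (treat approval responses as client-provided), the explicit branch might look like the sketch below. This is not the reviewed code: `configured_labels` is passed in rather than read from module-level configuration, and returning `False` here is one of several options the review leaves open:

```python
from types import SimpleNamespace


def is_server_deployed_output_sketch(output_item, configured_labels: set[str]) -> bool:
    """Variant of is_server_deployed_output with explicit approval-response handling."""
    item_type = getattr(output_item, "type", None)

    # function_call items are always client-provided.
    if item_type == "function_call":
        return False

    # Approval responses carry no server_label; without origin tracking we
    # conservatively treat them as client-provided (one of the review's options).
    if item_type == "mcp_approval_response":
        return False

    if item_type in ("mcp_call", "mcp_list_tools", "mcp_approval_request"):
        server_label = getattr(output_item, "server_label", None)
        if server_label is not None:
            return server_label in configured_labels

    # Everything else (file_search_call, message, ...) stays server-side.
    return True


resp = SimpleNamespace(type="mcp_approval_response")
print(is_server_deployed_output_sketch(resp, {"okp"}))  # -> False
```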
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/responses.py` around lines 1126 - 1158, The
is_server_deployed_output function currently treats an MCPApprovalResponse
(item_type "mcp_approval_response") as server-side because it lacks a
server_label; update is_server_deployed_output to explicitly handle
"mcp_approval_response" (e.g., return False to mark it as client-provided) or
implement lookup/propagation of server_label, and add a unit test in
TestIsServerDeployedOutput to cover the approval response case; reference
is_server_deployed_output and TestIsServerDeployedOutput when making the change.
```python
def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
    """Test mcp_approval_request with unmatched label is client-side."""
    mock_config = mocker.Mock()
    mock_config.mcp_servers = []
    mocker.patch("utils.responses.configuration", mock_config)

    item = mocker.Mock()
    item.type = "mcp_approval_request"
    item.server_label = "client-mcp"
    assert is_server_deployed_output(item) is False
```
🧹 Nitpick | 🔵 Trivial
Add a positive mcp_approval_request server-side classification test.
Line 2626 only validates the unmatched-label/client-side path. Please add the matching-label path too, since this type is part of the server-event filtering contract.
Proposed test addition
class TestIsServerDeployedOutput:
@@
def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
"""Test mcp_approval_request with unmatched label is client-side."""
@@
assert is_server_deployed_output(item) is False
+
+ def test_mcp_approval_request_server_side_when_label_matches(
+ self, mocker: MockerFixture
+ ) -> None:
+ """Test mcp_approval_request with matching label is server-deployed."""
+ mock_config = mocker.Mock()
+ mock_server = mocker.Mock()
+ mock_server.name = "server-mcp"
+ mock_config.mcp_servers = [mock_server]
+ mocker.patch("utils.responses.configuration", mock_config)
+
+ item = mocker.Mock()
+ item.type = "mcp_approval_request"
+ item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_responses.py` around lines 2626 - 2635, Add a
complementary positive test that verifies mcp_approval_request is classified
server-side when the server label matches: create a new test (e.g.,
test_mcp_approval_request_server_side) that mocks configuration
(mocker.patch("utils.responses.configuration")) with mock_config.mcp_servers
containing the target label (e.g., "client-mcp"), create an item with item.type
= "mcp_approval_request" and item.server_label = "client-mcp", and assert
is_server_deployed_output(item) is True; reference the existing
test_mcp_approval_request_client_side, the configuration mock, and the
is_server_deployed_output function to locate where to add this test.
tests/unit/utils/test_responses.py
```python
async def test_client_tools_with_merge_header(
    self, mocker: MockerFixture
) -> None:
    """Test client tools merged with server tools when header is set."""
    mock_client = mocker.AsyncMock()
    mock_holder = mocker.Mock()
    mock_holder.get_client.return_value = mock_client
    mocker.patch(
        "utils.responses.AsyncLlamaStackClientHolder",
        return_value=mock_holder,
    )
    mock_config = mocker.Mock()
    mock_config.configuration.byok_rag = []
    mocker.patch("utils.responses.configuration", mock_config)

    server_mcp = InputToolMCP(
        server_label="server-tool", server_url="http://server"
    )
    mocker.patch(
        "utils.responses.prepare_tools",
        new=mocker.AsyncMock(return_value=[server_mcp]),
    )

    client_tool = InputToolMCP(
        server_label="client-tool", server_url="http://client"
    )
    tools, tool_choice, _ = await resolve_tool_choice(
        tools=[client_tool],
        tool_choice=None,
        token="tok",
        request_headers={"X-LCS-Merge-Server-Tools": "true"},
    )
```
Header-merge tests should also validate lowercase header keys.
Current assertions only use X-LCS-Merge-Server-Tools casing (Line 2803/2841/2899). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching resolve_tool_choice.
Proposed hardening for header-key coverage
class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
+        self, mocker: MockerFixture, merge_header_key: str
     ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Also applies to: 2836-2842, 2895-2900
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_responses.py` around lines 2773 - 2804, The test
test_client_tools_with_merge_header only asserts behavior when the header key
uses uppercase "X-LCS-Merge-Server-Tools"; update the test (and the other
affected tests around the same blocks) to also call resolve_tool_choice with a
lowercase header key "x-lcs-merge-server-tools" and assert the same merged-tools
behavior (same tools, tool_choice, etc.) so header normalization won't cause
regressions; locate the calls to resolve_tool_choice and duplicate or
parametrize them to include both header key variants to validate lowercase
handling.
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
Cover the new functions introduced in the server-tool merging feature:
- is_server_deployed_output: classify output items as server vs client
- _merge_tools: merge client and server tools with conflict detection
- resolve_tool_choice: X-LCS-Merge-Server-Tools header integration
- _is_server_mcp_output_item: filter serialized MCP items in streaming

29 new tests across utils/test_responses.py and endpoints/test_responses.py validating correct behavior for MCP label matching, 409 conflict rejection, file_search deduplication, and pass-through of non-MCP item types.
20809d0 to f02648d
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/app/endpoints/test_responses.py`:
- Around line 1378-1427: Add a regression test ensuring approval-request
substream events are filtered from client streams: in TestIsServerMcpOutputItem
add a new test (e.g., test_mcp_approval_request_stream_filtering) that simulates
a streaming chunk key starting with "response.mcp_approval_request." and asserts
the streaming filter logic rejects/filters that chunk so approval-request events
cannot leak to clients; reference the mcp approval path by using the
"response.mcp_approval_request.*" key and the existing helper
_is_server_mcp_output_item (and whatever streaming chunk filter function is used
in the responses code) to validate the chunk is filtered when appropriate.
In `@tests/unit/utils/test_responses.py`:
- Around line 2840-2860: Add a new unit test that mirrors
test_no_tools_uses_prepare_tools but calls resolve_tool_choice with
tool_choice="none" (and tools=None) and asserts that the patched prepare_tools
(utils.responses.prepare_tools) is invoked with no_tools=True; locate the test
around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1d826784-c522-4d69-a050-222bc472dc2e
📒 Files selected for processing (2)
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: server mode / ci
- GitHub Check: E2E: library mode / ci
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g.,
from authentication import get_auth_dependency)
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/app/endpoints/test_responses.pytests/unit/utils/test_responses.py
🔇 Additional comments (3)
tests/unit/app/endpoints/test_responses.py (1)
15-20: Import addition is clean and consistent. The new `_is_server_mcp_output_item` import (Line 16) fits the existing absolute internal import pattern and keeps the test module scope focused.

tests/unit/utils/test_responses.py (2)
2624-2633: Add a positive `mcp_approval_request` server-side classification test. Line 2626 only validates the unmatched-label/client-side path. A matching-label test is needed since `mcp_approval_request` is part of the server-event filtering contract (lines 2585-2596 test both paths for `mcp_call`, lines 2611-2622 for `mcp_list_tools`).

Proposed test addition
class TestIsServerDeployedOutput:
@@
    def test_mcp_approval_request_client_side(self, mocker: MockerFixture) -> None:
        """Test mcp_approval_request with unmatched label is client-side."""
@@
        assert is_server_deployed_output(item) is False
+
+    def test_mcp_approval_request_server_side_when_label_matches(
+        self, mocker: MockerFixture
+    ) -> None:
+        """Test mcp_approval_request with matching label is server-deployed."""
+        mock_config = mocker.Mock()
+        mock_server = mocker.Mock()
+        mock_server.name = "server-mcp"
+        mock_config.mcp_servers = [mock_server]
+        mocker.patch("utils.responses.configuration", mock_config)
+
+        item = mocker.Mock()
+        item.type = "mcp_approval_request"
+        item.server_label = "server-mcp"
+        assert is_server_deployed_output(item) is True
2769-2802: Header-merge tests should validate lowercase header keys. Current assertions only use `X-LCS-Merge-Server-Tools` casing (lines 2797, 2835, 2889). Add a lowercase-key variant to prevent missing regressions when request headers are normalized before reaching `resolve_tool_choice`.

Proposed hardening for header-key coverage
class TestResolveToolChoice:
@@
-    async def test_client_tools_with_merge_header(
+    @pytest.mark.parametrize(
+        "merge_header_key",
+        ["X-LCS-Merge-Server-Tools", "x-lcs-merge-server-tools"],
+    )
+    async def test_client_tools_with_merge_header(
-        self, mocker: MockerFixture
+        self, mocker: MockerFixture, merge_header_key: str
     ) -> None:
@@
         tools, tool_choice, _ = await resolve_tool_choice(
             tools=[client_tool],
             tool_choice=None,
             token="tok",
-            request_headers={"X-LCS-Merge-Server-Tools": "true"},
+            request_headers={merge_header_key: "true"},
         )

Apply similar parametrization to `test_merge_header_conflict_raises_409` and `test_merge_header_no_server_tools_returns_client_only`.
    async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
        """Test that no client tools falls through to prepare_tools."""
        mock_client = mocker.AsyncMock()
        mock_holder = mocker.Mock()
        mock_holder.get_client.return_value = mock_client
        mocker.patch(
            "utils.responses.AsyncLlamaStackClientHolder",
            return_value=mock_holder,
        )
        server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
        mock_prepare = mocker.AsyncMock(return_value=[server_tool])
        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)

        tools, _, _ = await resolve_tool_choice(
            tools=None,
            tool_choice=None,
            token="tok",
        )
        assert tools is not None
        assert len(tools) == 1
        mock_prepare.assert_called_once()
🛠️ Refactor suggestion | 🟠 Major
Add test coverage for tool_choice="none" scenario.
The test only validates tool_choice=None (line 2855). Per the PR objectives, there's a known issue where tool_choice="none" may not correctly disable tools when tools=None. Add a test asserting that prepare_tools is called with no_tools=True when tool_choice="none".
Proposed test addition
 @pytest.mark.asyncio
async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
"""Test that no client tools falls through to prepare_tools."""
@@
assert len(tools) == 1
mock_prepare.assert_called_once()
+
+ @pytest.mark.asyncio
+ async def test_tool_choice_none_passes_no_tools_flag(
+ self, mocker: MockerFixture
+ ) -> None:
+ """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
+ mock_client = mocker.AsyncMock()
+ mock_holder = mocker.Mock()
+ mock_holder.get_client.return_value = mock_client
+ mocker.patch(
+ "utils.responses.AsyncLlamaStackClientHolder",
+ return_value=mock_holder,
+ )
+ mock_prepare = mocker.AsyncMock(return_value=None)
+ mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
+
+ tools, _, _ = await resolve_tool_choice(
+ tools=None,
+ tool_choice="none",
+ token="tok",
+ )
+ assert tools is None
+ mock_prepare.assert_called_once()
+ # Verify no_tools=True was passed
+ call_kwargs = mock_prepare.call_args.kwargs
+ assert call_kwargs.get("no_tools") is True

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    async def test_no_tools_uses_prepare_tools(self, mocker: MockerFixture) -> None:
        """Test that no client tools falls through to prepare_tools."""
        mock_client = mocker.AsyncMock()
        mock_holder = mocker.Mock()
        mock_holder.get_client.return_value = mock_client
        mocker.patch(
            "utils.responses.AsyncLlamaStackClientHolder",
            return_value=mock_holder,
        )
        server_tool = InputToolFileSearch(type="file_search", vector_store_ids=["vs1"])
        mock_prepare = mocker.AsyncMock(return_value=[server_tool])
        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
        tools, _, _ = await resolve_tool_choice(
            tools=None,
            tool_choice=None,
            token="tok",
        )
        assert tools is not None
        assert len(tools) == 1
        mock_prepare.assert_called_once()

    @pytest.mark.asyncio
    async def test_tool_choice_none_passes_no_tools_flag(
        self, mocker: MockerFixture
    ) -> None:
        """Test that tool_choice='none' sets no_tools=True in prepare_tools."""
        mock_client = mocker.AsyncMock()
        mock_holder = mocker.Mock()
        mock_holder.get_client.return_value = mock_client
        mocker.patch(
            "utils.responses.AsyncLlamaStackClientHolder",
            return_value=mock_holder,
        )
        mock_prepare = mocker.AsyncMock(return_value=None)
        mocker.patch("utils.responses.prepare_tools", new=mock_prepare)
        tools, _, _ = await resolve_tool_choice(
            tools=None,
            tool_choice="none",
            token="tok",
        )
        assert tools is None
        mock_prepare.assert_called_once()
        # Verify no_tools=True was passed
        call_kwargs = mock_prepare.call_args.kwargs
        assert call_kwargs.get("no_tools") is True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/utils/test_responses.py` around lines 2840 - 2860, Add a new unit
test that mirrors test_no_tools_uses_prepare_tools but calls resolve_tool_choice
with tool_choice="none" (and tools=None) and asserts that the patched
prepare_tools (utils.responses.prepare_tools) is invoked with no_tools=True;
locate the test around test_no_tools_uses_prepare_tools and use the same
AsyncLlamaStackClientHolder/resolve_tool_choice setup and mock_prepare
AsyncMock, then assert mock_prepare.assert_called_once_with(no_tools=True) (or
equivalent) to ensure prepare_tools receives the no_tools flag when
resolve_tool_choice(tool_choice="none") is used.
…lter_mcp_chunk The helper already handled mcp_approval_request items in the output_item.added and output_item.done branches, but the substream event prefix check only matched response.mcp_call.* and response.mcp_list_tools.*, allowing server-side approval request progress events to leak to clients that cannot handle them. Add the missing response.mcp_approval_request.* prefix check and add unit tests for _should_filter_mcp_chunk covering all three substream event types.
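The prefix check this commit describes can be sketched as follows; the constant and function names here are assumptions based on the commit message, not the exact source.

```python
# Sketch of the substream prefix check; names are hypothetical.
SERVER_MCP_EVENT_PREFIXES = (
    "response.mcp_call.",
    "response.mcp_list_tools.",
    "response.mcp_approval_request.",  # the prefix this commit adds
)


def should_filter_mcp_chunk(event_type: str) -> bool:
    """Return True for streaming events emitted by server-deployed MCP tools."""
    # str.startswith accepts a tuple, so one call covers all three prefixes.
    return event_type.startswith(SERVER_MCP_EVENT_PREFIXES)
```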
@tisnik thanks for the review. I added a commit to address the 'incomplete mcp filtering'. For the 'no_tools' comment I looked and think the no_tools variable is scoped to the entire function and is passed into both prepare_tools calls. let me know if I am missing something |
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/app/endpoints/responses.py`:
- Around line 556-576: The persisted turn still contains client-side tool items
because append_turn_items_to_conversation is called with
latest_response_object.output; before calling append_turn_items_to_conversation
(after building turn_summary and rag_chunks) filter
latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7a365396-e85c-42bc-8ec0-e4067d6ace16
📒 Files selected for processing (2)
src/app/endpoints/responses.pytests/unit/app/endpoints/test_responses.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: E2E Tests for Lightspeed Evaluation job
- GitHub Check: E2E: library mode / ci
- GitHub Check: E2E: server mode / ci
🧰 Additional context used
📓 Path-based instructions (5)
**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use absolute imports for internal modules from the Lightspeed Core Stack (e.g.,
from authentication import get_auth_dependency)
Files:
src/app/endpoints/responses.pytests/unit/app/endpoints/test_responses.py
src/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/**/*.py: Use FastAPI imports from the fastapi module: `from fastapi import APIRouter, HTTPException, Request, status, Depends`
Use `from llama_stack_client import AsyncLlamaStackClient` for Llama Stack client imports
Check `constants.py` for shared constants before defining new ones
All modules must start with descriptive docstrings explaining purpose
Use `logger = get_logger(__name__)` from `log.py` for module logging
Define type aliases at module level for clarity in Lightspeed Core Stack
All functions require docstrings with brief descriptions following Google Python docstring conventions
Use complete type annotations for function parameters and return types
Use modern union type syntax `str | int` instead of `Union[str, int]` for type annotations
Use `Optional[Type]` for nullable types in type annotations
Use snake_case with descriptive, action-oriented names for functions (e.g., get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead
Use `async def` for functions performing I/O operations and external API calls
Handle `APIConnectionError` from Llama Stack in error handling logic
Use standard log levels appropriately: debug for diagnostic info, info for general execution, warning for unexpected events, error for serious problems
All classes require descriptive docstrings explaining their purpose
Use PascalCase for class names with descriptive names and standard suffixes (Configuration, Error/Exception, Resolver, Interface)
Use ABC (Abstract Base Classes) with `@abstractmethod` decorators for interface implementations
Use complete type annotations for all class attributes; avoid using `Any` type
Follow Google Python docstring conventions with Parameters, Returns, Raises, and Attributes sections as needed
Files:
src/app/endpoints/responses.py
src/app/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/responses.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest for all unit and integration tests; do not use unittest
Maintain test coverage of at least 60% for unit tests and 10% for integration tests
Files:
tests/unit/app/endpoints/test_responses.py
tests/unit/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use `pytest-mock` for AsyncMock objects in unit tests
Files:
tests/unit/app/endpoints/test_responses.py
🔇 Additional comments (2)
tests/unit/app/endpoints/test_responses.py (1)
1466-1498: Good regression coverage for approval-request substreams. This closes the `response.mcp_approval_request.*` hole that previously let unknown MCP events leak to OpenAI-compatible clients.

src/app/endpoints/responses.py (1)
531-537: The review comment is incorrect. Both code paths in `update_azure_token()` properly update the `AsyncLlamaStackClientHolder`'s internal client:

- Library client mode (`update_azure_token` line 17): Calls `reload_library_client()`, which sets `self._lsc = client` (client.py:119)
- Service client mode (`update_azure_token` line 33): Calls `update_provider_data()`, which does `self._lsc = self._lsc.copy(set_default_headers=updated_headers)` (client.py:54)

Since `AsyncLlamaStackClientHolder` is a Singleton, when `_finalize_response()` calls `get_client()` at line 572, it retrieves the same holder instance that was just updated by `update_azure_token()`. The client will have fresh credentials; there is no staleness issue.
    for item in latest_response_object.output:
        if not is_server_deployed_output(item):
            continue
        tool_call, tool_result = build_tool_call_summary(item)
        if tool_call:
            turn_summary.tool_calls.append(tool_call)
        if tool_result:
            turn_summary.tool_results.append(tool_result)

    tool_rag_chunks = parse_rag_chunks(
        latest_response_object,
        vector_store_ids,
        configuration.rag_id_mapping,
    )
    turn_summary.rag_chunks = inline_rag_context.rag_chunks + tool_rag_chunks

    client = AsyncLlamaStackClientHolder().get_client()
    if api_params.store and api_params.previous_response_id and latest_response_object:
        await append_turn_items_to_conversation(
            client, api_params.conversation, user_input, latest_response_object.output
        )
Filter client tool items before persisting the streamed turn.
Line 557 already gates internal processing with is_server_deployed_output(), but append_turn_items_to_conversation() on Lines 573-576 still receives the full latest_response_object.output. src/utils/conversations.py:454-497 stores every llm_output item it gets, so client-local MCP/function items are still persisted on the streaming path.
Proposed fix
client = AsyncLlamaStackClientHolder().get_client()
if api_params.store and api_params.previous_response_id and latest_response_object:
+ stored_output = [
+ item
+ for item in latest_response_object.output
+ if is_server_deployed_output(item)
+ ]
await append_turn_items_to_conversation(
- client, api_params.conversation, user_input, latest_response_object.output
+ client, api_params.conversation, user_input, stored_output
     )

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/app/endpoints/responses.py` around lines 556 - 576, The persisted turn
still contains client-side tool items because append_turn_items_to_conversation
is called with latest_response_object.output; before calling
append_turn_items_to_conversation (after building turn_summary and rag_chunks)
filter latest_response_object.output to only include items that pass
is_server_deployed_output() (same predicate used earlier), and pass that
filtered list to append_turn_items_to_conversation so client-only MCP/function
outputs are not stored; reference latest_response_object.output,
is_server_deployed_output(), and append_turn_items_to_conversation() when making
the change.
Description
Add server-tool merging and filtering for Responses API
Introduce X-LCS-Merge-Server-Tools header that allows client-provided
tools to be merged with server-configured tools (RAG, MCP) instead of
replacing them. Tool conflicts (duplicate MCP server_label or duplicate
file_search) are rejected with a 409 error.
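The merge-with-conflict-detection behavior described above can be sketched roughly as follows. This is a simplified stand-in: the real `_merge_tools` helper raises an HTTP 409 via FastAPI rather than a `ValueError`, and the dict-based tool shape here is an assumption for illustration.

```python
# Simplified sketch; the real helper raises HTTPException(status_code=409).
def merge_tools(client_tools: list[dict], server_tools: list[dict]) -> list[dict]:
    """Merge client tools into server tools, rejecting duplicates."""
    server_labels = {
        t["server_label"] for t in server_tools if t.get("type") == "mcp"
    }
    server_has_file_search = any(t.get("type") == "file_search" for t in server_tools)
    for tool in client_tools:
        # Duplicate MCP server_label is a conflict.
        if tool.get("type") == "mcp" and tool.get("server_label") in server_labels:
            raise ValueError(f"409: duplicate MCP server_label {tool['server_label']!r}")
        # A second file_search tool is also a conflict.
        if tool.get("type") == "file_search" and server_has_file_search:
            raise ValueError("409: duplicate file_search tool")
    return server_tools + client_tools
```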
Add is_server_deployed_output() to distinguish LCS-deployed tools from
client tools so only server tools are included in turn summaries,
metrics, and storage. Client tool output items are still returned in
the response.
Filter server-deployed MCP streaming events (mcp_call, mcp_list_tools,
mcp_approval_request) from the SSE stream so OpenAI-compatible clients
like Goose don't fail on unknown item types. Strip these items from the
response.completed output array as well.
Includes doc/diagram of goose (local agent) use case we are prototyping.
Type of change
Tools used to create PR
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Documentation
Tests