LCORE-1270: E2e tests responses tools by asimurka · Pull Request #1414 · lightspeed-core/lightspeed-stack

asimurka · 2026-03-27T15:52:58Z

Description

This PR adds E2E tests for tool choices in the responses endpoint.

Summary by CodeRabbit

New Features
- Enhanced tool_choice with allowed_tools option to filter available tools based on type and metadata.
Documentation
- Clarified allowed_tools configuration semantics and filtering behavior.
- Updated documentation with MCP tool filtering examples using server_label and name.
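
For orientation, a request body using the `allowed_tools` option with an MCP filter might look roughly like the sketch below. The overall `tool_choice` shape follows the e2e scenarios quoted later in this thread; the server label `my-mcp-server` and the tool name `search_docs` are hypothetical placeholders, not values from this PR.

```python
import json

# Hypothetical request body for the responses endpoint.
# "my-mcp-server" and "search_docs" are placeholder values for illustration only.
request_body = {
    "input": "What is the title of the article from Paul?",
    "model": "{PROVIDER}/{MODEL}",  # placeholder, as in the e2e feature file
    "stream": False,
    "tool_choice": {
        "type": "allowed_tools",
        "mode": "auto",  # "auto" or "required"
        "tools": [
            # Filter objects, not full tool definitions.
            {"type": "file_search"},
            {"type": "mcp", "server_label": "my-mcp-server", "name": "search_docs"},
        ],
    },
}

print(json.dumps(request_body, indent=2))
```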

coderabbitai · 2026-03-27T15:53:15Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6cb86b99-163c-413e-81db-f9c0bcccd046

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Introduces an "allowed_tools" mode for tool_choice that enables filtering tools via allowlist entries matching tool attributes. Implementation includes tool matching helpers for attributes (file_search, mcp servers, function names, web_search) and MCP-specific narrowing logic. Documentation clarified, with comprehensive unit and end-to-end test coverage added.
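
The allowlist matching described in the walkthrough can be pictured with a minimal sketch. It assumes tools and allowlist entries are plain dictionaries and that an entry matches a tool when every key in the entry equals the tool's value for that key. The function names are hypothetical, and the MCP-specific narrowing by `name` is deliberately left out, so this is not the PR's implementation.

```python
from typing import Any

Tool = dict[str, Any]


def entry_matches(tool: Tool, entry: dict[str, Any]) -> bool:
    """Return True when every key/value pair of the entry equals the tool's value.

    Assumption of this sketch: an empty entry matches nothing, so it cannot
    silently allow every tool.
    """
    return bool(entry) and all(tool.get(key) == value for key, value in entry.items())


def filter_by_allowlist(tools: list[Tool], allowed: list[dict[str, Any]]) -> list[Tool]:
    """Keep only the tools matched by at least one allowlist entry."""
    return [tool for tool in tools if any(entry_matches(tool, entry) for entry in allowed)]


# Made-up tool definitions for illustration.
tools = [
    {"type": "file_search", "vector_store_ids": ["vs1"]},
    {"type": "mcp", "server_label": "a"},
    {"type": "mcp", "server_label": "b"},
]
print(filter_by_allowlist(tools, [{"type": "mcp", "server_label": "a"}]))
```

Matching on key equality keeps the entries small: a filter such as `{"type": "file_search"}` selects a tool without repeating its full definition.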

Changes

Cohort / File(s)	Summary
Documentation `docs/responses.md`	Updated `tool_choice` documentation for the new `allowed_tools` option: clarified that `tools` array contains filter objects (not full tool definitions), refined semantics for `file_search` and `web_search`, updated examples to use MCP filters by `server_label` + `name`, and added note about omitting `name` in MCP filters.
Core Implementation `src/utils/responses.py`	Added `AllowedTools` import and support for the `allowed_tools` mode in `resolve_tool_choice`. Introduced filtering helpers: `tool_matches_allowed_entry`, `group_mcp_tools_by_server`, `mcp_strip_name_from_allowlist_entries`, `mcp_project_allowed_tools_to_names`, and `filter_tools_by_allowed_entries`. Refactored control flow to return early for `ToolChoiceMode.none`, apply allowlist filtering to both explicit and implicitly prepared tools, and clear the tool choice when no tools remain after filtering.
E2E Test Scenarios `tests/e2e/features/responses.feature`	Added six new Gherkin scenarios validating `tool_choice` behaviors: `none` (no tool invocation), `auto` (file_search expected), `required`, `{type: "file_search"}`, and `{type: "allowed_tools"}` with various allowlist configurations. All scenarios include token metric assertions.
E2E Test Helpers `tests/e2e/features/steps/llm_query_response.py`	Added JSON response validation layer with `_collect_output_item_types()` helper. Implemented five new Behave `@then` steps for validating tool output item types (assertions for presence/absence of specific types) and a new fragment assertion step for `/v1/responses` `output_text` field.
Unit Tests `tests/unit/utils/test_responses.py`	Added comprehensive test coverage for `resolve_tool_choice` across `ToolChoiceMode` and `AllowedTools` flows, explicit vs implicit tool preparation, and `ToolChoiceMode.none` behavior. Added tests for `filter_tools_by_allowed_entries` covering per-type matching, MCP grouping by server, and name projection semantics. Updated imports to include new tool/choice types.
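
The refactored control flow summarized in the table (early return for `none`, filtering of explicit or implicit tools, clearing the tool choice when nothing remains) could be sketched as follows. This is only an illustration under stated assumptions: tools are plain dictionaries, `tool_choice` is either a string or a dictionary, vector store IDs are left out, and `resolve_tools_sketch` is a hypothetical name rather than the PR's `resolve_tool_choice`.

```python
from typing import Any, Optional

Tool = dict[str, Any]


def _matches(tool: Tool, entry: dict[str, Any]) -> bool:
    # Assumption: an empty entry matches nothing.
    return bool(entry) and all(tool.get(k) == v for k, v in entry.items())


def resolve_tools_sketch(
    tool_choice: Any,
    explicit_tools: Optional[list[Tool]],
    implicit_tools: list[Tool],
) -> tuple[Optional[list[Tool]], Any]:
    """Return (tools, tool_choice) after applying the described rules."""
    # "none" disables tools, so return before preparing anything.
    if tool_choice == "none":
        return None, None

    # Request-supplied tools take precedence over the implicitly prepared set.
    candidates = explicit_tools if explicit_tools else implicit_tools

    # The allowlist applies to whichever candidate set was chosen.
    if isinstance(tool_choice, dict) and tool_choice.get("type") == "allowed_tools":
        allowed = tool_choice.get("tools", [])
        candidates = [t for t in candidates if any(_matches(t, e) for e in allowed)]

    # With no tools left, the tool choice is cleared as well.
    if not candidates:
        return None, None
    return candidates, tool_choice


implicit = [{"type": "file_search"}, {"type": "mcp", "server_label": "a"}]
choice = {"type": "allowed_tools", "mode": "auto", "tools": [{"type": "file_search"}]}
print(resolve_tools_sketch(choice, None, implicit))
```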

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Docstring Coverage	✅ Passed	Docstring coverage is 97.87% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title mentions e2e tests for responses tools, which aligns with the test additions in responses.feature and llm_query_response.py, but obscures the significant documentation and utility changes in responses.md and responses.py.


asimurka · 2026-03-27T15:58:19Z

Do not close; this needs a rebase on top of #1412.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/responses.md`:
- Around line 283-284: Update the docs to state that the `allowed_tools` filter
is applied to both request-supplied `tools` and the implicit tool set produced
by `prepare_tools()` (including the LCORE tool set), i.e., `allowed_tools`
filters the prepared/implicit tools when the request omits `tools` as well as
the explicit `tools` array; clarify that `mode` remains `"auto"` or `"required"`
and `tools` is a list of key-valued filters that apply to all candidate tools,
including LCORE.

In `@src/utils/responses.py`:
- Around line 493-505: The helper mcp_strip_name_from_allowlist_entries
currently removes "name" for any entry with type "mcp", which can turn
{"type":"mcp","name":"foo"} without a server_label into a permissive
{"type":"mcp"}; change the logic so you only remove the "name" when the entry is
an MCP and also contains a "server_label" (i.e., check new_entry.get("type") ==
"mcp" and new_entry.get("server_label") is not None) so malformed/narrow MCP
filters without server_label are left intact (and thus handled/ignored by
group_mcp_tools_by_server/filter_tools_by_allowed_entries) instead of widening
the allowlist.
- Around line 1515-1517: The early return when tool_choice ==
ToolChoiceMode.none is currently returning None for all three outputs and clears
explicit vector_store_ids; change the behavior so disabling tools only disables
tool-related outputs (e.g., tool config and tool inputs) but preserves and
returns the original vector_store_ids so build_rag_context still receives
file_search.vector_store_ids; locate the check against ToolChoiceMode.none
(variable tool_choice, enum ToolChoiceMode) and adjust the returned tuple to
keep the third value (vector_store_ids) instead of None.
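
The first `src/utils/responses.py` suggestion above (strip `name` only when the MCP entry also carries a `server_label`) could look roughly like this sketch. It assumes allowlist entries are plain dictionaries; `strip_mcp_name` is a hypothetical stand-in for `mcp_strip_name_from_allowlist_entries`, whose real signature is not shown in this thread.

```python
from typing import Any


def strip_mcp_name(entries: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return copies of the entries, dropping "name" only from well-formed MCP filters."""
    stripped = []
    for entry in entries:
        new_entry = dict(entry)  # shallow copy so the caller's entries stay untouched
        is_mcp = new_entry.get("type") == "mcp"
        has_server = new_entry.get("server_label") is not None
        if is_mcp and has_server:
            # Safe to drop: the entry is still scoped to a single server.
            new_entry.pop("name", None)
        # An MCP entry without server_label keeps its name, so it is never
        # widened into a permissive {"type": "mcp"} filter.
        stripped.append(new_entry)
    return stripped


print(strip_mcp_name([
    {"type": "mcp", "server_label": "a", "name": "foo"},
    {"type": "mcp", "name": "foo"},
]))
```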

In `@tests/e2e/features/responses.feature`:
- Around line 555-575: Update the scenario's request body so the model is forced
to attempt a file search: inside the JSON passed to the step "When I use
\"responses\" to ask question with authorization header" (the request that
contains "tool_choice" and "instructions"), change the "instructions" value to
explicitly require the file_search tool (e.g., include a sentence like "You MUST
use the file_search tool to answer this question.") so the negative check on
"responses output should not include an item with type \"file_search_call\""
actually verifies that the allowlist filtering (the "tool_choice" -> "tools":
[{"type":"mcp"}]) prevented the model from using file_search.
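
The negative check mentioned above boils down to collecting the `type` of every output item and asserting that `file_search_call` is absent. A minimal sketch, assuming the response JSON carries an `output` list of objects with a `type` field; the function below is a hypothetical stand-in for the `_collect_output_item_types()` helper, and the sample payload is made up.

```python
from typing import Any


def collect_output_item_types(response_json: dict[str, Any]) -> list[str]:
    """Return the "type" of each item in the response's "output" list."""
    output = response_json.get("output") or []
    return [
        item["type"]
        for item in output
        if isinstance(item, dict) and "type" in item
    ]


# Made-up response payload for illustration.
sample_response = {"output": [{"type": "mcp_call"}, {"type": "message"}]}
types = collect_output_item_types(sample_response)

# The allowlist permitted only MCP tools, so no file search call should appear.
assert "file_search_call" not in types
print(types)
```

The absence check is only meaningful if the model had a reason to try `file_search` in the first place, which is why the comment asks for instructions that explicitly demand it.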


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f40a15ef-64e0-4d60-8dbd-d81eebc4eb50

📥 Commits

Reviewing files that changed from the base of the PR and between e184541 and c7b90c3.

📒 Files selected for processing (5)

docs/responses.md
src/utils/responses.py
tests/e2e/features/responses.feature
tests/e2e/features/steps/llm_query_response.py
tests/unit/utils/test_responses.py


tisnik · 2026-03-28T22:11:34Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1406 - Partially compliant

Compliant requirements:
- Add end-to-end tests for tool configuration and restrictions in the responses endpoint
- Cover `tool_choice` modes none, auto, required
- Assert tool invocation items presence/absence
- Verify `output_text` fragments for file_search scenarios
- Capture token metrics in tests

Non-compliant requirements:
- End-to-end tests for `allowed_tools` filtering of MCP tools by specific `name` are missing

Requires further human verification:
- (none)

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing Test

There is no end-to-end scenario verifying that `allowed_tools` with specific MCP tool `name` values restricts MCP invocations to those names. Without it, the MCP name-filtering behavior added in code isn’t validated in e2e tests.

  Scenario: Check if responses endpoint with allowed tools in automatic mode answers knowledge question using file search
    Given The system is in default state
    And I set the Authorization header to Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva
    And I capture the current token metrics
    When I use "responses" to ask question with authorization header
    """
    {
      "input": "What is the title of the article from Paul?",
      "model": "{PROVIDER}/{MODEL}",
      "stream": false,
      "instructions": "You are an assistant. You MUST use the file_search tool to answer. Answer in lowercase.",
      "tool_choice": {
        "type": "allowed_tools",
        "mode": "auto",
        "tools": [{"type": "file_search"}]
      }
    }
    """
    Then The status code of the response is 200
    And The responses output should include an item with type "file_search_call"
    And The responses output_text should contain following fragments
      | Fragments in LLM response |
      | great work                |
    And The token metrics should have increased

  Scenario: Check if responses endpoint with allowed tools in required mode invokes file search for a basic question
    Given The system is in default state
    And I set the Authorization header to Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva
    And I capture the current token metrics
    When I use "responses" to ask question with authorization header
    """
    {
      "input": "Hello world!",
      "model": "{PROVIDER}/{MODEL}",
      "stream": false,
      "tool_choice": {
        "type": "allowed_tools",
        "mode": "required",
        "tools": [{"type": "file_search"}]
      }
    }
    """
    Then The status code of the response is 200
    And The responses output should include an item with type "file_search_call"
    And The token metrics should have increased

Added e2e tests for tool choices in responses endpoint

f265811

asimurka marked this pull request as draft March 27, 2026 15:53

asimurka changed the title ~~E2e tests responses tools~~ LCORE-1270: E2e tests responses tools Mar 27, 2026

coderabbitai bot reviewed Mar 27, 2026


Adjust tools resolution

0b1f3ba

asimurka force-pushed the e2e_tests_responses_tools branch from c7b90c3 to 0b1f3ba Compare March 27, 2026 16:38

tisnik added the Review effort 4/5 label Mar 28, 2026

asimurka wants to merge 2 commits into lightspeed-core:main from asimurka:e2e_tests_responses_tools
