LCORE-1270: E2e tests responses tools #1414

Draft
asimurka wants to merge 2 commits into lightspeed-core:main from asimurka:e2e_tests_responses_tools

Conversation

@asimurka
Contributor

@asimurka asimurka commented Mar 27, 2026

Description

This PR adds E2E tests for the tool_choice options of the responses endpoint.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
  • Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Enhanced tool_choice with allowed_tools option to filter available tools based on type and metadata.
  • Documentation

    • Clarified allowed_tools configuration semantics and filtering behavior.
    • Updated documentation with MCP tool filtering examples using server_label and name.

@asimurka asimurka marked this pull request as draft March 27, 2026 15:53
@coderabbitai
Contributor

coderabbitai bot commented Mar 27, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6cb86b99-163c-413e-81db-f9c0bcccd046

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Introduces an "allowed_tools" mode for tool_choice that enables filtering tools via allowlist entries matching tool attributes. Implementation includes tool matching helpers for attributes (file_search, mcp servers, function names, web_search) and MCP-specific narrowing logic. Documentation clarified, with comprehensive unit and end-to-end test coverage added.
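The allowlist matching described in the walkthrough can be sketched as follows. This is a minimal illustration, not the PR's code: `matches_entry` and `filter_tools` are hypothetical names, tools are modeled as plain dicts, and the MCP-specific narrowing by tool name is left out.

```python
from typing import Any


def matches_entry(tool: dict[str, Any], entry: dict[str, Any]) -> bool:
    """Return True when every key/value pair of a non-empty entry is present on the tool."""
    # An empty entry is rejected so that it cannot match every tool by accident.
    return bool(entry) and all(tool.get(key) == value for key, value in entry.items())


def filter_tools(
    tools: list[dict[str, Any]], allowed: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Keep only the tools that match at least one allowlist entry."""
    return [tool for tool in tools if any(matches_entry(tool, entry) for entry in allowed)]


# Hypothetical candidate tools and allowlist, for illustration only.
tools = [
    {"type": "file_search", "vector_store_ids": ["vs_1"]},
    {"type": "mcp", "server_label": "docs"},
    {"type": "web_search"},
]
allowed = [{"type": "file_search"}, {"type": "mcp", "server_label": "docs"}]

kept = filter_tools(tools, allowed)
print([tool["type"] for tool in kept])
```

In this sketch an entry acts as a partial match on the tool's attributes, so `{"type": "file_search"}` keeps any file_search tool regardless of its other fields.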

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: docs/responses.md | Updated tool_choice documentation for the new allowed_tools option: clarified that the tools array contains filter objects (not full tool definitions), refined semantics for file_search and web_search, updated examples to use MCP filters by server_label + name, and added a note about omitting name in MCP filters. |
| Core Implementation: src/utils/responses.py | Added AllowedTools import and support for the "allowed_tools" mode in resolve_tool_choice. Introduced filtering helpers: tool_matches_allowed_entry, group_mcp_tools_by_server, mcp_strip_name_from_allowlist_entries, mcp_project_allowed_tools_to_names, and filter_tools_by_allowed_entries. Refactored control flow to return early for ToolChoiceMode.none, apply allowlist filtering to both explicit and implicitly prepared tools, and clear the tool choice when no tools remain after filtering. |
| E2E Test Scenarios: tests/e2e/features/responses.feature | Added six new Gherkin scenarios validating tool_choice behaviors: none (no tool invocation), auto (file_search expected), required, {type: "file_search"}, and {type: "allowed_tools"} with various allowlist configurations. All scenarios include token metric assertions. |
| E2E Test Helpers: tests/e2e/features/steps/llm_query_response.py | Added a JSON response validation layer with the _collect_output_item_types() helper. Implemented five new Behave @then steps for validating tool output item types (assertions for presence/absence of specific types) and a new fragment assertion step for the /v1/responses output_text field. |
| Unit Tests: tests/unit/utils/test_responses.py | Added comprehensive test coverage for resolve_tool_choice across ToolChoiceMode and AllowedTools flows, explicit vs implicit tool preparation, and ToolChoiceMode.none behavior. Added tests for filter_tools_by_allowed_entries covering per-type matching, MCP grouping by server, and name projection semantics. Updated imports to include the new tool/choice types. |
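The E2E step assertions on output item types can be pictured with a small helper like the one below. It is only a sketch: the PR's `_collect_output_item_types()` is not shown on this page, so the function here is a hypothetical stand-in that assumes the response body carries a top-level `output` list whose items each have a `type` field.

```python
from typing import Any


def collect_output_item_types(response_body: dict[str, Any]) -> list[str]:
    """Return the `type` of each item in the response's `output` list, in order."""
    output = response_body.get("output")
    if not isinstance(output, list):
        # A missing or malformed output is treated as "no items".
        return []
    return [
        item["type"]
        for item in output
        if isinstance(item, dict) and isinstance(item.get("type"), str)
    ]


# Hypothetical response body, for illustration only.
body = {
    "output": [
        {"type": "file_search_call", "id": "fs_1"},
        {"type": "message", "id": "msg_1"},
    ]
}
types = collect_output_item_types(body)
print(types)
```

A step such as `The responses output should include an item with type "file_search_call"` would then reduce to a membership check on the returned list.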

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Docstring Coverage | ✅ Passed | Docstring coverage is 97.87% which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title mentions e2e tests for responses tools, which aligns with the test additions in responses.feature and llm_query_response.py, but obscures the significant documentation and utility changes in responses.md and responses.py. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@asimurka asimurka changed the title from "E2e tests responses tools" to "LCORE-1270: E2e tests responses tools" Mar 27, 2026
@asimurka
Contributor Author

Do not close; this needs a rebase on top of #1412.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/responses.md`:
- Around line 283-284: Update the docs to state that the `allowed_tools` filter
is applied to both request-supplied `tools` and the implicit tool set produced
by `prepare_tools()` (including the LCORE tool set), i.e., `allowed_tools`
filters the prepared/implicit tools when the request omits `tools` as well as
the explicit `tools` array; clarify that `mode` remains `"auto"` or `"required"`
and `tools` is a list of key-valued filters that apply to all candidate tools,
including LCORE.

In `@src/utils/responses.py`:
- Around line 493-505: The helper mcp_strip_name_from_allowlist_entries
currently removes "name" for any entry with type "mcp", which can turn
{"type":"mcp","name":"foo"} without a server_label into a permissive
{"type":"mcp"}; change the logic so you only remove the "name" when the entry is
an MCP and also contains a "server_label" (i.e., check new_entry.get("type") ==
"mcp" and new_entry.get("server_label") is not None) so malformed/narrow MCP
filters without server_label are left intact (and thus handled/ignored by
group_mcp_tools_by_server/filter_tools_by_allowed_entries) instead of widening
the allowlist.
- Around line 1515-1517: The early return when tool_choice ==
ToolChoiceMode.none is currently returning None for all three outputs and clears
explicit vector_store_ids; change the behavior so disabling tools only disables
tool-related outputs (e.g., tool config and tool inputs) but preserves and
returns the original vector_store_ids so build_rag_context still receives
file_search.vector_store_ids; locate the check against ToolChoiceMode.none
(variable tool_choice, enum ToolChoiceMode) and adjust the returned tuple to
keep the third value (vector_store_ids) instead of None.

In `@tests/e2e/features/responses.feature`:
- Around line 555-575: Update the scenario's request body so the model is forced
to attempt a file search: inside the JSON passed to the step "When I use
\"responses\" to ask question with authorization header" (the request that
contains "tool_choice" and "instructions"), change the "instructions" value to
explicitly require the file_search tool (e.g., include a sentence like "You MUST
use the file_search tool to answer this question.") so the negative check on
"responses output should not include an item with type \"file_search_call\""
actually verifies that the allowlist filtering (the "tool_choice" -> "tools":
[{"type":"mcp"}]) prevented the model from using file_search.
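
The guard suggested for mcp_strip_name_from_allowlist_entries in the comments above could look roughly like this. It is a sketch under assumptions: the real helper's signature and entry types are not visible on this page, so allowlist entries are modeled as plain dicts and the function name is hypothetical.

```python
from typing import Any


def strip_name_from_scoped_mcp_entries(
    entries: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Drop `name` only from MCP entries that are scoped by a `server_label`."""
    stripped: list[dict[str, Any]] = []
    for entry in entries:
        new_entry = dict(entry)  # copy so the caller's entries are not mutated
        if new_entry.get("type") == "mcp" and new_entry.get("server_label") is not None:
            new_entry.pop("name", None)
        stripped.append(new_entry)
    return stripped


# Hypothetical allowlist entries, for illustration only.
entries = [
    {"type": "mcp", "server_label": "docs", "name": "search"},
    {"type": "mcp", "name": "search"},  # no server_label: left intact
    {"type": "file_search"},
]
result = strip_name_from_scoped_mcp_entries(entries)
print(result)
```

With this guard, an entry such as `{"type": "mcp", "name": "search"}` keeps its `name` and is not widened into a permissive `{"type": "mcp"}`.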

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: f40a15ef-64e0-4d60-8dbd-d81eebc4eb50

📥 Commits

Reviewing files that changed from the base of the PR and between e184541 and c7b90c3.

📒 Files selected for processing (5)
  • docs/responses.md
  • src/utils/responses.py
  • tests/e2e/features/responses.feature
  • tests/e2e/features/steps/llm_query_response.py
  • tests/unit/utils/test_responses.py

@asimurka asimurka force-pushed the e2e_tests_responses_tools branch from c7b90c3 to 0b1f3ba March 27, 2026 16:38
@tisnik
Contributor

tisnik commented Mar 28, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

1406 - Partially compliant

Compliant requirements:

  • Add end-to-end tests for tool configuration and restrictions in the responses endpoint
  • Cover tool_choice modes none, auto, required
  • Assert tool invocation items presence/absence
  • Verify output_text fragments for file_search scenarios
  • Capture token metrics in tests

Non-compliant requirements:

  • End-to-end tests for allowed_tools filtering of MCP tools by specific name are missing

Requires further human verification:

(none)

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Missing Test

There is no end-to-end scenario verifying that allowed_tools with specific MCP tool name values restricts MCP invocations to those names. Without it, the MCP name-filtering behavior added in code isn’t validated in e2e tests.

Scenario: Check if responses endpoint with allowed tools in automatic mode answers knowledge question using file search
  Given The system is in default state
    And I set the Authorization header to Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva
    And I capture the current token metrics
  When I use "responses" to ask question with authorization header
  """
  {
    "input": "What is the title of the article from Paul?",
    "model": "{PROVIDER}/{MODEL}",
    "stream": false,
    "instructions": "You are an assistant. You MUST use the file_search tool to answer. Answer in lowercase.",
    "tool_choice": {
      "type": "allowed_tools",
      "mode": "auto",
      "tools": [{"type": "file_search"}]
    }
  }
  """
  Then The status code of the response is 200
    And The responses output should include an item with type "file_search_call"
    And The responses output_text should contain following fragments
      | Fragments in LLM response |
      | great work                |
    And The token metrics should have increased

Scenario: Check if responses endpoint with allowed tools in required mode invokes file search for a basic question
  Given The system is in default state
    And I set the Authorization header to Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6Ikpva
    And I capture the current token metrics
  When I use "responses" to ask question with authorization header
  """
  {
    "input": "Hello world!",
    "model": "{PROVIDER}/{MODEL}",
    "stream": false,
    "tool_choice": {
      "type": "allowed_tools",
      "mode": "required",
      "tools": [{"type": "file_search"}]
    }
  }
  """
  Then The status code of the response is 200
    And The responses output should include an item with type "file_search_call"
    And The token metrics should have increased
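
For the missing MCP name-filtering scenario flagged above, the request body and a matching assertion might be sketched in Python as follows. Everything specific here is an assumption: the server label `example-server` and tool name `example_tool` are placeholders, the input text is invented, and the check presumes that MCP invocations appear in the output as `mcp_call` items carrying `server_label` and `name` fields.

```python
from typing import Any

# Placeholder allowlist: one MCP tool on one server (hypothetical values).
ALLOWED = [{"type": "mcp", "server_label": "example-server", "name": "example_tool"}]

request_body = {
    "input": "Use the example tool to answer.",  # invented prompt
    "model": "{PROVIDER}/{MODEL}",
    "stream": False,
    "tool_choice": {"type": "allowed_tools", "mode": "required", "tools": ALLOWED},
}


def mcp_calls_within_allowlist(
    output: list[dict[str, Any]], allowed: list[dict[str, Any]]
) -> bool:
    """Return True when every mcp_call item targets an allowed (server_label, name) pair."""
    allowed_pairs = {
        (entry.get("server_label"), entry.get("name"))
        for entry in allowed
        if entry.get("type") == "mcp"
    }
    calls = [item for item in output if item.get("type") == "mcp_call"]
    return all((call.get("server_label"), call.get("name")) in allowed_pairs for call in calls)


# Hypothetical outputs, for illustration only.
ok_output = [{"type": "mcp_call", "server_label": "example-server", "name": "example_tool"}]
bad_output = [{"type": "mcp_call", "server_label": "example-server", "name": "other_tool"}]
print(mcp_calls_within_allowlist(ok_output, ALLOWED))
print(mcp_calls_within_allowlist(bad_output, ALLOWED))
```

An E2E scenario built on this idea would send the body above and fail if any MCP invocation in the output falls outside the allowlist.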
