feat(router): add LLM routing with cost optimization and pretrained configs #476

Open
bsbodden wants to merge 1 commit into main from llm-router

@bsbodden bsbodden commented Feb 16, 2026

Extends SemanticRouter with LLM model selection, cost-optimized routing, and pretrained configurations — routing queries to the right model using Redis vector search.

  • "hello, how are you?" → GPT-4.1 Nano ($0.10/M tokens)
  • "explain garbage collection" → Claude Sonnet 4.5 ($3/M tokens)
  • "architect a distributed system" → Claude Opus 4.5 ($5/M tokens)

Design

LLM routing is integrated directly into SemanticRouter. When a Route includes an optional model field, the router returns the LiteLLM-compatible model identifier alongside the match, with a confidence score derived from vector distance (1 - distance/2).
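
The distance-to-confidence mapping quoted above can be sketched as follows. This is a minimal illustration, assuming the distance is a cosine distance in the range [0, 2] (which is what makes 1 - distance/2 fall in [0, 1]). The function name `confidence_from_distance` and the clamping step are hypothetical and not taken from the PR.

```python
def confidence_from_distance(distance: float) -> float:
    """Map a cosine distance in [0, 2] to a confidence in [0, 1]."""
    # Clamping is a defensive assumption for this sketch; the PR does not
    # say how out-of-range distances are handled.
    clamped = min(max(distance, 0.0), 2.0)
    return 1.0 - clamped / 2.0
```

Under this mapping, identical vectors (distance 0) give a confidence of 1.0 and opposite vectors (distance 2) give 0.0.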

Schema extensions (all optional, no breaking changes):

  • Route gains model: Optional[str] and metadata: Dict
  • RouteMatch gains model, confidence, alternatives, and metadata fields
  • RoutingConfig gains cost_optimization, cost_weight, and default_route
  • Callable pattern: router(query) returns a RouteMatch
  • route_many() returns multiple ranked matches
  • Full async parity via AsyncSemanticRouter

Routes without a model field work exactly as before — existing SemanticRouter usage is unaffected.
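
How a per-route distance_threshold and default_route might interact can be sketched as below. This is an assumption-laden illustration, not the PR's implementation: the helper `resolve_route`, its dictionary inputs, and the use of `<=` for the threshold comparison are all hypothetical.

```python
from typing import Dict, Optional


def resolve_route(
    distances: Dict[str, float],
    thresholds: Dict[str, float],
    default_route: Optional[str] = None,
) -> Optional[str]:
    """Return the closest route within its own threshold, else the default."""
    # Keep only routes whose distance falls within that route's threshold.
    eligible = {
        name: dist for name, dist in distances.items() if dist <= thresholds[name]
    }
    if eligible:
        # Smallest distance wins among eligible routes.
        return min(eligible, key=eligible.get)
    # Nothing matched: fall back to the configured default (may be None).
    return default_route
```

With no default configured, a query that misses every threshold simply yields no route in this sketch.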

Usage

Basic LLM routing:

from redisvl.extensions.router import SemanticRouter, Route

routes = [
    Route(name="simple", model="openai/gpt-4.1-nano",
          references=["hello", "hi"], distance_threshold=0.5),
    Route(name="expert", model="anthropic/claude-opus-4-5",
          references=["architect a system", "design an algorithm"],
          distance_threshold=0.7),
]
router = SemanticRouter(name="llm-router", routes=routes,
                        redis_url="redis://localhost:6379")

match = router("hello there")
print(match.model)       # openai/gpt-4.1-nano
print(match.confidence)  # 0.81

Pretrained config — ships with a 3-route config (simple/standard/expert) mapped to Bloom's Taxonomy levels, with pre-computed sentence-transformers/all-mpnet-base-v2 embeddings:

router = SemanticRouter.from_pretrained("default", redis_url="redis://localhost:6379")

Cost-optimized routing — when multiple routes match with similar distances, a cost penalty biases toward cheaper models:

from redisvl.extensions.router.schema import RoutingConfig

router = SemanticRouter(
    name="cost-router", routes=routes,
    routing_config=RoutingConfig(cost_optimization=True, cost_weight=0.3),
    redis_url="redis://localhost:6379",
)
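
One way a cost penalty could bias selection toward cheaper models is sketched below. The PR text does not give the formula, so this is an assumption: each candidate's distance is increased by cost_weight times its cost normalized by the most expensive candidate. `Candidate` and `pick_route` are hypothetical names, and the prices are the illustrative per-million-token figures from the description.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    distance: float           # vector distance to the query (lower is closer)
    cost_per_m_tokens: float  # illustrative price per million tokens


def pick_route(candidates: List[Candidate], cost_weight: float = 0.3) -> Candidate:
    """Pick the candidate with the lowest cost-adjusted distance.

    Assumed scoring: distance + cost_weight * (cost / max_cost).
    Expects a non-empty list of candidates.
    """
    max_cost = max(c.cost_per_m_tokens for c in candidates)

    def adjusted(c: Candidate) -> float:
        normalized = c.cost_per_m_tokens / max_cost if max_cost > 0 else 0.0
        return c.distance + cost_weight * normalized

    return min(candidates, key=adjusted)


close_call = [
    Candidate("standard", distance=0.40, cost_per_m_tokens=3.0),
    Candidate("expert", distance=0.38, cost_per_m_tokens=5.0),
]
cheaper = pick_route(close_call, cost_weight=0.3)
closest = pick_route(close_call, cost_weight=0.0)
```

In this sketch the penalty only changes the outcome when distances are close. A clearly better semantic match still wins despite a higher price.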

Async:

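# Assumes AsyncSemanticRouter is already imported and that this runs in an
# async context (inside a coroutine, or a notebook cell with top-level await).
router = await AsyncSemanticRouter.create(
    name="async-router", routes=routes, redis_url="redis://localhost:6379")
match = await router("hello")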

Export/import with embeddings:

router.export_with_embeddings("my_router.json")
loaded = SemanticRouter.from_pretrained("my_router.json", redis_url="redis://localhost:6379")

Files changed

| Area | Files |
|---|---|
| Core router | `redisvl/extensions/router/semantic.py`, `schema.py`, `__init__.py` |
| Pretrained configs | `redisvl/extensions/router/pretrained/__init__.py`, `default.json` |
| Query support | `redisvl/query/query.py`, `redisvl/utils/full_text_query_helper.py` |
| Tests | `tests/unit/test_llm_router_schema.py`, `tests/integration/conftest.py`, `tests/unit/conftest.py` |
| Docs | `docs/user_guide/13_llm_router.ipynb` |
| Tooling | `scripts/generate_pretrained_config.py` |

Copilot AI review requested due to automatic review settings February 16, 2026 22:27

@bsbodden bsbodden requested review from rbs333 and removed request for abrookins and tylerhutcherson February 25, 2026 20:21

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 10 comments.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Copilot AI review requested due to automatic review settings March 2, 2026 21:19

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

@rbs333 rbs333 left a comment

I think we still have some disconnects on the design of this feature. Let's maybe set up some time to talk through it.

@redis redis deleted a comment from jit-ci bot Mar 15, 2026

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@bsbodden bsbodden changed the title from "feat: LLM Router extension for cost-optimized model selection" to "feat(router): add LLM routing with cost optimization and pretrained configs" Mar 15, 2026
Copilot AI review requested due to automatic review settings March 15, 2026 17:55

jit-ci bot commented Mar 15, 2026

🛡️ Jit Security Scan Results

✅ No security findings were detected in this PR


Security scan by Jit

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 3 comments.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Copilot AI review requested due to automatic review settings March 15, 2026 19:25

Copilot AI left a comment

Pull request overview

Copilot reviewed 16 out of 18 changed files in this pull request and generated 4 comments.


@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


bsbodden commented Mar 15, 2026

@rbs333 Reworked the PR based on your feedback. Here's where things stand:

1. No separate class — SemanticRouter is the API

All LLM routing logic (cost optimization, confidence scoring, from_pretrained(), export_with_embeddings()) lives directly in SemanticRouter. No separate LLMRouter class.

2. "Tier" → "Route"

Route with an optional model field is the only concept. All code, tests, and the notebook use Route/route consistently.

3. Callable pattern

router(query) throughout — no router.route().

4. from_pretrained() on SemanticRouter

from redisvl.extensions.router import SemanticRouter

router = SemanticRouter.from_pretrained("default", redis_url="redis://localhost:6379")
match = router("hello")  # -> RouteMatch(name="simple", model="openai/gpt-4.1-nano", ...)

5. Dead code removed

  • llm_router/router.py (1,528 lines) — deleted
  • llm_router/schema.py (206 lines) — deleted
  • llm_router/DESIGN.md — deleted

6. Notebook fully rewritten

Every cell uses SemanticRouter, Route, router(), match.name.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Copilot AI review requested due to automatic review settings March 15, 2026 19:51

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.


Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 14 changed files in this pull request and generated 2 comments.


…onfigs

Extend SemanticRouter to support LLM model selection by adding optional
model, confidence, cost optimization, and multi-match capabilities to
the existing routing infrastructure.

When a Route includes a `model` field, the router returns the LiteLLM-
compatible model identifier alongside the match, with a confidence score
derived from vector distance. Cost-optimized routing biases toward
cheaper models when semantic distances are close, using a configurable
cost_weight penalty.

Key additions to SemanticRouter:
- Route.model (optional) for LiteLLM model identifiers
- RouteMatch.confidence, .alternatives, .metadata fields
- RoutingConfig.cost_optimization and .cost_weight settings
- RoutingConfig.default_route for fallback when no match found
- from_pretrained() to load routers with pre-computed embeddings
- export_with_embeddings() to serialize routers with vectors
- AsyncSemanticRouter with full async parity

A built-in "default" pretrained config ships with 3 tiers (simple,
standard, expert) mapped to GPT-4.1 Nano, Claude Sonnet 4.5, and
Claude Opus 4.5, using pre-computed sentence-transformers embeddings.

Backward compatibility:
- LLMRouter/AsyncLLMRouter provided as deprecated wrappers
- ModelTier subclass enforces required model field
- Legacy field names (tiers/default_tier) mapped bidirectionally
- Existing SemanticRouter usage is fully unaffected

Includes integration tests, unit tests for schema validation,
a user guide notebook, and a pretrained config generation script.
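
The bidirectional mapping of legacy field names mentioned above could look roughly like this. It is a sketch under assumptions: the commit message names only the legacy keys (tiers, default_tier), so the current-side names `routes` and `default_route` are inferred from the rest of the PR, and `rename_keys` is a hypothetical helper.

```python
from typing import Any, Dict

# Assumed pairing of legacy and current field names.
LEGACY_TO_CURRENT = {"tiers": "routes", "default_tier": "default_route"}
CURRENT_TO_LEGACY = {new: old for old, new in LEGACY_TO_CURRENT.items()}


def rename_keys(config: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of config with top-level keys renamed per mapping."""
    # Keys that are not in the mapping pass through unchanged.
    return {mapping.get(key, key): value for key, value in config.items()}


legacy = {"name": "llm-router", "tiers": ["simple"], "default_tier": "simple"}
current = rename_keys(legacy, LEGACY_TO_CURRENT)
round_trip = rename_keys(current, CURRENT_TO_LEGACY)
```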

@rbs333 rbs333 left a comment

Looking good! I think now it just needs an update on the docs and then a quick latency check to make sure we're not adding overhead vs previous router lookup speed.
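
A latency check along these lines could be as simple as the harness below. It is a generic sketch, not code from the PR: `measure_latency` is a hypothetical helper that times any callable, so the same queries can be run against the previous router and the new one and the numbers compared.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_latency(
    route_fn: Callable[[str], object],
    queries: List[str],
    warmup: int = 5,
    repeats: int = 50,
) -> Dict[str, float]:
    """Time route_fn over the queries and report median and p95 in ms."""
    # Warm-up calls are excluded from the samples (connection setup, caches).
    for query in queries[:warmup]:
        route_fn(query)

    samples_ms: List[float] = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            route_fn(query)
            samples_ms.append((time.perf_counter() - start) * 1000.0)

    samples_ms.sort()
    p95_index = max(0, int(len(samples_ms) * 0.95) - 1)
    return {
        "count": len(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
    }
```

Passing the router itself as `route_fn` works because the router is callable, per the description above. Embedding time is included in each sample unless the vectorizer is cached or stubbed.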


Looks good 👍

        last_error = e
        if attempt < 2:  # Don't download on last attempt
            try:
                nltk.download("stopwords", quiet=True)

are these changes for the router?
