feat: add vLLM (OpenAI-compatible) provider for local model deployment #28
Sevenal wants to merge 1 commit into paoloanzn:main
Conversation
Add support for local models via vLLM or any OpenAI-compatible API. Users can now point Claude Code at a self-hosted vLLM server instead of the Anthropic API, with full support for streaming responses and tool_use (function calling).

Changes:
- Add vllm-fetch-adapter.ts: fetch interceptor that translates between the Anthropic Messages API and the OpenAI Chat Completions API format
- Add 'vllm' to the APIProvider type, detected via CLAUDE_CODE_USE_VLLM
- Add a vLLM provider branch in client.ts (after the Codex provider)
- Add getVLLMApiKey() and isVLLMSubscriber() auth utilities
- Register vLLM env vars in managedEnvConstants.ts

Environment variables:
- `CLAUDE_CODE_USE_VLLM=1`: enable the vLLM provider
- `VLLM_API_KEY`: API key (falls back to `OPENAI_API_KEY`)
- `VLLM_BASE_URL`: vLLM server URL (default http://localhost:8000)
- `ANTHROPIC_MODEL`: model name passed to vLLM directly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
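A typical local setup with these variables might look like the following (the model name and port are placeholders for your own deployment, not values mandated by this PR):

```shell
# Start an OpenAI-compatible server first, e.g.:
#   vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000

# Point Claude Code at the local server
export CLAUDE_CODE_USE_VLLM=1
export VLLM_BASE_URL=http://localhost:8000
export VLLM_API_KEY=sk-local   # or rely on OPENAI_API_KEY
export ANTHROPIC_MODEL=Qwen/Qwen2.5-7B-Instruct
```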
📝 Walkthrough
This change adds vLLM (OpenAI-compatible) as a new API provider option. It extends the provider detection system, introduces authentication helpers, registers environment variables for provider routing, and implements a comprehensive fetch adapter that translates Anthropic SDK request/response formats to vLLM/OpenAI formats, including request transformation, streaming response parsing, and error handling.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client as Client Application
    participant SDK as Anthropic SDK
    participant Adapter as vLLM Fetch Adapter
    participant vLLM as vLLM Server
    Client->>SDK: Call Anthropic API with tools
    SDK->>Adapter: POST /v1/messages (Anthropic format)
    Adapter->>Adapter: Transform tools (Anthropic → OpenAI)
    Adapter->>Adapter: Transform messages & system prompt
    Adapter->>vLLM: POST /v1/chat/completions (OpenAI format)
    vLLM-->>Adapter: Stream SSE data
    Adapter->>Adapter: Parse OpenAI streaming response
    Adapter->>Adapter: Transform to Anthropic SSE events
    Adapter-->>SDK: Stream Anthropic SSE format
    SDK-->>Client: Return parsed response with tool calls
```
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e8b287e134
```ts
? 'openai'
: 'firstParty'
: isEnvTruthy(process.env.CLAUDE_CODE_USE_VLLM)
  ? 'vllm'
```
Add vLLM model mappings before returning provider
Returning 'vllm' here enables a provider value that the model-string/config pipeline does not define (ALL_MODEL_CONFIGS entries are keyed through existing providers only). getBuiltinModelStrings(getAPIProvider()) will therefore produce undefined model IDs for vLLM, and default model resolution can flow into parseUserSpecifiedModel with an undefined value, crashing on .trim() when CLAUDE_CODE_USE_VLLM=1 is set without an explicit model override.
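One way to avoid the crash described above is to give the vllm provider an explicit fallback before model resolution reaches parseUserSpecifiedModel. A minimal sketch, assuming the adapter reads ANTHROPIC_MODEL directly; `resolveVLLMModel` and `DEFAULT_VLLM_MODEL` are illustrative names, not identifiers from this codebase:

```typescript
// Illustrative sketch: fall back to an explicit model string for the
// vllm provider instead of letting an undefined ID reach .trim().
const DEFAULT_VLLM_MODEL = "local-model"; // hypothetical default

function resolveVLLMModel(envModel: string | undefined): string {
  // ANTHROPIC_MODEL is passed through to vLLM verbatim; when it is
  // unset there is no builtin mapping to fall back on, so supply one.
  const candidate = envModel?.trim();
  return candidate && candidate.length > 0 ? candidate : DEFAULT_VLLM_MODEL;
}
```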
```ts
const vllmResponse = await globalThis.fetch(chatCompletionsUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Accept: 'text/event-stream',
    Authorization: `Bearer ${apiKey || 'sk-placeholder'}`,
  },
  body: JSON.stringify(vllmBody),
})
```
Forward abort signal when proxying to vLLM
The adapter creates a new fetch request but drops init?.signal (and related request options) from the original Anthropic SDK call. In this codebase, streaming requests are issued with abort signals for user cancel/watchdog timeouts, so dropping the signal means aborted CLI requests can continue running against vLLM until completion, causing hung cancellation behavior and unnecessary backend load.
```ts
if (!url.includes('/v1/messages')) {
  return globalThis.fetch(input, init)
}
```
Limit interception to messages create endpoint
Using url.includes('/v1/messages') also catches /v1/messages/count_tokens, so token-count requests are incorrectly rerouted to chat completions and translated as SSE. The SDK expects a JSON token-count response for countTokens, so this breaks API-based counting under vLLM and forces degraded fallback estimation paths.
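A stricter match on the URL path avoids rerouting the token-count endpoint. A sketch of one possible fix; `isMessagesCreate` is an illustrative helper, not a name from this PR:

```typescript
// Illustrative sketch: intercept only the messages-create endpoint so
// /v1/messages/count_tokens keeps going to the original fetch.
function isMessagesCreate(rawUrl: string): boolean {
  // The base argument makes relative URLs parse too; only the path matters.
  const { pathname } = new URL(rawUrl, "http://localhost");
  return pathname.endsWith("/v1/messages");
}
```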
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/services/api/client.ts (1)
139-145: ⚠️ Potential issue | 🟠 Major
Short-circuit Anthropic auth before the vLLM branch.
Every vLLM client creation still runs checkAndRefreshOAuthTokenIfNeeded() and may execute apiKeyHelper first, even though this path never talks to Anthropic. That adds unrelated login latency/failures to local-model traffic and widens the command-execution surface for the new provider.
Also applies to: 323-337
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/services/api/client.ts` around lines 139 - 145, Only run Anthropic-specific auth/token work for Anthropic consumers: move the call to checkAndRefreshOAuthTokenIfNeeded() (and the configureApiKeyHeaders(defaultHeaders, getIsNonInteractiveSession()) call) inside the branch that is creating/using the Anthropic client (i.e., only when isClaudeAISubscriber() or the code path that instantiates the Anthropic/via-API provider is true) so vLLM/local-model branches never trigger OAuth or apiKeyHelper work; apply the same change to the duplicate token-check block elsewhere in this file (the other block that currently invokes checkAndRefreshOAuthTokenIfNeeded()/configureApiKeyHeaders()).
📒 Files selected for processing (5)
- src/services/api/client.ts
- src/services/api/vllm-fetch-adapter.ts
- src/utils/auth.ts
- src/utils/managedEnvConstants.ts
- src/utils/model/providers.ts
```ts
const vllmBody: Record<string, unknown> = {
  model: claudeModel,
  messages,
  stream: true,
}

if (anthropicTools.length > 0) {
  vllmBody.tools = translateTools(anthropicTools)
}
```
Forward the Anthropic generation controls.
translateToVLLMBody() only sends model, messages, stream, and tools. Dropping max_tokens means the upstream server falls back to its default length, which can truncate or overrun completions relative to the Anthropic request; temperature, stop_sequences, and tool_choice are lost for the same reason.
Suggested mapping:

```diff
 const vllmBody: Record<string, unknown> = {
   model: claudeModel,
   messages,
   stream: true,
+  ...(typeof anthropicBody.max_tokens === 'number'
+    ? { max_tokens: anthropicBody.max_tokens }
+    : {}),
+  ...(typeof anthropicBody.temperature === 'number'
+    ? { temperature: anthropicBody.temperature }
+    : {}),
+  ...(Array.isArray(anthropicBody.stop_sequences)
+    ? { stop: anthropicBody.stop_sequences }
+    : {}),
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/services/api/vllm-fetch-adapter.ts` around lines 210 - 218, The vllmBody
currently only includes model, messages, stream and tools, which drops Anthropic
generation controls; update the code that builds vllmBody (the block that sets
vllmBody, using claudeModel, messages, anthropicTools and translateTools) to
forward Anthropic parameters when present—copy max_tokens, temperature,
stop_sequences, and tool_choice (or map to the vLLM equivalents) into vllmBody,
ensuring proper key names and types and only adding them when defined so
existing behavior isn’t broken.
```ts
// ── Finish reason ───────────────────────────────────
if (choices[0].finish_reason) {
  stopReason = choices[0].finish_reason as string
}
```
Don't discard the upstream finish reason.
You capture choices[0].finish_reason into stopReason, but finishStream() ignores it and always emits tool_use or end_turn. That turns length and stop-sequence terminations into normal turns, so callers can't distinguish a truncated response from a clean stop.
Also applies to: 587-619
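One way to act on this finding is to map the OpenAI finish_reason onto Anthropic stop_reason values when finishing the stream. A sketch under the assumption that the adapter tracks whether a tool call was emitted; `mapFinishReason` is an illustrative helper, not a function in this PR (note that OpenAI's "stop" covers both natural ends and stop sequences, so the two cannot be fully distinguished here):

```typescript
// Illustrative sketch: translate OpenAI finish_reason values into
// Anthropic stop_reason values instead of always "end_turn"/"tool_use".
type AnthropicStopReason = "end_turn" | "max_tokens" | "stop_sequence" | "tool_use";

function mapFinishReason(
  finishReason: string | null,
  sawToolCall: boolean,
): AnthropicStopReason {
  switch (finishReason) {
    case "length":
      return "max_tokens"; // truncated by the token limit
    case "tool_calls":
      return "tool_use";
    case "stop":
    default:
      // OpenAI does not separate stop-sequence from natural completion.
      return sawToolCall ? "tool_use" : "end_turn";
  }
}
```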
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/services/api/vllm-fetch-adapter.ts` around lines 475 - 478, The upstream
finish reason (choices[0].finish_reason) is captured into stopReason but then
discarded because finishStream() always emits only "tool_use" or "end_turn";
update the code paths where stopReason is passed to finishStream() (and the
other similar block around the 587-619 region) to forward the actual
finishReason string (or map it to a meaningful enum) so callers can see
"length", "stop_sequence", etc., instead of always receiving
"tool_use"/"end_turn"; ensure references to choices[0].finish_reason,
stopReason, and finishStream() are updated and that any consumers still handle
the existing values or new mapped ones.
```ts
const vllmResponse = await globalThis.fetch(chatCompletionsUrl, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    Accept: 'text/event-stream',
    Authorization: `Bearer ${apiKey || 'sk-placeholder'}`,
  },
  body: JSON.stringify(vllmBody),
})
```
Preserve cancellation and transport options when proxying the fetch.
The upstream request is rebuilt from scratch here, so init.signal and any SDK-supplied transport options are dropped. Cancelled streams will keep running on the vLLM server, and proxy/agent configuration from the Anthropic client stops applying to the forwarded call.
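A minimal sketch of the merge this comment asks for, assuming the adapter has the original `init` in scope; `buildProxiedInit` is an illustrative helper, not a name from this codebase:

```typescript
// Illustrative sketch: spread the caller's RequestInit (notably
// init.signal) into the proxied request instead of rebuilding it,
// overriding only the headers the adapter requires.
function buildProxiedInit(
  init: RequestInit | undefined,
  apiKey: string | undefined,
  body: string,
): RequestInit {
  const headers = new Headers(init?.headers);
  headers.set("Content-Type", "application/json");
  headers.set("Accept", "text/event-stream");
  headers.set("Authorization", `Bearer ${apiKey || "sk-placeholder"}`);
  // ...init carries signal and any other transport options forward.
  return { ...init, method: "POST", headers, body };
}
```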
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/services/api/vllm-fetch-adapter.ts` around lines 674 - 682, When proxying
the upstream request for vLLM (the globalThis.fetch call that produces
vllmResponse), preserve the original request's cancellation and transport
options instead of rebuilding options from scratch: merge the original init (or
SDK fetch options) into the fetch call so init.signal and any other
transport-related fields are forwarded, merge/override headers to ensure
Content-Type/Accept/Authorization are set while preserving other headers, and
pass the body (vllmBody) and method through; locate the fetch invocation that
uses chatCompletionsUrl, vllmBody, and Authorization and replace the hard-coded
options with a merged Request/option object that includes init.signal and other
init fields so cancelled streams and SDK transport configuration are preserved.
```ts
export function getVLLMApiKey(): string | undefined {
  return process.env.VLLM_API_KEY || process.env.OPENAI_API_KEY
}

export function isVLLMSubscriber(): boolean {
  return getAPIProvider() === 'vllm' && !!getVLLMApiKey()
}
```
Wire vllm into the existing provider classification.
isVLLMSubscriber() is added here, but the rest of this file still hard-codes only Bedrock/Vertex/Foundry as 3P. Today isAnthropicAuthEnabled() (Line 116), is1PApiCustomer() (Line 1662), and isUsing3PServices() (Line 1808) can all report the wrong state for a vLLM session.
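One shape the fix could take is a single provider set that the classification helpers share, so a new provider only needs to be added in one place. A sketch only; `THIRD_PARTY_PROVIDERS` and the simplified `isUsing3PServices` signature below are illustrative, not the actual definitions in auth.ts:

```typescript
// Illustrative sketch: classify 'vllm' alongside the existing
// third-party providers instead of hard-coding Bedrock/Vertex/Foundry.
type APIProvider = "firstParty" | "bedrock" | "vertex" | "foundry" | "vllm";

const THIRD_PARTY_PROVIDERS: ReadonlySet<APIProvider> = new Set<APIProvider>([
  "bedrock",
  "vertex",
  "foundry",
  "vllm", // newly added so vLLM sessions are reported consistently
]);

function isUsing3PServices(provider: APIProvider): boolean {
  return THIRD_PARTY_PROVIDERS.has(provider);
}
```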
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/auth.ts` around lines 1640 - 1645, The new
getVLLMApiKey()/isVLLMSubscriber() functions are not wired into the
provider-classification helpers, causing functions like isAnthropicAuthEnabled,
is1PApiCustomer, and isUsing3PServices to misreport for vllm; update each of
those functions to treat provider 'vllm' as appropriate (e.g., include
isVLLMSubscriber() or getAPIProvider() === 'vllm' in their truthiness checks) so
vllm sessions are classified consistently alongside other 3P/1P providers.
```ts
'VLLM_API_KEY',
'VLLM_BASE_URL',
```
Don't whitelist VLLM_BASE_URL as a safe env var.
SAFE_ENV_VARS are applied without a trust prompt, and VLLM_BASE_URL chooses the server that receives prompts and tool traffic. That gives managed/project settings the same redirect capability the file already treats as dangerous for ANTHROPIC_BASE_URL.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/managedEnvConstants.ts` around lines 155 - 156, Remove
VLLM_BASE_URL from the SAFE_ENV_VARS whitelist so it is no longer applied
without a trust prompt; update the SAFE_ENV_VARS array (the list that currently
contains 'VLLM_API_KEY' and 'VLLM_BASE_URL') to only include 'VLLM_API_KEY' and
mirror the handling for ANTHROPIC_BASE_URL by treating VLLM_BASE_URL as
sensitive (i.e., not trusted by default) and add a short comment explaining why
VLLM_BASE_URL must remain excluded.