Description
Problem
provider-openai has migrated entirely to the OpenAI Responses API (/v1/responses). This breaks compatibility with all local servers that implement only the OpenAI Chat Completions API (/v1/chat/completions), including:
- mlx_lm.server (MLX framework)
- vLLM
- llama.cpp server
- LM Studio
- LocalAI
- Ollama's OpenAI compatibility mode
These servers are commonly used as base_url overrides in provider-openai to run local models.
Reproduction
```yaml
# settings.yaml - local MLX provider
- config:
    api_key: local
    base_url: http://localhost:8080/v1
    default_model: shieldstackllc/Step-3.5-Flash-REAP-128B-A11B-mlx-mixed-4-6
  priority: 4
  instance_id: local
  module: provider-openai
```

When the routing matrix selects this provider, the session crashes with:

```
[PROVIDER] OpenAI API error: ReadError: (no message)
Error: Execution failed: LLMError: ReadError: (no message)
```
The ReadError occurs because provider-openai calls self.client.responses.stream() (line ~952) or self.client.responses.create() (line ~965), hitting /v1/responses which returns 404 on the local server.
Direct curl to /v1/chat/completions on the same server works perfectly:
```shell
curl http://localhost:8080/v1/chat/completions \
  -d '{"model":"...","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
# Returns 200 with valid response
```
Root Cause
In provider-openai/__init__.py around lines 947-965:
```python
if self.use_streaming:
    async with self.client.responses.stream(**params) as stream:  # /v1/responses
        response = await stream.get_final_response()
else:
    return await self.client.responses.create(**params)  # also /v1/responses
```
Both the streaming and non-streaming paths use the Responses API. There is no Chat Completions fallback.
Proposed Fix
Add a config option like use_responses_api: false (or auto-detect based on base_url being non-default) that falls back to self.client.chat.completions.create() when targeting local/compatible servers.
This would restore the local model use case that previously worked when provider-openai used Chat Completions.
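One possible shape for that fallback, as a sketch. The `use_responses_api` config key, the helper names, and the auto-detection heuristic are all assumptions about how a fix could look, not the actual provider-openai code; only the `client.responses.*` and `client.chat.completions.create` calls are real openai-python APIs.

```python
# Hypothetical fallback sketch. `use_responses_api` and both function
# names are assumed; they do not exist in provider-openai today.

DEFAULT_BASE_URL = "https://api.openai.com/v1"

def should_use_responses_api(config: dict) -> bool:
    """An explicit use_responses_api flag wins; otherwise auto-detect."""
    explicit = config.get("use_responses_api")
    if explicit is not None:
        return bool(explicit)
    # A non-default base_url usually points at a local server that only
    # implements /v1/chat/completions, so default to the fallback there.
    return config.get("base_url", DEFAULT_BASE_URL) == DEFAULT_BASE_URL

async def complete(client, config: dict, use_streaming: bool, params: dict):
    if should_use_responses_api(config):
        if use_streaming:
            async with client.responses.stream(**params) as stream:
                return await stream.get_final_response()
        return await client.responses.create(**params)
    # Chat Completions fallback for mlx_lm, vLLM, llama.cpp, LM Studio, etc.
    return await client.chat.completions.create(**params)
```

A real implementation would also need to translate `params` between the two request shapes, since the Responses API takes `input` while Chat Completions takes `messages`.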
Secondary issue: CLI -p flag doesn't support instance_id
Related: amplifier run -p local fails because the CLI resolves -p against the module field (finding provider-openai), not instance_id. When two providers share the same module (cloud OpenAI + local MLX), there's no way to target the second instance from the CLI.
Impact
Any user with local MLX/vLLM/llama.cpp models configured via base_url in provider-openai is broken. The routing matrix's local fallback candidates never work, giving a false sense of offline resilience.