provider-openai Responses API breaks compatibility with local OpenAI-compatible servers (MLX, vLLM, etc.) #246

@Joi

Description

Problem

provider-openai has migrated entirely to the OpenAI Responses API (/v1/responses). This breaks compatibility with all local servers that implement only the OpenAI Chat Completions API (/v1/chat/completions), including:

  • mlx_lm.server (MLX framework)
  • vLLM
  • llama.cpp server
  • LM Studio
  • LocalAI
  • Ollama's OpenAI compatibility mode

These servers are commonly used as base_url overrides in provider-openai to run local models.

Reproduction

# settings.yaml - local MLX provider
- config:
    api_key: local
    base_url: http://localhost:8080/v1
    default_model: shieldstackllc/Step-3.5-Flash-REAP-128B-A11B-mlx-mixed-4-6
    priority: 4
  instance_id: local
  module: provider-openai

When the routing matrix selects this provider, the session crashes with:

[PROVIDER] OpenAI API error: ReadError: (no message)
Error: Execution failed: LLMError: ReadError: (no message)

The ReadError occurs because provider-openai calls self.client.responses.stream() (line ~952) or self.client.responses.create() (line ~965), hitting /v1/responses which returns 404 on the local server.

Direct curl to /v1/chat/completions on the same server works perfectly:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"...","messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
# Returns 200 with a valid response

Root Cause

In provider-openai/__init__.py, around lines 947-965:

if self.use_streaming:
    async with self.client.responses.stream(**params) as stream:  # /v1/responses
        response = await stream.get_final_response()
else:
    return await self.client.responses.create(**params)  # also /v1/responses

Both streaming and non-streaming paths use the Responses API. There is no Chat Completions fallback.

Proposed Fix

Add a config option like use_responses_api: false (or auto-detect based on base_url being non-default) that falls back to self.client.chat.completions.create() when targeting local/compatible servers.

This would restore the local model use case that previously worked when provider-openai used Chat Completions.

Secondary issue: CLI -p flag doesn't support instance_id

Related: amplifier run -p local fails because the CLI resolves -p against the module field (finding provider-openai), not instance_id. When two providers share the same module (cloud OpenAI + local MLX), there's no way to target the second instance from the CLI.
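One way the CLI could disambiguate: check `-p` against `instance_id` first and only then fall back to `module`. This is an illustrative sketch, not the real resolver; `resolve_provider` and the dict shapes are assumptions.

```python
def resolve_provider(name: str, providers: list[dict]) -> dict:
    """Resolve a -p argument: exact instance_id match wins over module match.

    providers: config entries with 'instance_id' and 'module' keys.
    """
    for p in providers:
        if p.get("instance_id") == name:
            return p
    for p in providers:
        if p.get("module") == name:
            return p
    raise KeyError(f"no provider matches {name!r}")
```

Under this scheme, `amplifier run -p local` would select the MLX instance while `-p provider-openai` keeps its current meaning (first instance of that module).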

Impact

Any user with local MLX/vLLM/llama.cpp models configured via base_url in provider-openai is broken. The routing matrix's local fallback candidates never work, giving a false sense of offline resilience.
