A blazing-fast, zero-dependency LLM proxy gateway. Drop it in front of any LLM provider and get caching, metrics, retries, and observability — without changing your application code.
```bash
# 1. Clone
git clone https://github.com/llmsocket/llmsocket-gateway && cd llmsocket-gateway

# 2. Configure (copy template, add your keys)
cp .env.example .env
# Edit .env — add at least one provider key

# 3. Run
go run ./cmd/gateway

# 4. Test
curl http://localhost:4000/health
curl http://localhost:4000/ready
```

The banner shows exactly which providers were detected:
```text
██╗ ██╗ ███╗ ███╗███████╗ ██████╗ ██████╗██╗ ██╗███████╗████████╗
██║ ██║ ████╗ ████║██╔════╝██╔═══██╗██╔════╝██║ ██╔╝██╔════╝╚══██╔══╝
██║ ██║ ██╔████╔██║███████╗██║ ██║██║ █████╔╝ █████╗ ██║
██║ ██║ ██║╚██╔╝██║╚════██║██║ ██║██║ ██╔═██╗ ██╔══╝ ██║
███████╗███████╗██║ ╚═╝ ██║███████║╚██████╔╝╚██████╗██║ ██╗███████╗ ██║
╚══════╝╚══════╝╚═╝ ╚═╝╚══════╝ ╚═════╝ ╚═════╝╚═╝ ╚═╝╚══════╝ ╚═╝
AI Gateway · vdev · development
─────────────────────────────────────
Gateway   → http://localhost:4000
Metrics   → http://localhost:4000/metrics
Dashboard → http://localhost:4000/ui
Config    → http://localhost:4000/config
Providers → anthropic, groq, openai
Cache     → true (TTL: 1h0m0s, max: 10000)
Retry     → true (max 2 retries on [429 502 503 504])
```
All config lives in .env (or real environment variables — env wins over .env).
```bash
cp .env.example .env
# Edit .env
```

| Variable | Default | Description |
|---|---|---|
| GATEWAY_PORT | 4000 | HTTP listen port |
| GATEWAY_ENV | development | `development` or `production` (JSON logs) |
| GATEWAY_DEBUG | false | Verbose debug logging |
| GATEWAY_TIMEOUT_MS | 120000 | Upstream timeout in milliseconds |
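The precedence rule stated above (real environment wins over `.env`) can be sketched as follows. This is an illustration of the rule only, not the gateway's actual loader:

```python
def resolve(dotenv: dict, environ: dict) -> dict:
    """Merge .env values with the real environment; the environment wins."""
    merged = dict(dotenv)
    merged.update(environ)  # real environment variables override .env
    return merged

config = resolve(
    dotenv={"GATEWAY_PORT": "4000", "GATEWAY_DEBUG": "true"},
    environ={"GATEWAY_PORT": "5000"},  # exported in the shell
)
# GATEWAY_PORT resolves to "5000"; GATEWAY_DEBUG still comes from .env
```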
Add whichever providers you use. Unset providers are simply not registered.
```bash
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GEMINI_API_KEY=AIza...
GROQ_API_KEY=gsk_...
MISTRAL_API_KEY=...
TOGETHER_API_KEY=...
DEEPSEEK_API_KEY=...
XAI_API_KEY=xai-...
FIREWORKS_API_KEY=...
PERPLEXITY_API_KEY=pplx-...
COHERE_API_KEY=...
OPENROUTER_API_KEY=sk-or-...

# Azure OpenAI
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://YOUR_RESOURCE.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o

# Ollama (local, no key needed)
OLLAMA_BASE_URL=http://localhost:11434
```

Override the base URL for any provider (useful for proxies, private deployments):

```bash
GATEWAY_OPENAI_BASE_URL=https://my-proxy.example.com
GATEWAY_ANTHROPIC_BASE_URL=https://anthropic.corp.internal
```

Cache, rate limiting, and retry settings:

```bash
GATEWAY_CACHE_ENABLED=true
GATEWAY_CACHE_TTL_SECONDS=3600
GATEWAY_CACHE_MAX_SIZE=10000
GATEWAY_RATELIMIT_ENABLED=false
GATEWAY_RATELIMIT_RPM=1000
GATEWAY_RETRY_ENABLED=true
GATEWAY_RETRY_MAX=2
```

Each provider gets a dedicated path prefix. Just set `base_url` in your client:
| Provider | Gateway path | Example |
|---|---|---|
| OpenAI | /openai/v1 | http://localhost:4000/openai/v1 |
| Anthropic | /anthropic/v1 | http://localhost:4000/anthropic/v1 |
| Groq | /groq/openai/v1 | http://localhost:4000/groq/openai/v1 |
| Mistral | /mistral/v1 | http://localhost:4000/mistral/v1 |
| DeepSeek | /deepseek/v1 | http://localhost:4000/deepseek/v1 |
| Gemini | /gemini/... | http://localhost:4000/gemini |
| Ollama | /ollama/v1 | http://localhost:4000/ollama/v1 |
| OpenRouter | /openrouter/api/v1 | http://localhost:4000/openrouter/api/v1 |
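A small helper (hypothetical, not shipped with the gateway) that builds these base URLs from the path prefixes in the table:

```python
# Map a provider name to its gateway base URL, following the table above.
GATEWAY = "http://localhost:4000"

PREFIXES = {
    "openai": "/openai/v1",
    "anthropic": "/anthropic/v1",
    "groq": "/groq/openai/v1",
    "mistral": "/mistral/v1",
    "deepseek": "/deepseek/v1",
    "ollama": "/ollama/v1",
    "openrouter": "/openrouter/api/v1",
}

def base_url(provider: str) -> str:
    return GATEWAY + PREFIXES[provider]

# e.g. point any OpenAI-compatible SDK at Groq:
#   client = OpenAI(base_url=base_url("groq"), api_key="gsk_...")
```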
Route to any provider using a header — your app uses a single base URL:
```bash
curl http://localhost:4000/v1/chat/completions \
  -H "x-gateway-provider: groq" \
  -d '{"model":"llama3-70b-8192","messages":[...]}'
```
Override the server-configured key for a single request:

```bash
curl http://localhost:4000/openai/v1/chat/completions \
  -H "x-gateway-api-key: sk-SPECIFIC-KEY" \
  -d '...'
```
```python
# Before
from openai import OpenAI
client = OpenAI(api_key="sk-...")

# After — everything else stays identical
from openai import OpenAI
client = OpenAI(api_key="sk-...", base_url="http://localhost:4000/openai/v1")
```
```typescript
// TypeScript
const client = new OpenAI({ baseURL: "http://localhost:4000/openai/v1" });
```
```bash
# curl
curl http://localhost:4000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'
```
| Endpoint | Description |
|---|---|
| GET /health | Always returns 200 if the server is running |
| GET /ready | 200 if ≥1 provider configured, 503 otherwise |
| GET /config | Active config (no secrets) |
| GET /metrics | Prometheus text format |
| GET /metrics/json | Full metrics as JSON |
| GET /ui | Live dashboard (HTML) |
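A sketch of using `/ready` as a startup gate, with the status semantics from the table above (the polling interval is an arbitrary choice):

```python
import time
import urllib.error
import urllib.request

def wait_ready(base="http://localhost:4000", timeout_s=30.0) -> bool:
    """Poll /ready until it returns 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base + "/ready", timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet, or 503 (no provider configured)
        time.sleep(0.5)
    return False
```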
```yaml
# prometheus.yml
scrape_configs:
  - job_name: gateway
    static_configs:
      - targets: ['localhost:4000']
    metrics_path: /metrics
```

Key metrics exposed:

- `gateway_requests_total` — total proxied requests
- `gateway_errors_total` — total 4xx/5xx responses
- `gateway_cache_hits_total` — cache hit count
- `gateway_tokens_total{provider, model, type}` — token usage
- `gateway_cost_usd_total{provider, model}` — estimated USD cost
- `gateway_upstream_latency_p99_ms{provider, model}` — P99 upstream latency
- `gateway_gateway_overhead_ms{provider, model}` — gateway-added latency
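A sketch of reading the unlabeled counters out of the Prometheus text format served at `/metrics`. The sample values below are made up for illustration:

```python
def read_counter(metrics_text: str, name: str) -> float:
    """Return the value of a plain (unlabeled) counter line, or 0.0."""
    for line in metrics_text.splitlines():
        if line.startswith(name + " "):
            return float(line.split()[1])
    return 0.0

sample = """\
gateway_requests_total 120
gateway_errors_total 3
gateway_cache_hits_total 40
"""
hit_ratio = (read_counter(sample, "gateway_cache_hits_total")
             / read_counter(sample, "gateway_requests_total"))
# 40/120, roughly 0.33
```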
Each request is appended to `requests.jsonl` (configurable):

```json
{"ts":"2025-01-15T10:23:45Z","request_id":"abc123","provider":"openai","model":"gpt-4o-mini","status":200,"latency_ms":342,"tokens_in":120,"tokens_out":80,"cache_hit":false}
```

All gateway errors follow a consistent structure:

```json
{
  "error": {
    "code": "missing_api_key",
    "message": "no API key configured for provider \"groq\"",
    "provider": "groq",
    "request_id": "req_abc123",
    "hint": "set GROQ_API_KEY in your .env file or environment, or pass the x-gateway-api-key header"
  }
}
```

| Error code | Cause |
|---|---|
| missing_api_key | No key for the requested provider |
| unknown_provider | Provider name not recognized |
| upstream_error | Network/timeout error calling the provider |
| invalid_provider_url | Malformed base URL in config |
| request_body_read_error | Could not read the request body |
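Client code can branch on `error.code` from the envelope above. A sketch; which codes map to which action is an illustrative policy, not something the gateway mandates:

```python
import json

RETRYABLE = {"upstream_error"}  # illustrative policy

def classify(body: str) -> str:
    """Map a gateway error response body to a coarse action."""
    err = json.loads(body).get("error", {})
    code = err.get("code", "unknown")
    if code in RETRYABLE:
        return "retry"
    if code == "missing_api_key":
        return "fix-config: " + err.get("hint", "set the provider key")
    return "fail: " + code

resp = '{"error":{"code":"upstream_error","message":"timeout","provider":"groq","request_id":"req_abc123"}}'
```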
Config errors (startup) include `field`, `got`, and `hint`:

```text
╔═══════════════════════════════════════╗
║          Configuration Error          ║
╚═══════════════════════════════════════╝

 ✗ must be a valid integer (field: GATEWAY_PORT, got: "abc")
   💡 example: GATEWAY_PORT=4000
```
See the `examples/` directory:

- `curl_examples.sh` — bash/curl quick tests
- `python_openai.py` — Python (OpenAI SDK, streaming, per-key override, Ollama)
- `node_example.js` — Node.js (OpenAI SDK)
```bash
# Python
pip install openai
python examples/python_openai.py openai
python examples/python_openai.py stream
python examples/python_openai.py health

# curl
bash examples/curl_examples.sh
```

Run with Docker:

```bash
docker-compose up

# Or with your .env file
docker run --env-file .env -p 4000:4000 gateway:latest
```

```text
Client → GATEWAY → Provider (OpenAI / Anthropic / Groq / ...)
             │
             ├── .env loader (with line-level error reporting)
             ├── LRU cache (in-memory, zero deps)
             ├── Retry with exponential backoff
             ├── Token extraction (all provider formats)
             ├── Cost estimation
             ├── Prometheus metrics
             └── JSONL request log
```
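The retry stage can be sketched as follows. The retryable status codes and retry count come from the `GATEWAY_RETRY_*` settings shown earlier; the base delay is an assumption, as the gateway's actual backoff schedule is not documented here:

```python
import time

RETRYABLE_STATUSES = {429, 502, 503, 504}  # matches the banner's retry list

def with_retries(send, max_retries=2, base_delay=0.5):
    """Call send() up to max_retries extra times, backing off exponentially.

    send is any zero-argument callable returning (status, body).
    """
    for attempt in range(max_retries + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == max_retries:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```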
| Provider | Path | Key env var |
|---|---|---|
| OpenAI | /openai | OPENAI_API_KEY |
| Anthropic | /anthropic | ANTHROPIC_API_KEY |
| Google Gemini | /gemini | GEMINI_API_KEY |
| Groq | /groq | GROQ_API_KEY |
| Mistral | /mistral | MISTRAL_API_KEY |
| Together AI | /together | TOGETHER_API_KEY |
| DeepSeek | /deepseek | DEEPSEEK_API_KEY |
| xAI / Grok | /xai | XAI_API_KEY |
| Fireworks | /fireworks | FIREWORKS_API_KEY |
| Perplexity | /perplexity | PERPLEXITY_API_KEY |
| Cohere | /cohere | COHERE_API_KEY |
| OpenRouter | /openrouter | OPENROUTER_API_KEY |
| Azure OpenAI | /azure | AZURE_OPENAI_API_KEY |
| AWS Bedrock | /bedrock | AWS_ACCESS_KEY_ID |
| Anyscale | /anyscale | ANYSCALE_API_KEY |
| Ollama (local) | /ollama | OLLAMA_BASE_URL |
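Provider detection follows directly from this table: a provider is registered only when its key variable is set. A minimal sketch over a subset of providers, mirroring the banner's `Providers →` line:

```python
import os

# Subset of the key variables from the table above.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "groq": "GROQ_API_KEY",
    "mistral": "MISTRAL_API_KEY",
}

def detected_providers(environ=None):
    """Return the sorted list of providers whose key env var is non-empty."""
    env = os.environ if environ is None else environ
    return sorted(p for p, var in KEY_VARS.items() if env.get(var))
```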