actually-lite-llm

A stateless Go LLM proxy gateway with an OpenAI-compatible API. Routes requests to OpenAI or Anthropic, translates formats transparently, and exposes per-app Prometheus metrics — without a database, UI, or 800MB of idle RAM.

Why

LiteLLM covers the same ground, but it uses ~800MB of RAM at idle and can't track usage per virtual key without a database. This project replaces it with a single stateless binary small enough to run as a sidecar.

Features

OpenAI-compatible API — use the standard OpenAI SDK, point base_url at the gateway
Multi-provider routing — alias map + prefix rules route models to OpenAI or Anthropic
Format translation — Anthropic's Messages API is translated transparently (system prompt extraction, stop sequences, token fields, streaming SSE chunks)
Virtual API keys — per-app keys defined in YAML; unknown keys get 401, disallowed models get 403
Prometheus metrics — llm_requests_total, llm_tokens_total, llm_request_duration_seconds, llm_stream_first_byte_seconds, llm_provider_errors_total, llm_cost_dollars_total; most are labeled by virtual_key, provider, and model (llm_provider_errors_total is labeled by provider and error_type instead)
Structured access logs — one-liner text via log/slog, including virtual key, provider, model, duration
Stateless — horizontally scalable; no shared state between replicas
Small — < 50MB idle RAM; distroless container image
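
To make the format translation bullet concrete, here is a minimal sketch of the request-side mapping from an OpenAI-style chat body to an Anthropic Messages body. The type and function names (`ChatRequest`, `AnthropicRequest`, `toAnthropic`) and the `defaultMaxTokens` fallback are hypothetical and not taken from this repository. It assumes plain string message content and an array-valued `stop`, and covers only system prompt extraction, stop sequences, and `max_tokens`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Message is a simplified chat message. Real requests may carry
// structured content parts, which this sketch ignores.
type Message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// ChatRequest mirrors a small subset of the OpenAI chat completions body.
type ChatRequest struct {
	Model     string    `json:"model"`
	Messages  []Message `json:"messages"`
	Stop      []string  `json:"stop,omitempty"`
	MaxTokens int       `json:"max_tokens,omitempty"`
}

// AnthropicRequest mirrors a small subset of the Anthropic Messages body.
type AnthropicRequest struct {
	Model         string    `json:"model"`
	System        string    `json:"system,omitempty"`
	Messages      []Message `json:"messages"`
	StopSequences []string  `json:"stop_sequences,omitempty"`
	MaxTokens     int       `json:"max_tokens"`
}

// defaultMaxTokens is an assumed fallback: the Messages API requires
// max_tokens, while OpenAI-style requests may omit it.
const defaultMaxTokens = 1024

// toAnthropic moves system messages into the top-level system field,
// renames stop to stop_sequences, and guarantees a max_tokens value.
func toAnthropic(in ChatRequest, upstreamModel string) AnthropicRequest {
	out := AnthropicRequest{
		Model:         upstreamModel,
		StopSequences: in.Stop,
		MaxTokens:     in.MaxTokens,
	}
	if out.MaxTokens == 0 {
		out.MaxTokens = defaultMaxTokens
	}
	var system []string
	for _, m := range in.Messages {
		if m.Role == "system" {
			system = append(system, m.Content)
			continue
		}
		out.Messages = append(out.Messages, m)
	}
	out.System = strings.Join(system, "\n\n")
	return out
}

func main() {
	req := ChatRequest{
		Model: "claude-sonnet",
		Messages: []Message{
			{Role: "system", Content: "Answer briefly."},
			{Role: "user", Content: "hello"},
		},
		Stop: []string{"END"},
	}
	body, err := json.MarshalIndent(toAnthropic(req, "claude-sonnet-4-20250514"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The response side would need the reverse mapping (for example, Anthropic's `input_tokens`/`output_tokens` usage fields to OpenAI's `prompt_tokens`/`completion_tokens`), which this sketch leaves out.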

Quick Start

cp config.example.yaml config.local.yaml
# edit config.local.yaml — add your provider API keys and virtual keys

go run . -config config.local.yaml

Test it:

curl http://localhost:8080/health

curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-virtual-key" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"hello"}]}'
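
The same request can be sent from Go with only the standard library. This is a sketch that mirrors the curl call above; `newChatRequest` is a hypothetical helper, and the base URL and virtual key are the placeholders from this Quick Start.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// newChatRequest builds a POST to /v1/chat/completions carrying the
// virtual key as a bearer token, matching the curl example.
func newChatRequest(baseURL, virtualKey, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+virtualKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newChatRequest("http://localhost:8080", "your-virtual-key", "gpt-4o", "hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed (is the gateway running?):", err)
		return
	}
	defer resp.Body.Close()
	out, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read response:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(out))
}
```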

Configuration

listen: ":8080"

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"        # env var interpolation supported
    base_url: "https://api.openai.com/v1"   # optional
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    base_url: "https://api.anthropic.com/v1" # optional

models:
  # Alias → upstream model + provider
  "gpt-4o":        { provider: "openai",     model: "gpt-4o" }
  "claude-sonnet": { provider: "anthropic",  model: "claude-sonnet-4-20250514" }
  "claude-haiku":  { provider: "anthropic",  model: "claude-haiku-4-5-20251001" }

routing:
  # Prefix-based fallback for models not in the alias map
  - prefix: "gpt-"    # any model starting with gpt- → OpenAI
    provider: "openai"
  - prefix: "claude-" # any model starting with claude- → Anthropic
    provider: "anthropic"

api_keys:
  - key: "${VKEY_BACKEND}"
    app: "my-backend"
    allowed_models: ["*"]      # wildcard = all models
  - key: "${VKEY_CHATBOT}"
    app: "chatbot"
    allowed_models: ["claude-sonnet", "gpt-4o"]
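
The alias map and prefix rules above suggest a two-step lookup: exact alias first, prefix fallback second. The following is a minimal sketch of that lookup, not the gateway's actual code. The names `target`, `prefixRule`, and `resolve` are hypothetical, and two behaviors are assumptions: rules are tried in order with the first match winning, and a prefix match forwards the requested model name unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// target is the upstream provider and model a request resolves to.
type target struct {
	Provider string
	Model    string
}

// prefixRule mirrors one entry of the routing list.
type prefixRule struct {
	Prefix   string
	Provider string
}

// resolve checks the alias map first and falls back to prefix rules.
// The boolean is false when nothing matches.
func resolve(aliases map[string]target, rules []prefixRule, model string) (target, bool) {
	if t, ok := aliases[model]; ok {
		return t, true
	}
	for _, r := range rules {
		if strings.HasPrefix(model, r.Prefix) {
			// Assumption: the requested name is passed upstream as-is.
			return target{Provider: r.Provider, Model: model}, true
		}
	}
	return target{}, false
}

func main() {
	aliases := map[string]target{
		"gpt-4o":        {Provider: "openai", Model: "gpt-4o"},
		"claude-sonnet": {Provider: "anthropic", Model: "claude-sonnet-4-20250514"},
	}
	rules := []prefixRule{
		{Prefix: "gpt-", Provider: "openai"},
		{Prefix: "claude-", Provider: "anthropic"},
	}
	for _, m := range []string{"claude-sonnet", "gpt-4o-mini", "mistral-large"} {
		t, ok := resolve(aliases, rules, m)
		fmt.Printf("%s -> provider=%q model=%q matched=%v\n", m, t.Provider, t.Model, ok)
	}
}
```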

All ${} values are expanded from environment variables at startup.
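
As an illustration of that expansion step, here is a minimal sketch that replaces only the braced `${NAME}` form. The `expand` function is hypothetical, and returning an error for an unset variable is an assumption; the README does not say how missing variables are handled. The standard library's `os.ExpandEnv` is an alternative, but it also expands the bare `$NAME` form.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// placeholder matches ${NAME} where NAME is a typical env var identifier.
var placeholder = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)\}`)

// expand substitutes every ${NAME} in raw using lookup. It reports the
// first variable that lookup cannot find (assumed behavior).
func expand(raw string, lookup func(string) (string, bool)) (string, error) {
	var missing string
	out := placeholder.ReplaceAllStringFunc(raw, func(m string) string {
		name := m[2 : len(m)-1] // strip the leading "${" and trailing "}"
		v, ok := lookup(name)
		if !ok {
			if missing == "" {
				missing = name
			}
			return m
		}
		return v
	})
	if missing != "" {
		return "", fmt.Errorf("environment variable %s is not set", missing)
	}
	return out, nil
}

func main() {
	raw := `api_key: "${OPENAI_API_KEY}"`
	out, err := expand(raw, os.LookupEnv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println(out)
}
```

Passing the lookup function in, rather than calling `os.LookupEnv` directly, keeps the function easy to test without touching the process environment.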

Endpoints

Method	Path	Description
`POST`	`/v1/chat/completions`	OpenAI-compatible chat completion (streaming + non-streaming)
`POST`	`/v1/responses`	OpenAI Responses API (streaming + non-streaming; translates to chat completions internally)
`GET`	`/v1/models`	List models the caller's API key is permitted to use
`GET`	`/health`	Liveness/readiness check
`GET`	`/metrics`	Prometheus metrics
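
The virtual key rules (401 for unknown keys, 403 for disallowed models, `*` as a wildcard) and the per-key filtering behind `/v1/models` can be sketched as follows. The names `apiKey`, `allows`, `authorize`, and `visibleModels` are hypothetical. Matching `allowed_models` against the requested model name, and showing every configured alias to a wildcard key, are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
)

// apiKey mirrors one api_keys entry from the configuration.
type apiKey struct {
	App           string
	AllowedModels []string
}

// allows reports whether the key may use model; "*" permits everything.
func allows(k apiKey, model string) bool {
	for _, m := range k.AllowedModels {
		if m == "*" || m == model {
			return true
		}
	}
	return false
}

// authorize maps a bearer token and requested model to an HTTP status:
// 401 for an unknown key, 403 for a disallowed model, 200 otherwise.
func authorize(keys map[string]apiKey, token, model string) int {
	k, ok := keys[token]
	if !ok {
		return http.StatusUnauthorized
	}
	if !allows(k, model) {
		return http.StatusForbidden
	}
	return http.StatusOK
}

// visibleModels filters the configured model names down to those the
// key may use, preserving order, as a /v1/models handler might.
func visibleModels(k apiKey, configured []string) []string {
	var out []string
	for _, m := range configured {
		if allows(k, m) {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	keys := map[string]apiKey{
		"backend-key": {App: "my-backend", AllowedModels: []string{"*"}},
		"chatbot-key": {App: "chatbot", AllowedModels: []string{"claude-sonnet", "gpt-4o"}},
	}
	configured := []string{"gpt-4o", "claude-sonnet", "claude-haiku"}

	fmt.Println(authorize(keys, "nope", "gpt-4o"))
	fmt.Println(authorize(keys, "chatbot-key", "claude-haiku"))
	fmt.Println(authorize(keys, "backend-key", "claude-haiku"))
	fmt.Println(visibleModels(keys["chatbot-key"], configured))
}
```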

Prometheus Metrics

Metric	Type	Labels
`llm_requests_total`	Counter	`virtual_key`, `provider`, `model`, `status`
`llm_request_duration_seconds`	Histogram	`virtual_key`, `provider`, `model`, `stream`
`llm_tokens_total`	Counter	`virtual_key`, `provider`, `model`, `direction`
`llm_stream_first_byte_seconds`	Histogram	`virtual_key`, `provider`, `model`
`llm_provider_errors_total`	Counter	`provider`, `error_type`
`llm_cost_dollars_total`	Counter	`virtual_key`, `provider`, `model`
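
One plausible way to obtain a value for `llm_stream_first_byte_seconds` is to wrap the upstream response body and note when the first non-empty read arrives. The sketch below shows that idea with the standard library only. The `firstByteReader` type and the `simulate` helper are hypothetical, the clock is injected so the example is deterministic, and how the gateway actually measures this metric is not stated in this README.

```go
package main

import (
	"fmt"
	"io"
	"strings"
	"time"
)

// firstByteReader wraps a stream and records the elapsed time between
// start and the first Read that returns data.
type firstByteReader struct {
	r         io.Reader
	start     time.Time
	now       func() time.Time // time.Now in real use; injectable for tests
	firstByte time.Duration
	seen      bool
}

func (f *firstByteReader) Read(p []byte) (int, error) {
	n, err := f.r.Read(p)
	if n > 0 && !f.seen {
		f.seen = true
		f.firstByte = f.now().Sub(f.start)
	}
	return n, err
}

// simulate reads stream through a firstByteReader using a fake clock
// that reports delayMillis after start. It returns the recorded
// first-byte latency in milliseconds and the full body.
func simulate(stream string, delayMillis int64) (int64, string, error) {
	start := time.Unix(0, 0)
	clock := func() time.Time {
		return start.Add(time.Duration(delayMillis) * time.Millisecond)
	}
	fb := &firstByteReader{r: strings.NewReader(stream), start: start, now: clock}
	body, err := io.ReadAll(fb)
	if err != nil {
		return 0, "", err
	}
	return fb.firstByte.Milliseconds(), string(body), nil
}

func main() {
	ms, body, err := simulate("data: {\"delta\":\"hi\"}\n\n", 150)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("first byte after %d ms, %d bytes read\n", ms, len(body))
}
```

In a real handler the recorded duration, converted to seconds, would be the value observed into the histogram once the stream finishes.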

Releases

Docker images and Helm charts are published automatically to GHCR on every v* tag push.

Docker

docker pull ghcr.io/im0rtality/actually-lite-llm:0.1.0
# or pull the latest patch release of the 0.1 minor version
docker pull ghcr.io/im0rtality/actually-lite-llm:0.1

Run it:

docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -v $(pwd)/config.yaml:/etc/actually-lite-llm/config.yaml \
  ghcr.io/im0rtality/actually-lite-llm:0.1.0 \
  -config /etc/actually-lite-llm/config.yaml

Helm chart (OCI)

helm install gateway oci://ghcr.io/im0rtality/charts/actually-lite-llm --version 0.1.0

With values:

helm install gateway oci://ghcr.io/im0rtality/charts/actually-lite-llm \
  --version 0.1.0 \
  --set image.tag=0.1.0 \
  -f my-values.yaml

Example my-values.yaml that supplies the gateway config inline:

# my-values.yaml
config:
  inline: |
    listen: ":8080"
    providers:
      openai:
        api_key: "${OPENAI_API_KEY}"
    ...

Enable the Prometheus ServiceMonitor (requires the Prometheus Operator CRDs, for example from kube-prometheus-stack):

serviceMonitor:
  enabled: true
  interval: 30s

Kubernetes / Helm (local chart)

helm install gateway ./helm/actually-lite-llm \
  --set config.existingSecret=my-gateway-config \
  --set 'extraEnv[0].name=OPENAI_API_KEY' \
  --set 'extraEnv[0].valueFrom.secretKeyRef.name=provider-keys' \
  --set 'extraEnv[0].valueFrom.secretKeyRef.key=openai'

Docker (local build)

docker build -t actually-lite-llm .
docker run -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... \
  -v $(pwd)/config.yaml:/etc/actually-lite-llm/config.yaml \
  actually-lite-llm -config /etc/actually-lite-llm/config.yaml

Development

go test ./...
go build .
go vet ./...

Planned Features

Google Gemini provider
Additional providers (Mistral, Groq, etc.)
