InfiniteLLM Gateway is a high-performance, developer-friendly LLM API proxy that aggregates and load-balances multiple free-tier LLM providers (Groq, Cerebras, Mistral, OpenRouter, and Gemini Native). It exposes a single OpenAI-compatible endpoint with automatic failover and streaming support.
- OpenAI Compatible: Fully compatible with OpenAI SDKs and tools via `/v1/chat/completions`.
- Native Gemini Support: Adapts the native Gemini API to the OpenAI format.
- Intelligent Load Balancing: Round-robin distribution across providers.
- Robust Failover: Automatically retries the next available provider if one fails or returns a `429 Too Many Requests` or `5xx` server error.
- Circuit Breaker: Temporarily disables providers that repeatedly fail (configurable threshold and exponential backoff).
- Streaming Support: Transparent proxying of Server-Sent Events (SSE) for real-time model responses.
- Production-Ready HTTP Server: Configured timeouts (read, write, idle) and request body limits.
- Request Tracing: Automatic `X-Request-ID` generation/propagation for distributed tracing.
- Contract-First Development: API types and server interfaces are generated from an OpenAPI 3.0 specification.
- Developer Ready: Includes local debugging configurations and a full verification suite.
- Language: Go 1.25.6
- Router: go-chi/chi
- Code Generation: oapi-codegen (Strict Server mode)
- CI/CD: GitHub Actions with security auditing (Gosec, Govulncheck) and GHCR publishing
- Testing: Native Go tests with high coverage targets.
The project follows a clean, modular architecture inspired by Domain-Driven Design (DDD):
- `/api`: Contains the OpenAPI specifications. `openai_proxy.yml` is the optimized version for generating Go types.
- `/pkg/api`: Auto-generated boilerplate (routing, JSON decoding/encoding). Do not edit manually.
- `/pkg/balancer`: Core logic for provider selection, round-robin state, and retry policies.
- `/pkg/handlers`: HTTP handlers for health, JSON stats, and the web dashboard using Go's `embed` and `html/template`.
- `/pkg/metrics`: Asynchronous metrics collection with SQLite persistence for request stats.
- `/pkg/provider`: Implementations of the various LLM adapters (Groq, Mistral, Gemini, etc.).
- `main.go`: Implements the `StrictServerInterface`, orchestrates the bootstrap process, and handles the reverse proxy logic.
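The round-robin state kept in `/pkg/balancer` can be sketched as a counter over a provider list. The `Balancer` type below is a simplified illustration (the real one also tracks retries and breaker state); an atomic counter keeps selection safe under concurrent requests.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Balancer is a minimal sketch of round-robin provider selection;
// the project's /pkg/balancer adds retry and circuit-breaker logic.
type Balancer struct {
	providers []string
	next      atomic.Uint64 // monotonically increasing pick counter
}

// Pick returns the next provider in round-robin order.
func (b *Balancer) Pick() string {
	i := b.next.Add(1) - 1
	return b.providers[i%uint64(len(b.providers))]
}

func main() {
	b := &Balancer{providers: []string{"Groq", "Cerebras", "Mistral"}}
	for i := 0; i < 4; i++ {
		fmt.Println(b.Pick())
	}
	// Cycles: Groq, Cerebras, Mistral, Groq
}
```

On failure, the real balancer advances to the next provider in the same cycle instead of returning the error to the client.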
- Go 1.25.6 or higher.
- A `.env` file in the root directory.
Create a .env file with your provider keys:
```env
PORT=8080
GROQ_API_KEY=your_key
CEREBRAS_API_KEY=your_key
OPENROUTER_API_KEY=your_key
MISTRAL_API_KEY=your_key
GEMINI_API_KEY=your_key

# Optional Debug Flags
LOG_LLM_RESPONSE_DETAILS=true # Log full upstream response body
FIXED_PROVIDER=Gemini # Force routing to a specific provider

# Metrics (optional)
METRICS_DB_PATH=metrics.db # SQLite database path for metrics persistence
METRICS_RETENTION_DAYS=30 # How many days to keep metrics (default: 30)

# Circuit Breaker (optional)
CIRCUIT_FAILURE_THRESHOLD=3 # Consecutive failures to trip breaker (default: 3)
CIRCUIT_COOLDOWN_BASE_SECONDS=30 # Initial cooldown duration (default: 30)
CIRCUIT_MAX_COOLDOWN_SECONDS=300 # Maximum cooldown with exponential backoff (default: 300)

# Server Hardening (optional)
MAX_REQUEST_BODY_BYTES=10485760 # Max request body size in bytes (default: 10MB)
```

Install dependencies and run the server:

```bash
# Install dependencies
go mod tidy

# Run the server
go run main.go
```

The gateway will be available at `http://localhost:8080/v1/chat/completions`.
A /health endpoint is available for Kubernetes liveness/readiness probes:
```bash
curl http://localhost:8080/health
# Returns: {"status":"ok"}
```

The gateway provides both raw JSON metrics and a visual dashboard:
Access a real-time dashboard at http://localhost:8080/stats/web.
- Auto-Refresh: Use the `refresh` query parameter (e.g., `/stats/web?refresh=5`) to automatically reload the dashboard every N seconds.
- Rich UI: Embedded templates provide fast, frame-free visualization of your gateway's health.
A /stats endpoint provides aggregated metrics for programmatic access:
```bash
curl http://localhost:8080/stats
```

Returns statistics including:
- Total requests, successes, and failures.
- Average, min, max response times (in milliseconds).
- Per-provider breakdown with error counts (429, 5xx, 4xx).
- Success rate percentage and "Stats Since" timestamp.
Every response from /v1/chat/completions includes:
- `X-Provider`: Name of the LLM provider that handled the request
- `X-Response-Time-Ms`: Response time in milliseconds
Build and run the containerized gateway using the optimized, scratch-based image:
```bash
# Build
docker build -t infinitellm .

# Run
docker run -p 8080:8080 --env-file .env infinitellm
```

The image is automatically built and published to GitHub Container Registry (GHCR) on every push to `main`.
Run the full verification script (format, lint, test):
```powershell
# Windows
.\scripts\verify.ps1
```

Use curl to send a test request:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-1.5-flash",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

If you modify `api/openai_proxy.yml`, regenerate the Go code using:
```bash
go run github.com/oapi-codegen/oapi-codegen/v2/cmd/oapi-codegen@latest \
  -config oapi-config.yaml api/openai_proxy.yml
```

Every commit is validated against:
- Linting: `golangci-lint` (v2.7.2).
- Vulnerabilities: `govulncheck` and `gosec`.
- Tests: Race condition detection enabled.
A VS Code `launch.json` is provided with:
- Debug InfiniteLLM Gateway: Launches the app with `.env` loaded.
- Test Current Function: Allows debugging a specific test function by selecting its name.
This project is licensed under the MIT License.