API endpoints for each ATLAS service. All services communicate over HTTP/JSON. Streaming endpoints use Server-Sent Events (SSE).
Ports listed are defaults. All are configurable via environment variables (see CONFIGURATION.md). Docker Compose maps container-internal ports to host ports — the ports below are what you hit from the host.
The main entry point. Wraps llama-server with an agent loop, grammar-constrained tool calls, Lens scoring, and sandbox verification. This is what Aider connects to.
OpenAI-compatible chat completions. When `ATLAS_AGENT_LOOP=1` (the default), the proxy runs an internal agent loop with structured tool calls instead of forwarding directly to the LLM.
Request:
```json
{
  "model": "Qwen3.5-9B-Q6_K",
  "messages": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "Create a Python hello world script"}
  ],
  "max_tokens": 32768,
  "temperature": 0.3,
  "stream": true,
  "stop": []
}
```

Response (SSE stream when `stream: true`):
```
data: {"id":"atlas-verify","object":"chat.completion.chunk","choices":[{"delta":{"content":"[Turn 1/30] writing hello.py..."}}]}
data: {"id":"atlas-verify","object":"chat.completion.chunk","choices":[{"delta":{"content":"hello.py\n```python\nprint('hello world')\n```"}}]}
data: [DONE]
```
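A client can consume this stream by reading `data:` lines and stopping at the `[DONE]` sentinel. A minimal parsing sketch (parsing only; the raw lines are hardcoded from the example above rather than fetched over HTTP, and the helper name is my own):

```python
import json

def parse_sse_chunks(lines):
    """Yield delta content strings from an OpenAI-style SSE chunk stream."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and `event:` headers
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Raw lines hardcoded for illustration; a real client would read them
# from the chunked HTTP response body.
stream = [
    'data: {"id":"atlas-verify","object":"chat.completion.chunk",'
    '"choices":[{"delta":{"content":"[Turn 1/30] writing hello.py..."}}]}',
    "data: [DONE]",
]
text = "".join(parse_sse_chunks(stream))
```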
Response (non-streaming when `stream: false`):

```json
{
  "id": "atlas-verify",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "Qwen3.5-9B-Q6_K",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "..."},
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 150,
    "completion_tokens": 200,
    "total_tokens": 350
  },
  "atlas_route": "standard",
  "atlas_gx_score": 0.85,
  "atlas_verdict": "likely_correct",
  "atlas_sandbox_passed": true,
  "atlas_repair_attempt": 0
}
```

The `atlas_*` fields are ATLAS-specific metadata attached to non-streaming responses. They are omitted when empty/zero.
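Because the `atlas_*` fields are omitted when empty/zero, clients should read them with defaults rather than assume presence. A minimal sketch (the helper name and the fallback values are this example's own convention, not part of the API):

```python
def atlas_metadata(resp: dict) -> dict:
    """Collect ATLAS metadata from a non-streaming completion response,
    substituting neutral values for fields omitted when empty/zero."""
    return {
        "route": resp.get("atlas_route", ""),
        "gx_score": resp.get("atlas_gx_score", 0.0),
        "verdict": resp.get("atlas_verdict", ""),
        "sandbox_passed": resp.get("atlas_sandbox_passed", False),
        "repair_attempt": resp.get("atlas_repair_attempt", 0),
    }

# Partial response: sandbox/repair fields omitted, so defaults apply.
meta = atlas_metadata({
    "id": "atlas-verify",
    "atlas_gx_score": 0.85,
    "atlas_verdict": "likely_correct",
})
```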
Example:
```bash
curl -N http://localhost:8090/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"Qwen3.5-9B-Q6_K","messages":[{"role":"user","content":"hi"}],"max_tokens":100,"stream":true}'
```

Tool-based agent endpoint. Sends a message and receives a stream of tool calls and results.
Request:
```json
{
  "message": "Create a snake game in Python",
  "working_dir": "/path/to/project",
  "mode": "default"
}
```

`mode` options: `"default"`, `"accept-edits"`, `"yolo"`
Response: SSE stream of tool call events, terminated by `data: [DONE]`.
Returns available models (OpenAI-compatible).
```bash
curl http://localhost:8090/v1/models
```

```bash
curl http://localhost:8090/health
```

```json
{
  "status": "ok",
  "inference": true,
  "lens": true,
  "sandbox": true,
  "port": "8090",
  "stats": {
    "requests": 42,
    "repairs": 3,
    "sandbox_passes": 38,
    "sandbox_fails": 4
  }
}
```

Note: `/chat/completions` and `/models` (without the `/v1/` prefix) are aliases for their `/v1/` counterparts. Any unmatched path is proxied directly to llama-server.
Runs the full V3 code generation pipeline: probe, PlanSearch, DivSampling, Budget Forcing, Lens scoring, sandbox testing, and Phase 3 repair.
Run the V3 pipeline for a file generation task. Streams progress events as SSE.
Request:
```json
{
  "file_path": "app/page.tsx",
  "baseline_code": "export default function Page() { ... }",
  "project_context": {"package.json": "{...}", "tsconfig.json": "{...}"},
  "framework": "nextjs",
  "build_command": "npx next build",
  "constraints": ["Must use Tailwind CSS", "Must be a client component"],
  "tier": 2,
  "working_dir": "/path/to/project"
}
```

All fields are optional except the task itself. `tier` defaults to 2.
Response (SSE stream):
```
data: {"stage": "probe", "detail": "Generating probe candidate..."}
data: {"stage": "probe_scored", "detail": "C(x)=0.72 norm=0.68"}
data: {"stage": "probe_sandbox", "detail": "Testing probe..."}
data: {"stage": "plansearch", "detail": "Generating 3 plans..."}
data: {"stage": "divsampling", "detail": "Generating candidates..."}
data: {"stage": "sandbox_test", "detail": "Testing 3 candidates..."}
data: {"stage": "sandbox_pass", "detail": "Candidate 1 passed"}
event: result
data: {"code": "...", "passed": true, "phase_solved": "phase1", "candidates_tested": 3, "winning_score": 0.85, "total_tokens": 12500, "total_time_ms": 4200.0}
data: [DONE]
```
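Unlike plain progress events, the final payload is announced by an `event: result` header on the preceding line. A minimal sketch of separating the two (helper name is illustrative; lines are hardcoded rather than read from the network):

```python
import json

def read_pipeline_stream(lines):
    """Split a /v3/generate SSE stream into progress events and the final
    result payload (the `data:` line following an `event: result` header)."""
    progress, result = [], None
    expect_result = False
    for line in lines:
        if line.strip() == "event: result":
            expect_result = True  # next data: line carries the result
            continue
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        if expect_result:
            result = json.loads(payload)
            expect_result = False
        else:
            progress.append(json.loads(payload))
    return progress, result

progress, result = read_pipeline_stream([
    'data: {"stage": "probe", "detail": "Generating probe candidate..."}',
    'data: {"stage": "sandbox_pass", "detail": "Candidate 1 passed"}',
    "event: result",
    'data: {"code": "...", "passed": true}',
    "data: [DONE]",
])
```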
All SSE stage values
The pipeline emits stages as it progresses. Not all stages appear in every run — the pipeline exits early when a candidate passes.
| Phase | Stages |
|---|---|
| Probe (Phase 0) | probe, probe_light, probe_error, probe_retry, probe_failed, self_test_gen, self_test_done, self_test_error, self_test_verify, probe_scored, probe_sandbox, probe_pass |
| Generation (Phase 1) | phase1, phase2 (allocation), phase2_allocated, plansearch, plansearch_done, plansearch_error, divsampling, divsampling_done, divsampling_error |
| Testing (Phase 2) | sandbox_test, sandbox_pass, sandbox_done, s_star, s_star_winner, s_star_error, selected |
| Repair (Phase 3) | phase3, pr_cot, pr_cot_pass, pr_cot_failed, pr_cot_error, refinement, refinement_pass, refinement_failed, refinement_error, derivation, derivation_pass, derivation_failed, derivation_error, fallback |
Simplified endpoint for running the pipeline on a problem description (used by the CLI).
Request:
```json
{
  "problem": "Write a function that finds the longest palindromic substring",
  "task_id": "cli",
  "stream": true,
  "files": {"main.py": "# existing code..."}
}
```

Response: Same SSE format as `/v3/generate`.
```bash
curl http://localhost:8070/health
# {"status": "ok", "service": "v3-pipeline"}
```

Energy-based code scoring using C(x) cost field and G(x) quality prediction. Also serves as the RAG API for project indexing and retrieval.
Internal port note: The container runs uvicorn on port 8001. Docker Compose maps this to 8099 on the host.
Score code using combined C(x) + G(x) energy. Single embedding extraction serves both models.
Request:
```json
{"text": "def merge_sort(arr):\n    if len(arr) <= 1:\n        return arr\n    ..."}
```

Response:
```json
{
  "cx_energy": 5.2,
  "cx_normalized": 0.32,
  "gx_score": 0.85,
  "verdict": "likely_correct",
  "gx_available": true,
  "enabled": true,
  "latency_ms": 26.4
}
```

| Field | Type | Description |
|---|---|---|
| `cx_energy` | float | Raw cost field energy (lower = more likely correct) |
| `cx_normalized` | float | 0–1 normalized energy |
| `gx_score` | float | 0–1 probability of correctness (G(x) model) |
| `verdict` | string | `"likely_correct"`, `"uncertain"`, or `"likely_incorrect"` |
| `gx_available` | bool | Whether the G(x) model was loaded |
| `enabled` | bool | Whether Geometric Lens is enabled |
| `latency_ms` | float | Execution time in milliseconds |

When Lens is disabled, returns `enabled: false` with neutral defaults (`cx_energy: 0.0`, `gx_score: 0.5`).
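A caller can normalize both the disabled and enabled cases into one pair of scores. A minimal sketch (the helper name is illustrative, and falling back to 0.5 when `gx_available` is false is this sketch's own choice, not specified by the API):

```python
def effective_scores(resp: dict) -> tuple:
    """Return (cx_energy, gx_score) from an /internal/lens/gx-score
    response. A disabled Lens yields the documented neutral defaults
    (cx_energy 0.0, gx_score 0.5)."""
    if not resp.get("enabled", False):
        return 0.0, 0.5  # documented neutral defaults
    # Assumption: treat a missing G(x) model as a 50/50 prior.
    gx = resp["gx_score"] if resp.get("gx_available", False) else 0.5
    return resp["cx_energy"], gx

neutral = effective_scores({"enabled": False})  # (0.0, 0.5)
```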
Example:
```bash
curl http://localhost:8099/internal/lens/gx-score \
  -H "Content-Type: application/json" \
  -d '{"text": "print(\"hello world\")"}'
```

```bash
curl http://localhost:8099/health
# {"status": "healthy", "service": "geometric-lens"}
```

Additional internal endpoints
These are used internally by other ATLAS services. They are stable but not part of the public API.
| Endpoint | Method | Description |
|---|---|---|
| `/v1/projects/sync` | POST | Sync/index a project codebase |
| `/v1/projects/{id}/status` | GET | Get project index status |
| `/v1/projects` | GET | List indexed projects |
| `/v1/projects/{id}` | DELETE | Delete a project index |
| `/v1/chat/completions` | POST | RAG-augmented chat completions |
| `/v1/models` | GET | List available models |
| `/v1/tasks/submit` | POST | Submit async task |
| `/v1/tasks/{id}/status` | GET | Get task status |
| `/v1/queue/stats` | GET | Task queue statistics |
| `/v1/patterns/write` | POST | Write pattern data |
| `/internal/cache/stats` | GET | Cache statistics |
| `/internal/cache/flush` | POST | Flush cache |
| `/internal/cache/consolidate` | POST | Consolidate cache entries |
| `/internal/router/stats` | GET | Confidence router statistics |
| `/internal/router/reset` | POST | Reset router posteriors |
| `/internal/router/feedback` | POST | Record routing feedback |
| `/internal/lens/stats` | GET | Lens model statistics |
| `/internal/lens/evaluate` | GET/POST | Evaluate text through Lens |
| `/internal/lens/score-text` | POST | Score text (C(x) only) |
| `/internal/lens/retrain` | POST | Retrain cost field model |
| `/internal/lens/reload` | POST | Reload model weights |
| `/internal/lens/correctability` | POST | Correctability evaluation |
| `/internal/sandbox/analyze` | POST | Sandbox result analysis |
Isolated code execution with compilation, testing, and linting support.
Execute code in an isolated environment.
Request:
```json
{
  "code": "print('hello from sandbox')",
  "language": "python",
  "test_code": null,
  "requirements": null,
  "timeout": 30
}
```

| Field | Type | Default | Description |
|---|---|---|---|
| `code` | string | (required) | Code to execute |
| `language` | string | `"python"` | Target language (see supported list below) |
| `test_code` | string | null | Optional test code (e.g. pytest assertions) |
| `requirements` | string[] | null | Python packages to pip install before execution |
| `timeout` | int | 30 | Max execution time in seconds (capped at 60) |
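The request fields above can be assembled with a small helper. A minimal sketch (the function name is illustrative; the 60-second ceiling mirrors the documented server-side cap, while the floor of 1 second is this sketch's own choice):

```python
def build_run_request(code, language="python", test_code=None,
                      requirements=None, timeout=30):
    """Assemble a /run request body with the documented defaults.
    Timeout is clamped to [1, 60]; 60 is the documented server cap,
    the floor of 1 is an assumption of this sketch."""
    return {
        "code": code,
        "language": language,
        "test_code": test_code,
        "requirements": requirements,
        "timeout": min(max(int(timeout), 1), 60),
    }

req = build_run_request("print('hello from sandbox')", timeout=120)
```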
Response:
```json
{
  "success": true,
  "compile_success": true,
  "tests_run": 1,
  "tests_passed": 1,
  "lint_score": 8.5,
  "stdout": "hello from sandbox\n",
  "stderr": "",
  "error_type": null,
  "error_message": null,
  "execution_time_ms": 45
}
```

| Field | Type | Description |
|---|---|---|
| `success` | bool | Overall pass/fail |
| `compile_success` | bool | Whether compilation succeeded (always true for interpreted languages) |
| `tests_run` | int | Number of tests executed |
| `tests_passed` | int | Number of tests that passed |
| `lint_score` | float? | Pylint score 0–10 (Python only, null for other languages) |
| `stdout` | string | Stdout output (truncated to last 4000 chars) |
| `stderr` | string | Stderr output (truncated to last 2000 chars) |
| `error_type` | string? | Error classification (e.g. SyntaxError, CompileError, Timeout, ImportError) |
| `error_message` | string? | First 500 chars of error details |
| `execution_time_ms` | int | Execution time in milliseconds |
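A client typically collapses these fields into a short verdict. A minimal sketch (the helper and its ordering of checks are illustrative, not part of the sandbox API):

```python
def summarize_result(resp: dict) -> str:
    """One-line summary of a /run response; field names come from the
    table above, the precedence of checks is this sketch's own."""
    if not resp.get("compile_success", True):
        return f"compile failed: {resp.get('error_message') or 'unknown'}"
    if resp.get("tests_run", 0):
        return f"{resp['tests_passed']}/{resp['tests_run']} tests passed"
    if resp.get("success"):
        return "ok"
    return f"failed: {resp.get('error_type') or 'unknown'}"

summary = summarize_result({
    "success": True, "compile_success": True,
    "tests_run": 1, "tests_passed": 1,
})  # "1/1 tests passed"
```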
Check syntax without executing code.
Request:
```json
{
  "code": "def foo(:\n    pass",
  "language": "python",
  "filename": "main.py"
}
```

Response:
```json
{
  "valid": false,
  "errors": ["SyntaxError: invalid syntax (line 1)"],
  "language": "python",
  "check_time_ms": 12
}
```

List supported languages with installed runtime versions.
```bash
curl http://localhost:30820/languages
```

```json
{
  "languages": {
    "python": "Python 3.11.2",
    "javascript": "v20.11.0",
    "typescript": "5.3.3",
    "go": "go1.22.0",
    "rust": "rustc 1.77.0",
    "c": "gcc 13.2.0",
    "cpp": "g++ 13.2.0",
    "bash": "GNU bash 5.2.21"
  }
}
```

Supported languages: `python` (aliases: `py`, `python3`), `javascript` (`js`, `node`), `typescript` (`ts`), `go` (`golang`), `rust` (`rs`), `c`, `cpp` (`c++`), `bash` (`sh`, `shell`)
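Since several aliases map to each canonical name, a client can normalize user input before sending a request. A minimal sketch (the alias table is transcribed from the list above; the helper name is illustrative):

```python
# Alias table transcribed from the supported-languages list above.
ALIASES = {
    "py": "python", "python3": "python",
    "js": "javascript", "node": "javascript",
    "ts": "typescript",
    "golang": "go",
    "rs": "rust",
    "c++": "cpp",
    "sh": "bash", "shell": "bash",
}

def canonical_language(name: str) -> str:
    """Map a sandbox language alias to its canonical name;
    already-canonical names pass through unchanged."""
    name = name.strip().lower()
    return ALIASES.get(name, name)
```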
```bash
curl http://localhost:30820/health
# {"status": "healthy"}
```

Standard llama.cpp server API. See the llama.cpp documentation.
OpenAI-compatible chat completions with `response_format` support for grammar-constrained JSON output.
Example:
```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.5-9B-Q6_K",
    "messages": [{"role":"user","content":"/nothink\nSay hello"}],
    "max_tokens": 50,
    "response_format": {"type": "json_object"}
  }'
```

Raw completion endpoint (no chat template). Used internally by the benchmark runner.
Generate embeddings for input text. Used by Geometric Lens for C(x)/G(x) scoring.
```bash
curl http://localhost:8080/health
# {"status":"ok"}
```

llama-server exposes many more endpoints. See the llama.cpp server docs for the full API reference.