End-to-end documentation to set up your own local & fully private LLM server on Debian. Equipped with chat, web search, RAG, model management, MCP servers, image generation, and TTS.
A robust, production-ready Python toolkit that automates synchronization between a directory of .gguf model files and a llama-swap config.yaml.
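As a rough illustration of what such a sync can look like (a minimal sketch under assumed conventions, not this toolkit's implementation), the script below scans a directory for .gguf files and regenerates a llama-swap config.yaml. The `models:`/`cmd:` layout and the `${PORT}` placeholder follow llama-swap's config format; the `/models` path and the use of the filename stem as the model name are assumptions.

```python
from pathlib import Path

import yaml  # PyYAML

MODELS_DIR = Path("/models")        # assumed location of the .gguf files
CONFIG_PATH = Path("config.yaml")   # llama-swap config to regenerate

def build_config(models_dir: Path) -> dict:
    # One llama-swap entry per .gguf file; the filename stem becomes the model name.
    models = {}
    for gguf in sorted(models_dir.glob("*.gguf")):
        models[gguf.stem] = {
            # llama-swap substitutes ${PORT} with the port it assigns the upstream server.
            "cmd": f"llama-server --model {gguf} --port ${{PORT}}",
        }
    return {"models": models}

if __name__ == "__main__":
    config = build_config(MODELS_DIR)
    CONFIG_PATH.write_text(yaml.safe_dump(config, sort_keys=True))
    print(f"Wrote {CONFIG_PATH} with {len(config['models'])} model(s)")
```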
Auto-configure opencode to use a local llama-swap instance, with model and context detection.
Custom Llama Swap Container Image
Launch and optimize llama.cpp servers automatically across Linux, macOS, and Windows using hardware detection and configuration tuning.
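A toy version of that kind of hardware detection (a sketch of the general approach, not this project's logic) might probe for a usable GPU and set llama.cpp's `-ngl` layer-offload flag accordingly; the model path below is hypothetical.

```python
import platform
import shutil
import subprocess

def gpu_layers() -> int:
    # Choose a value for llama.cpp's -ngl (GPU layer offload) flag.
    if shutil.which("nvidia-smi"):  # NVIDIA GPU visible on Linux/Windows
        return 999                  # a large -ngl offloads all layers
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return 999                  # Apple Silicon offloads via Metal
    return 0                        # CPU-only fallback

# Hypothetical model path; llama-server and -ngl are standard llama.cpp options.
cmd = ["llama-server", "--model", "/models/example.gguf", "-ngl", str(gpu_layers())]
subprocess.run(cmd, check=True)
```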
Start and stop your llama-swap models with Ulauncher.
Autonomous overnight LLM eval pipeline for local GGUF models — multi-turn agentic tasks, dimension-routed dual-judge scoring, SQLite-backed comparison reports. Built for llama.cpp + llama-swap on dual-GPU rigs.
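To make "dimension-routed dual-judge scoring" concrete, here is a hypothetical sketch (the table name, judge names, and schema are invented, not taken from the project): each scoring dimension is routed to a pair of judge models, and both the per-judge scores and their mean are stored in SQLite for later comparison reports.

```python
import sqlite3
from statistics import mean

# Hypothetical routing table: each eval dimension is scored by a pair of judges.
DIMENSION_JUDGES = {
    "correctness": ("judge-a", "judge-b"),
    "tool_use": ("judge-a", "judge-b"),
}

def record_scores(db: sqlite3.Connection, run_id: str,
                  scores: dict[str, dict[str, float]]) -> None:
    # scores maps dimension -> {judge: score}; store raw rows plus a mean row.
    db.execute("""CREATE TABLE IF NOT EXISTS results
                  (run_id TEXT, dimension TEXT, judge TEXT, score REAL)""")
    for dim, by_judge in scores.items():
        for judge, s in by_judge.items():
            db.execute("INSERT INTO results VALUES (?, ?, ?, ?)",
                       (run_id, dim, judge, s))
        db.execute("INSERT INTO results VALUES (?, ?, 'mean', ?)",
                   (run_id, dim, mean(by_judge.values())))
    db.commit()

db = sqlite3.connect("evals.sqlite")
record_scores(db, "run-001", {"correctness": {"judge-a": 8.0, "judge-b": 7.5}})
```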
Config-driven local LLM toolkit for llama.cpp and llama-swap, with a FastAPI Web UI, eval/benchmark helpers, and deployment packaging.
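For a sense of how a Web UI front end commonly talks to llama-swap (an assumed setup, not this toolkit's code): llama-swap exposes an OpenAI-compatible HTTP API and uses the request's `model` field to decide which configured llama.cpp server to start or reuse, so a FastAPI layer can forward chat requests more or less verbatim. The listen address below is an assumption.

```python
import httpx
from fastapi import FastAPI

app = FastAPI()
LLAMA_SWAP = "http://localhost:8080"  # assumed llama-swap listen address

@app.post("/chat")
async def chat(payload: dict) -> dict:
    # The "model" field in the OpenAI-style payload selects which configured
    # llama.cpp instance llama-swap routes the request to.
    async with httpx.AsyncClient(timeout=120.0) as client:
        r = await client.post(f"{LLAMA_SWAP}/v1/chat/completions", json=payload)
        r.raise_for_status()
        return r.json()
```

Run it with uvicorn (e.g. `uvicorn main:app`) and POST a standard OpenAI-style body such as `{"model": "...", "messages": [...]}` to `/chat`.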