Your personal AI assistant. Fully local. Fully yours.
No cloud APIs. No subscriptions. No telemetry. Just your machine, your data, your models.
Status: Under active development. Open-source release coming soon.
Chat with your own knowledge base. Calcifer collects content from Reddit, YouTube, web pages, and your own notes, indexes everything with vector embeddings, and answers questions grounded in real sources. It tells you where its answers come from.
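The retrieval step can be sketched like this -- a toy stand-in for the real ChromaDB + sentence-transformers pipeline, with fake three-dimensional vectors in place of learned embeddings:

```python
import math

# Toy illustration of grounded retrieval: Calcifer uses ChromaDB with
# sentence-transformers embeddings; here we fake tiny vectors by hand.
DOCS = [
    {"text": "Ollama runs LLMs locally.", "source": "reddit:r/LocalLLaMA", "vec": [0.9, 0.1, 0.0]},
    {"text": "Kokoro is a TTS model.",    "source": "web:example.com",     "vec": [0.1, 0.9, 0.0]},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Rank documents by cosine similarity, keeping each hit's source."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, d["vec"]), reverse=True)
    return ranked[:k]

hits = retrieve([1.0, 0.0, 0.0])
print(hits[0]["source"])  # the citation attached to the best match
```

The key point is that every stored chunk carries its source alongside its embedding, so the answer can always say where it came from.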
Talk to it. Three voice modes, each suited to different tasks:
- Assistant -- You speak, Calcifer listens (Whisper STT), thinks (your Ollama model), and speaks back (Kokoro TTS). Full agent pipeline -- it can search your knowledge base, look things up on the web, check the weather, save notes, anything the text chat can do. This is the main mode.
- Dictate -- Voice input, text output. Same agent capabilities as Assistant but responds in text instead of speech. Useful when you want to talk but don't need audio back, or if you haven't set up TTS.
- Conversation -- Experimental. Full-duplex speech-to-speech using Moshi, a duplex spoken dialogue model. Sub-200ms latency, feels like a real back-and-forth conversation. Good for casual chitchat. It runs its own small model, so it is not as smart as your main LLM and cannot call tools or access your knowledge base. Think of it as a fun tech demo, not a productivity feature.
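Under the hood, Assistant and Dictate share the same pipeline shape; Dictate simply skips the final TTS stage. A hedged sketch with stub stages (the function names are illustrative, not Calcifer's actual API):

```python
# Illustrative pipeline sketch; each stage is a stub standing in for the
# real component named in the comment.

def transcribe(audio: bytes) -> str:          # Whisper STT stage
    return "what's the weather in Oslo"       # stub

def run_agent(prompt: str) -> str:            # Ollama LLM + tool calls
    return f"Answer to: {prompt}"             # stub

def synthesize(text: str) -> bytes:           # Kokoro TTS stage
    return text.encode()                      # stub

def assistant_turn(audio: bytes) -> bytes:
    """Assistant mode: speech in, agent reasoning, speech out."""
    return synthesize(run_agent(transcribe(audio)))

def dictate_turn(audio: bytes) -> str:
    """Dictate mode: same agent, but the reply stays as text."""
    return run_agent(transcribe(audio))
```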
Pick from several TTS voices or clone your own with a reference audio file.
Generate images, video, and music. Connected to ComfyUI for local generation. Ask it to create a portrait, a Studio Ghibli-style video, or a melody, and it runs the workflow on your GPU.
Recording studio. Two modes for turning text or voice into any voice you want:
- Voice mode (real-time style transfer) -- Record yourself speaking, then convert it to a target voice using Seed-VC. This is zero-shot voice conversion -- it takes your raw speech and transforms it into the target voice while preserving your original timing, pauses, intonation, and emotion. You sound like someone else, but the performance is still yours. No training, no fine-tuning, just a short reference clip of the target voice.
- Text mode (TTS with voice cloning) -- Type text, pick a voice, get speech. Uses Qwen3-TTS, a new text-to-speech model with both built-in speaker voices and zero-shot voice cloning from a reference audio file. Natural-sounding output with good prosody and multilingual support.
Maps and navigation. Interactive map with markers, geocoding, directions, and nearby POI search -- all through OpenStreetMap. The agent can save locations and give directions during conversation.
Remember things about you. Long-term memory that persists across conversations. It learns your preferences, facts about you, and things you ask it to remember.
Search everything. Web search (via SearxNG), Reddit search, YouTube search and transcript extraction, URL crawling. All results can be ingested into the knowledge base.
Knowledge base. Browse, search, and manage all collected content. Filter by source (Reddit, web, YouTube, manual), subreddit, date, score.
Personal notes. Write and organize notes with topic tags. Notes are searchable and available to the AI during conversations.
Admin dashboard. Monitor collection runs, manage subreddit watchers, configure MCP servers, view system stats. Live GPU monitoring in the sidebar.
```
You ---> Next.js frontend ---> FastAPI backend ---> Ollama (LLM)
                                     |
                                     +---> ChromaDB (vector search)
                                     +---> SQLite (metadata, conversations, notes)
                                     +---> SearxNG (private web search)
                                     +---> ComfyUI (image/video/audio generation)
                                     +---> Whisper / Kokoro / Qwen3-TTS / Seed-VC (voice)
                                     +---> MCP servers (extensible tools)
```
Everything runs in Docker containers on your local machine, except Ollama, which runs directly on the host for best GPU performance.
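For illustration, here is roughly how a containerized backend can reach Ollama on the host over its REST API. `host.docker.internal` is a common Docker alias for the host machine; the model name and hostname here are assumptions, and Calcifer's actual configuration may differ:

```python
import json
from urllib import request

# Ollama listens on port 11434 by default; /api/chat is its chat endpoint.
OLLAMA_URL = "http://host.docker.internal:11434/api/chat"

def build_chat_request(model: str, user_text: str) -> request.Request:
    """Build a non-streaming request against Ollama's /api/chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }
    return request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("qwen3", "hello")
# request.urlopen(req) would return a JSON body with the assistant message.
```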
| Layer | Tech |
|---|---|
| Frontend | Next.js 15, App Router, Tailwind CSS, shadcn/ui |
| Backend | Python 3.12, FastAPI, SQLModel |
| Vector DB | ChromaDB with sentence-transformers embeddings |
| Metadata DB | SQLite |
| LLM | Ollama (any model -- qwen3, llama, mistral, etc.) |
| Speech-to-Text | faster-whisper (local) |
| Text-to-Speech | Kokoro (chat voices), Qwen3-TTS (studio cloning + built-in speakers) |
| Voice Conversion | Seed-VC (zero-shot style transfer) |
| Web Search | SearxNG (self-hosted, no tracking) |
| Media Generation | ComfyUI (image, video, audio workflows) |
| Container | Docker with NVIDIA GPU passthrough |
| MCP | Extensible tool system via Model Context Protocol |
The AI agent has access to these tools during conversation:
- `knowledge_search` -- search the local vector database
- `web_search` -- private web search via SearxNG
- `crawl_url` -- fetch and parse any URL
- `reddit_search` -- search Reddit directly
- `youtube_search` / `youtube_transcript` -- find videos and pull transcripts
- `save_note` / `get_notes` -- read and write personal notes
- `ingest_to_knowledge` -- add content to the knowledge base
- `find_location` / `get_directions` / `find_nearby` -- maps and navigation via OSM
- `save_map_marker` / `get_map_markers` -- persistent map pins
- `get_weather` -- current weather for any location
- `save_memory` -- remember facts about the user
- `run_comfyui_workflow` -- generate images, video, or music locally
- Any tool from connected MCP servers
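Conceptually, tool dispatch is a lookup from tool name to handler. A minimal sketch with stub handlers (the real implementations live in the backend; the structure of `tool_call` here is an assumption for illustration):

```python
# Stub handlers standing in for the real tool implementations.

def knowledge_search(query: str) -> str:
    return f"knowledge hits for {query!r}"    # stub

def get_weather(location: str) -> str:
    return f"weather for {location}"          # stub

TOOLS = {
    "knowledge_search": knowledge_search,
    "get_weather": get_weather,
    # ... web_search, crawl_url, save_note, and the rest register the same way,
    # including tools discovered from connected MCP servers.
}

def dispatch(tool_call: dict) -> str:
    """Run one tool call emitted by the LLM and return its result."""
    fn = TOOLS.get(tool_call["name"])
    if fn is None:
        return f"unknown tool: {tool_call['name']}"
    return fn(**tool_call["arguments"])

result = dispatch({"name": "get_weather", "arguments": {"location": "Oslo"}})
```

Registering MCP tools into the same table is what makes the agent extensible without touching the core loop.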
This is what Calcifer is developed and tested on:
| Component | Spec |
|---|---|
| GPU | NVIDIA GeForce RTX 4070 Ti SUPER (16 GB VRAM) |
| CPU | Intel Core i7-14700 |
| RAM | 32 GB |
The GPU handles LLM inference (via Ollama), embedding generation, speech-to-text, text-to-speech, and ComfyUI generation. 16 GB VRAM is comfortable for running a 7-8B parameter model alongside embeddings and voice. Larger models or simultaneous ComfyUI generation may need more.
You could run this on less -- an RTX 3060 (12 GB) would handle smaller models and most features. You could also run it on more -- the stack will happily use whatever you give it.
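A rough rule of thumb for whether a model fits: quantized weights take about `parameters × bits-per-weight / 8` bytes, plus some headroom for the KV cache and runtime. Sketched as a quick estimator (the overhead figure is a ballpark assumption, not a measurement):

```python
# Back-of-envelope VRAM estimate for a quantized LLM.

def approx_vram_gb(params_billion: float, bits_per_weight: int,
                   overhead_gb: float = 1.5) -> float:
    """Weights in GB plus a rough allowance for KV cache and runtime."""
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)

# A 7B model at 4-bit quantization fits easily in 16 GB:
print(approx_vram_gb(7, 4))   # ~5.0 GB
# A 70B model at the same quantization would not:
print(approx_vram_gb(70, 4))  # ~36.5 GB
```

This is why 16 GB is comfortable for a 7-8B model with embeddings and voice models loaded alongside it.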
Everything runs locally. Your conversations, your knowledge base, your voice recordings -- none of it leaves your machine. There are no accounts, no analytics, no tracking.
The only network calls go to these services, and only when you use the features that need them:
| Service | Used for | Account/API key | Data sent |
|---|---|---|---|
| Ollama (localhost) | LLM inference | No | Nothing leaves your machine |
| SearxNG (localhost) | Web search | No | Search queries proxied through Google/DuckDuckGo/Bing/Brave (your IP, not an API key) |
| Open-Meteo | Weather forecasts | No | City name or coordinates |
| Nominatim | Geocoding, address lookup | No | Address or coordinates |
| Overpass API | Nearby POI search | No | Coordinates and search radius |
| OpenStreetMap | Map tiles | No | Tile coordinates (standard map loading) |
Zero API keys. Zero accounts. Open-Meteo, Nominatim, and Overpass are free public APIs run by non-profits and open-source projects. SearxNG is a self-hosted metasearch engine that distributes your queries across multiple search engines so no single provider builds a profile on you.
If you disconnect from the internet, everything except web search, weather, and map tiles continues to work -- chat, voice, knowledge base, notes, memories, media generation all run offline.
Actually local. Not "local but phones home for embeddings" or "local but needs an API key for search." Every component runs on your machine. SearxNG replaces Google. Ollama replaces OpenAI. Whisper and Kokoro replace cloud speech APIs. ComfyUI replaces Midjourney/DALL-E.
Knowledge-grounded. Answers come from your collected sources with citations. It shows where information came from -- which Reddit post, which article, which YouTube video. When it uses web search as a fallback, it tells you.
Extensible via MCP. Add new tools without touching the core codebase. Drop a config entry for any MCP-compatible server and the agent picks up its tools automatically.
Not a wrapper. This is a full application with its own storage, its own ingestion pipeline, its own agent loop. It is not a thin UI over an API.
Coming soon. The codebase is being cleaned up for public release. When it ships, you will get:
- Full source code (Python backend + Next.js frontend)
- Docker Compose setup for one-command deployment
- Configuration guides for Ollama, ComfyUI, and SearxNG
- Documentation for adding custom MCP tools
Watch this repo for the release.
If you find this project useful and want to support its development:
TBD -- will be announced with the open-source release.