Cloud AI can monitor your data and requires recurring payments. Local AI typically requires manual dependency management and command-line expertise.
ELL solves both. It is a cross-platform desktop application that automates installing an optimized inference engine and the Gemma-4 model.
ELL detects your OS, installs llama-server, and fetches the Gemma-4 model automatically. There are no Python environments to manage, no manual downloads, and no complex interfaces to learn.
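Under the hood, the detection step can be sketched as a simple platform switch. This is a hedged illustration: the asset names below are assumptions for the example, not ELL's actual release artifacts.

```typescript
// Sketch: map the host platform to a llama-server build to download.
// Asset names are illustrative assumptions, not ELL's real ones.
function serverAssetFor(platform: string, arch: string): string {
  switch (platform) {
    case "win32":
      return `llama-server-windows-${arch}.zip`;
    case "darwin":
      return `llama-server-macos-${arch}.zip`;
    case "linux":
      return `llama-server-linux-${arch}.zip`;
    default:
      throw new Error(`Unsupported platform: ${platform}`);
  }
}
```

In a Node/Bun environment, `process.platform` and `process.arch` provide the two inputs at runtime.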
The app is pre-configured for high-speed inference: Flash Attention and full GPU offloading (Vulkan/Metal/CUDA) are enabled by default, so your local hardware is used to its full potential.
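Those defaults map to llama-server CLI flags. A minimal sketch of the launch arguments (flag spellings follow llama.cpp's CLI and can vary between versions; the layer count and port are arbitrary examples):

```typescript
// Sketch: llama-server launch flags for flash attention and full GPU
// offload. Flag names follow llama.cpp's CLI and may change by version.
function buildServerArgs(modelPath: string, port: number): string[] {
  return [
    "-m", modelPath,        // path to the GGUF model file
    "-ngl", "99",           // offload (up to) all layers to the GPU
    "--flash-attn",         // enable flash attention
    "--port", String(port), // local HTTP port the chat UI talks to
  ];
}
```

The array form is what you would pass to a child-process spawn call, keeping each flag and value as a separate argument.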
Conversations remain on your machine. There is no telemetry and no training data collection; the app works completely offline.
Switch between standard chat and reasoned response modes. ELL uses system prompts to force the model to reason before answering.
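A mode switch like this typically amounts to swapping the system prompt. The prompt strings below are illustrative assumptions, not ELL's actual wording:

```typescript
// Sketch: choose a system prompt per chat mode. The prompt text is an
// assumption for illustration, not ELL's actual prompts.
type ChatMode = "standard" | "reasoning";

function systemPromptFor(mode: ChatMode): string {
  return mode === "reasoning"
    ? "Reason through the problem step by step before stating your final answer."
    : "You are a helpful assistant.";
}
```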
A desktop application for Windows, macOS, and Linux that runs Large Language Models locally. It manages the server lifecycle, handles model storage, and provides a React-based chat interface.
Yes. It is open-source; you pay $0 per token and $0 for subscriptions.
8GB of RAM and a dedicated GPU are recommended, but the app also runs on integrated graphics thanks to llama.cpp optimizations.
ELL uses the Gemma-4-E4B-it model in GGUF format, optimized for local reasoning and efficiency.
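A downloaded GGUF file can be sanity-checked by its 4-byte magic, the ASCII string `GGUF` at offset 0 (per the GGUF format spec). A minimal sketch:

```typescript
// Sketch: verify the GGUF magic bytes ("GGUF") at the start of a file
// header, per the GGUF format spec.
function isGGUF(header: Uint8Array): boolean {
  const magic = [0x47, 0x47, 0x55, 0x46]; // ASCII "GGUF"
  return header.length >= 4 && magic.every((b, i) => header[i] === b);
}
```

Checking the magic before launching the server gives a fast, clear error for truncated or mislabeled downloads.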
```sh
# Clone and enter the repository
git clone https://github.com/SimpleSoftwareLTDA/easy-local-llm.git
cd easy-local-llm

# Install dependencies and launch
bun install
bun dev
```

We build local AI for everyone.
- Star the repo to support the project.
- Open an issue for technical feedback.
- Join the Discord (coming soon).
Distributed under the MIT License. See LICENSE for details.
Owned by you. Run by you. Private to you.