# SPARKY

Desktop AI assistant for local automation, voice interaction, and tool execution. SPARKY stands for Smart Personal Assistant for Real-time Knowledge and Productivity.
Created by Ahan Sardar.
- Features
- Upgrade v1.1.0
- Tech Stack
- Requirements
- System Requirements
- Quick Start
- Usage
- Supported Commands
- Models and Assets
- Environment Variables
- Project Structure
- Roadmap
- Troubleshooting
- Contributing
- Safety Notice
- License
## Features

- Chat and reasoning via Ollama (`llama3:8b`)
- Natural-language action execution (apps, browser, files, system tasks)
- Wake word support with OpenWakeWord (`hey_sparky`)
- Voice input via microphone + SpeechRecognition
- Voice output via Piper TTS
- PDF summarization from local files
- Vision-enabled actions using `llava:7b` when needed (still under development)
- Animated Tkinter desktop UI
- User memory read/update support
- GitHub release update checker with in-app patch update prompt
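Chat requests go to a locally running Ollama server over its HTTP API. A minimal sketch of that round trip, assuming Ollama's default endpoint and non-streaming mode (the helper names here are illustrative, not SPARKY's actual code):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_payload(prompt: str, model: str = "llama3:8b") -> dict:
    # Non-streaming request body for Ollama's /api/generate route.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(prompt: str) -> str:
    # Send the prompt and return the model's full reply text.
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With `stream` set to `True` instead, Ollama returns one JSON object per token, which is what lets a UI render the reply incrementally.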
## Upgrade v1.1.0

Post-last-push upgrades included in this version:
- Voice and wake word:
  - Always-on wake-word listener reliability improvements (auto-restart watchdog)
  - Wake listening and manual mic flow coordination improvements
  - Wake/listening cue sounds integrated (`wake.wav` / `sleep.wav`)
- Audio controls:
  - Dedicated Control Pane opened from the header Settings button (`assets/settings.svg`)
  - Mic mute/unmute with a hard block on microphone access while muted
  - AI voice volume control with slider and icons (`volume_mute.svg`, `volume_low.svg`, `volume_high.svg`)
  - Live TTS gain control at runtime
  - In-app playback pane for active media with thumbnail, song title, artist, and progress
  - Playback controls: Previous = -5s seek, Play/Pause toggle, Next = +5s seek
  - Player button visuals updated to a pane-matching transparent style
- UI/UX upgrades:
  - Header status now shows clear runtime state (Online / Listening / Processing / Responding)
  - Processing indicator with rotating loading icon in the status area
  - Live info tiles for auto-timezone Time, Date, and Weather (°C + icon)
  - Weather condition legend added in the UI
  - Cleaner input area and control grouping for easier use
  - Poppins font support from bundled local `fonts/` files with runtime fallback handling
- PDF summarization flow:
  - PDF upload flow (no manual path typing required)
  - Pending upload card with discard (`X`) above the input
  - Uploaded PDF card in the chat area
  - Markdown-style response rendering improvements (bold and bullet formatting)
- Runtime and behavior:
  - Better first-run setup feedback for Ollama/model checks with progress states
  - In-app GitHub release check on startup with a themed Update Now / Remind Me Later popup
  - Patch-update flow for installed builds using GitHub release patch ZIP assets
  - Remind-later update state persisted locally for installed builds
  - Conversational intent-routing fixes to avoid accidental action execution for normal questions
  - Improved date/time query handling (including relative queries like yesterday/today/tomorrow)
  - Web/browsing action paths disabled in this build flow
- Installer/runtime setup:
  - Post-install progress reporting for Python/runtime/model setup
  - Bundled Poppins fonts installed system-wide during setup
  - Installer now expects Ollama to already be installed and focuses on model pull + verification
  - Model setup flow now verifies Ollama, pulls `llama3:8b`, verifies the model, and ensures local serving
  - Visible model-pull terminal flow so long Ollama downloads do not appear frozen during setup
- System monitoring:
  - Header Monitor button with a dedicated System Resource Monitor pane
  - Live metrics for CPU temperature, GPU usage, CPU usage, RAM usage, and storage usage
  - Network speed from active speed tests (download/upload) shown in MB/s
  - Manual Run Speed Test button in the monitor pane
- AI capability:
  - Direct natural command support for system stats and speed tests (`run speed test` executes immediately with results)
  - Resource responses include optimization suggestions when values are poor
- Media and action stability:
  - Action module loading hardened to avoid duplicate native-module loads in the same process
  - YouTube/music playback path now defers heavy automation imports until actually needed
  - Music commands prefer the direct playback path first for more reliable song requests
- Stability fixes:
  - Wake listener audio input fallback for unsupported channel configs
  - Better microphone compatibility for varied device channel/sample-rate settings
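The resource-monitor upgrades above boil down to computing usage percentages and flagging poor values. A stdlib-only sketch of the storage side (SPARKY's actual monitor also reads CPU temperature, GPU, and RAM, typically via a library such as psutil; the helper names here are illustrative):

```python
import shutil

def usage_percent(used: int, total: int) -> float:
    """Usage as a percentage, rounded to one decimal place."""
    return round(used / total * 100, 1)

def storage_stats(path: str = ".") -> dict:
    # shutil.disk_usage reports on the filesystem that contains `path`.
    total, used, _free = shutil.disk_usage(path)
    return {"total_gb": round(total / 1e9, 1), "percent": usage_percent(used, total)}

def suggestion(percent: float, threshold: float = 90.0) -> str:
    # Mirrors the "optimization suggestions when values are poor" behaviour.
    return "consider freeing disk space" if percent >= threshold else "ok"
```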
## Tech Stack

- Python 3.11+
- Ollama
- Tkinter + Pillow
- PyAudio + SpeechRecognition
- OpenWakeWord
- FFmpeg (bundled build included in this repo)
- Piper runtime (`piper/`) with local voice model (`models/`)
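Piper is driven as a command-line binary: the text to speak goes in on stdin and a WAV file comes out. A hedged sketch of how such an invocation can look, using the repo's bundled paths (the `--model`/`--output_file` flags reflect Piper's documented CLI; the helper names are illustrative, not SPARKY's actual code):

```python
import subprocess

def piper_cmd(model: str = "models/en_US-lessac-medium.onnx",
              exe: str = "piper/piper.exe",
              out_wav: str = "reply.wav") -> list:
    # Piper reads the text to speak from stdin and writes a WAV file.
    return [exe, "--model", model, "--output_file", out_wav]

def speak(text: str) -> None:
    # Blocks until synthesis finishes; play reply.wav with any audio library.
    subprocess.run(piper_cmd(), input=text.encode("utf-8"), check=True)
```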
## Requirements

- Windows (recommended; automation coverage is strongest on Windows)
- Python 3.11 or newer
- Ollama installed
- Microphone and speakers
- FFmpeg available (repo includes `ffmpeg-8.0.1-essentials_build/`)
## System Requirements

Minimum:
- OS: Windows 10/11 (64-bit)
- CPU: 4-core processor (Intel i5 8th gen / Ryzen 5 equivalent)
- RAM: 8 GB
- Storage: 15 GB free space
- GPU: Not required
- Network: Stable internet for first-time Ollama/model download
Recommended:
- OS: Windows 11 (64-bit)
- CPU: 6+ cores (Intel i7 / Ryzen 7 or better)
- RAM: 16 GB or more
- Storage: 30 GB+ free SSD space
- GPU: Optional (helps with some model/vision workloads)
- Audio: Good quality mic + speakers/headset
## Quick Start

- Clone the repo and move into the project directory.
- Create and activate a virtual environment.

  ```shell
  python -m venv venv
  venv\Scripts\activate
  ```

- Install dependencies.

  ```shell
  python -m pip install --upgrade pip
  pip install -r requirements.txt
  playwright install
  ```

- Pull the required Ollama models.

  ```shell
  ollama pull llama3:8b
  ollama pull llava:7b
  ```

- Run SPARKY.

  ```shell
  python src/ai_agent.py
  ```

## Usage

- Type in the input box and press `Enter` or click `SEND`.
- Click `MIC` to capture voice input.
- Say "Hey Sparky" for wake-word activation.
- Type `info` in chat to view supported command examples.

Note: The slash action format (`/action ...`) is disabled in the current app flow. Use natural-language commands.
Installed build note:
- SPARKY can check GitHub releases on startup and offer a patch update for existing installs when a matching patch asset is published.
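The startup update check reduces to comparing the installed version tag against the latest GitHub release tag. A minimal sketch of that comparison, assuming simple `vMAJOR.MINOR.PATCH` tags (the helper names are illustrative):

```python
def parse_tag(tag: str) -> tuple:
    # "v1.1.0" -> (1, 1, 0)
    return tuple(int(part) for part in tag.lstrip("v").split("."))

def update_available(installed: str, latest: str) -> bool:
    # Tuple comparison gives correct ordering: (1, 10, 0) > (1, 9, 9).
    return parse_tag(latest) > parse_tag(installed)
```

Comparing parsed tuples rather than raw strings avoids the classic bug where `"v1.10.0" < "v1.9.0"` lexicographically.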
## Supported Commands

Examples you can type directly:

- `open calculator`
- `play <song or video name>`
- `pause` / `resume` / `stop song`
- `weather now`
- `weather in <city>`
- `summarize pdf <full_file_path>`
- `volume up` / `volume down` / `mute` / `set volume to 45`
- `brightness up` / `brightness down`
- `remind me to <task> at HH:MM on YYYY-MM-DD`
- `remember <fact>`
- `show my memory`
- `quit`
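Commands like these are matched as intents before anything falls through to the LLM for normal chat. A toy sketch of that kind of routing (the pattern table and action names are hypothetical, not SPARKY's actual intent map):

```python
import re

# Hypothetical intent table mirroring a few of the example commands above.
PATTERNS = [
    (r"^open (.+)$", "open_app"),
    (r"^set volume to (\d+)$", "set_volume"),
    (r"^remember (.+)$", "remember"),
]

def route(text: str):
    cleaned = text.strip().lower()
    for pattern, action in PATTERNS:
        match = re.match(pattern, cleaned)
        if match:
            return action, match.groups()
    # Anything unmatched is handed to the LLM as ordinary conversation.
    return "chat", (cleaned,)
```

For example, `route("open calculator")` yields the `open_app` action with `calculator` as its argument, while a free-form question routes to chat.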
## Models and Assets

- Required LLM: `llama3:8b`
- Vision model (for screen/camera features): `llava:7b`
- Wake word model: `models/hey_sparky.onnx` (and/or `.tflite`)
- TTS model: `models/en_US-lessac-medium.onnx`
- Piper binary: `piper/piper.exe`
- FFmpeg binaries: `ffmpeg-8.0.1-essentials_build/bin/`
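A quick way to confirm these assets are in place before launch is a plain existence check over the paths listed above. This sketch is illustrative, not SPARKY's actual startup code:

```python
from pathlib import Path

REQUIRED_ASSETS = [
    "models/hey_sparky.onnx",
    "models/en_US-lessac-medium.onnx",
    "piper/piper.exe",
    "ffmpeg-8.0.1-essentials_build/bin/ffmpeg.exe",
]

def missing_assets(paths, root: str = ".") -> list:
    # Returns every path that does not exist under the project root.
    base = Path(root)
    return [p for p in paths if not (base / p).exists()]
```

Running `missing_assets(REQUIRED_ASSETS)` from the repo root returns an empty list when everything listed in this section is present.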
## Environment Variables

- `SPARKY_STT_CACHE`: optional cache path for model/artifact downloads used by runtime dependencies.
- `SPARKY_ACTIONS_DIR`: optional custom actions folder (defaults to the project's `actions/`).
- `SPARKY_OWW_MODEL`: OpenWakeWord model name fallback (default: `hey_sparky`).
- `SPARKY_WAKEWORD_THRESHOLD`: wake-word detection threshold (default in code: `0.30`).
- `SPARKY_OWW_ANY_THRESHOLD`: backup detector threshold (default in code: `0.55`).
- `SPARKY_WAKE_RMS_THRESHOLD`: minimum input energy for the wake pipeline (default in code: `80`).

Updater state file:

- `LOCALAPPDATA\SPARKY\update_state.json`: stores the in-app updater's remind-later state for release prompts.
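These variables are plain overrides read at startup; the numeric ones fall back to their in-code defaults when unset or malformed. A sketch of that pattern (the `env_float` helper name is illustrative):

```python
import os

def env_float(name: str, default: float) -> float:
    # Returns the env var as a float, or the default when unset or invalid.
    try:
        return float(os.environ[name])
    except (KeyError, ValueError):
        return default

wake_threshold = env_float("SPARKY_WAKEWORD_THRESHOLD", 0.30)
any_threshold = env_float("SPARKY_OWW_ANY_THRESHOLD", 0.55)
rms_threshold = env_float("SPARKY_WAKE_RMS_THRESHOLD", 80)
```

Lowering `SPARKY_WAKEWORD_THRESHOLD` makes the wake word easier to trigger at the cost of more false activations; raising it does the opposite.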
## Project Structure

```
SPARKY/
  actions/           Action modules
  agent/             Planner/executor/task queue components
  assets/            UI assets (logo, text image, icons)
  config/            Model config
  memory/            Memory management
  models/            Wake-word and TTS model files
  piper/             Piper runtime binaries
  src/               Main app, bridge, voice IO, LLM clients
  ui.py              Desktop UI
  requirements.txt
```
## Roadmap

- Bundled one-click installers with smoother dependency setup flow
- UI/UX upgrade for cleaner interaction, better feedback, and accessibility
- Performance optimization for faster startup and lower memory usage
- Better cross-platform automation support (Linux/macOS parity)
- Better onboarding setup checks on first launch
- Improved logging/export for debugging sessions
- Additional wake-word and voice profiles
- Background detection pipeline for always-on wake monitoring
- More productivity actions and advanced automation workflows in future updates
## Troubleshooting

- Chat not responding:
  - Make sure Ollama is running.
  - Verify `llama3:8b` is pulled.
- Vision actions fail:
  - Pull `llava:7b` with `ollama pull llava:7b`.
- No microphone input:
  - Check OS microphone permissions.
  - Verify your PyAudio installation matches your Python build.
- TTS issues:
  - Confirm `piper/piper.exe` exists.
  - Confirm `models/en_US-lessac-medium.onnx` exists.
- Media/YouTube audio issues:
  - Confirm FFmpeg exists at `ffmpeg-8.0.1-essentials_build/bin/ffmpeg.exe`.
  - Optionally add that `bin` folder to your system `PATH`.

If none of the above helps, contact me at ahansardarvis@gmail.com.
## Contributing

Contributions are welcome.
- Fork the repository.
- Create a feature branch.
- Make your changes with clear commit messages.
- Run basic checks before opening a PR.
- Open a pull request with a short summary and testing notes.
## Safety Notice

Some actions can control the keyboard/mouse, open apps/websites, and modify files. Use carefully on machines with important data.
## License

This project is licensed under the MIT License. See LICENSE.
This project also depends on third-party tools/models (including FFmpeg and Ollama models).
Review their respective licenses before redistribution or commercial use.
