
SPARKY

License: MIT · Python 3.11+ · Platform: Windows · LLM: Ollama · Wake word: OpenWakeWord · TTS: Piper · STT: SpeechRecognition · UI: Tkinter

Desktop AI assistant for local automation, voice interaction, and tool execution. SPARKY stands for Smart Personal Assistant for Real-time Knowledge and Productivity.

Created by Ahan Sardar.

Features

  • Chat and reasoning via Ollama (llama3:8b)
  • Natural-language action execution (apps, browser, files, system tasks)
  • Wake word support with OpenWakeWord (hey_sparky)
  • Voice input via microphone + SpeechRecognition
  • Voice output via Piper TTS
  • PDF summarization from local files
  • Vision-enabled actions using llava:7b when needed (still under development)
  • Animated Tkinter desktop UI
  • User memory read/update support
  • GitHub release update checker with in-app patch update prompt
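
The conversational intent routing mentioned above (sending commands to the action pipeline and normal questions to chat) can be pictured with a minimal keyword sketch. The prefixes and function name below are illustrative assumptions, not SPARKY's actual implementation:

```python
# Hypothetical sketch: route command-like input to actions, everything
# else to the chat model. Prefix list mirrors the Supported Commands section.
ACTION_PREFIXES = (
    "open ", "play ", "pause", "resume", "stop ",
    "volume ", "mute", "set volume", "brightness ",
    "remind me ", "summarize pdf ", "remember ",
    "show my memory", "weather ", "quit",
)

def route_intent(text: str) -> str:
    """Return 'action' for command-like input, 'chat' otherwise."""
    lowered = text.lower().strip()
    return "action" if lowered.startswith(ACTION_PREFIXES) else "chat"
```

A real router would also need the LLM-backed disambiguation the changelog refers to; prefix matching alone cannot tell "play devil's advocate" apart from a music request.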

Upgrade v1.1.0

Changes included in this version since the last release:

  • Voice and wake word:

    • Always-on wake-word listener reliability improvements (auto-restart watchdog)
    • Wake listening and manual mic flow coordination improvements
    • Wake/listening cue sounds integrated (wake.wav / sleep.wav)
  • Audio controls:

    • Dedicated Control Pane opened from header Settings button (assets/settings.svg)
    • Mic mute/unmute feature with proper hard block on microphone access when muted
    • AI voice volume control with slider and icons (volume_mute.svg, volume_low.svg, volume_high.svg)
    • Live TTS gain control at runtime
    • In-app playback pane for active media with thumbnail, song title, artist, and progress
    • Playback controls: Previous = -5s seek, Play/Pause toggle, Next = +5s seek
    • Player button visuals updated to pane-matching transparent style
  • UI/UX upgrades:

    • Header status now shows clear runtime state (Online / Listening / Processing / Responding)
    • Processing indicator with rotating loading icon in status area
    • Live info tiles for auto-timezone Time, Date, and Weather (°C + icon)
    • Weather condition legend added in UI
    • Cleaner input area and control grouping for easier usage
    • Poppins font support from the bundled fonts/ directory, with runtime fallback handling
  • PDF summarization flow:

    • PDF upload flow (no manual path typing required)
    • Pending upload card with discard (X) above input
    • Uploaded PDF card in chat area
    • Markdown-style response rendering improvements (bold and bullet formatting)
  • Runtime and behavior:

    • Better first-run setup feedback for Ollama/model checks with progress states
    • In-app GitHub release check on startup with themed Update Now / Remind Me Later popup
    • Patch-update flow for installed builds using GitHub release patch ZIP assets
    • Remind-later update state persisted locally for installed builds
    • Conversational intent-routing fixes to avoid accidental action execution for normal questions
    • Date/time query handling improved (including relative date queries like yesterday/today/tomorrow)
    • Web/browsing action paths disabled in this build flow
  • Installer/runtime setup:

    • Post-install progress reporting for Python/runtime/model setup
    • Bundled Poppins fonts installed system-wide during setup
    • Installer now expects Ollama to already be installed and focuses on model pull + verification
    • Model setup flow now verifies Ollama, pulls llama3:8b, verifies the model, and ensures local serving
    • Visible model-pull terminal flow added so long Ollama downloads do not appear frozen during setup
  • System monitoring:

    • Header Monitor button with dedicated System Resource Monitor pane
    • Live metrics for CPU temperature, GPU usage, CPU usage, RAM usage, storage usage
    • Network speed from active speed tests (download/upload) shown in MB/s
    • Manual Run Speed Test button in monitor pane
  • AI capability:

    • Direct natural command support for system stats and speed tests
    • "run speed test" executes immediately and returns results
    • Resource responses include optimization suggestions when values are poor
  • Media and action stability:

    • Action module loading hardened to avoid duplicate native-module loads in the same process
    • YouTube/music playback path now defers heavy automation imports until actually needed
    • Music commands prefer the direct playback path first for more reliable song requests
  • Stability fixes:

    • Wake listener audio input fallback for unsupported channel configs
    • Better microphone compatibility for varied device channel/sample-rate settings
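
The auto-restart watchdog for the always-on wake listener can be sketched as a simple supervision loop. This is a minimal illustration under the assumption that the listener surfaces failures as exceptions; the function names are hypothetical, not SPARKY's actual code:

```python
import time

def supervise(listener, max_restarts=5, delay=0.01):
    """Run `listener`; restart it on failure, up to max_restarts times.

    Returns the number of restarts once the listener exits cleanly.
    """
    restarts = 0
    while True:
        try:
            listener()          # blocks while the wake listener runs
            return restarts     # clean exit: stop supervising
        except Exception:
            restarts += 1
            if restarts > max_restarts:
                raise           # give up after repeated failures
            time.sleep(delay)   # brief backoff before restarting
```

In the real app this would run on a background thread so the UI stays responsive, and the backoff would be longer than the test-friendly default used here.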

Tech Stack

  • Python 3.11+
  • Ollama
  • Tkinter + Pillow
  • PyAudio + SpeechRecognition
  • OpenWakeWord
  • FFmpeg (bundled build included in this repo)
  • Piper runtime (piper/) with local voice model (models/)

Requirements

  • Windows (recommended; automation coverage is strongest on Windows)
  • Python 3.11 or newer
  • Ollama installed
  • Microphone and speakers
  • FFmpeg available (repo includes ffmpeg-8.0.1-essentials_build/)

System Requirements

Minimum:

  • OS: Windows 10/11 (64-bit)
  • CPU: 4-core processor (Intel i5 8th gen / Ryzen 5 equivalent)
  • RAM: 8 GB
  • Storage: 15 GB free space
  • GPU: Not required
  • Network: Stable internet for first-time Ollama/model download

Recommended:

  • OS: Windows 11 (64-bit)
  • CPU: 6+ cores (Intel i7 / Ryzen 7 or better)
  • RAM: 16 GB or more
  • Storage: 30 GB+ free SSD space
  • GPU: Optional (helps with some model/vision workloads)
  • Audio: Good quality mic + speakers/headset

Quick Start

  1. Clone the repo and move into the project directory.
  2. Create and activate a virtual environment.

     python -m venv venv
     venv\Scripts\activate

  3. Install dependencies.

     python -m pip install --upgrade pip
     pip install -r requirements.txt
     playwright install

  4. Pull the required Ollama models.

     ollama pull llama3:8b
     ollama pull llava:7b

  5. Run SPARKY.

     python src/ai_agent.py
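
To confirm the model pulls succeeded before launching, the setup-style model check can be approximated by parsing `ollama list` output. The exact column layout can vary across Ollama versions, so treat this parser as an illustrative assumption:

```python
def has_model(listing: str, name: str) -> bool:
    """Return True if `name` appears in the NAME column of `ollama list` output."""
    for line in listing.splitlines()[1:]:   # skip the header row
        cols = line.split()
        if cols and cols[0] == name:
            return True
    return False
```

In practice `listing` would come from the CLI, e.g. `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout`.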

Usage

  • Type in the input box and press Enter or click SEND.
  • Click MIC to capture voice input.
  • Say "Hey Sparky" for wake-word activation.
  • Type "info" in chat to view supported command examples.

Note: Slash action format (/action ...) is disabled in the current app flow. Use natural language commands.

Installed build note:

  • SPARKY can check GitHub releases on startup and offer a patch update for existing installs when a matching patch asset is published.

Supported Commands

Examples you can type directly:

  • open calculator
  • play <song or video name>
  • pause / resume / stop song
  • weather now
  • weather in <city>
  • summarize pdf <full_file_path>
  • volume up / volume down / mute / set volume to 45
  • brightness up / brightness down
  • remind me to <task> at HH:MM on YYYY-MM-DD
  • remember <fact>
  • show my memory
  • quit
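
The `remember <fact>` and `show my memory` commands imply a small persistent fact store. A minimal sketch (assuming a JSON list on disk; SPARKY's actual memory format may differ):

```python
import json
from pathlib import Path

def remember(path: Path, fact: str) -> None:
    """Append a fact to a JSON memory file, creating it if needed."""
    facts = json.loads(path.read_text()) if path.exists() else []
    facts.append(fact)
    path.write_text(json.dumps(facts, indent=2))

def show_memory(path: Path) -> list[str]:
    """Return all remembered facts, or an empty list if none exist yet."""
    return json.loads(path.read_text()) if path.exists() else []
```

A production store would also need deduplication and a way to delete or update individual facts, per the "User memory read/update support" feature above.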

Models and Assets

  • Required LLM: llama3:8b
  • Vision model (for screen/camera features): llava:7b
  • Wake word model: models/hey_sparky.onnx (and/or .tflite)
  • TTS model: models/en_US-lessac-medium.onnx
  • Piper binary: piper/piper.exe
  • FFmpeg binaries: ffmpeg-8.0.1-essentials_build/bin/
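
A quick preflight check for these files might look like the following. The paths are taken from the list above; the helper itself is an illustrative sketch, not part of SPARKY:

```python
from pathlib import Path

# Required runtime assets, relative to the project root (from the list above).
REQUIRED_ASSETS = [
    "models/hey_sparky.onnx",
    "models/en_US-lessac-medium.onnx",
    "piper/piper.exe",
    "ffmpeg-8.0.1-essentials_build/bin/ffmpeg.exe",
]

def missing_assets(root, required=REQUIRED_ASSETS):
    """Return the required asset paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in required if not (root / p).exists()]
```

Running this at startup and printing the missing entries gives a friendlier failure mode than the individual runtime errors covered in Troubleshooting.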

Environment Variables

  • SPARKY_STT_CACHE
    Optional cache path for model/artifact downloads used by runtime dependencies.

  • SPARKY_ACTIONS_DIR
    Optional custom actions folder (defaults to project actions/).

  • SPARKY_OWW_MODEL
    OpenWakeWord model name fallback (default: hey_sparky).

  • SPARKY_WAKEWORD_THRESHOLD
    Wake-word detection threshold (default in code: 0.30).

  • SPARKY_OWW_ANY_THRESHOLD
    Backup detector threshold (default in code: 0.55).

  • SPARKY_WAKE_RMS_THRESHOLD
    Minimum input energy for wake pipeline (default in code: 80).

Note: the file %LOCALAPPDATA%\SPARKY\update_state.json (not an environment variable) stores the in-app updater's remind-later state for release prompts.
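
Reading the threshold variables with their documented defaults might look like this. Only the variable names and default values come from the list above; the helper itself is a sketch:

```python
import os

def wake_thresholds(env=os.environ):
    """Read wake-word tuning values, falling back to the documented defaults."""
    return {
        "wakeword": float(env.get("SPARKY_WAKEWORD_THRESHOLD", "0.30")),
        "any_detector": float(env.get("SPARKY_OWW_ANY_THRESHOLD", "0.55")),
        "rms": float(env.get("SPARKY_WAKE_RMS_THRESHOLD", "80")),
    }
```

Raising SPARKY_WAKEWORD_THRESHOLD makes activation stricter (fewer false wakes, more missed ones); lowering SPARKY_WAKE_RMS_THRESHOLD lets quieter audio reach the wake pipeline.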

Project Structure

SPARKY/
  actions/         Action modules
  agent/           Planner/executor/task queue components
  assets/          UI assets (logo, text image, icons)
  config/          Model config
  memory/          Memory management
  models/          Wake-word and TTS model files
  piper/           Piper runtime binaries
  src/             Main app, bridge, voice IO, LLM clients
  ui.py            Desktop UI
  requirements.txt

Roadmap

  • Bundled one-click installers with smoother dependency setup flow
  • UI/UX upgrade for cleaner interaction, better feedback, and accessibility
  • Performance optimization for faster startup and lower memory usage
  • Better cross-platform automation support (Linux/macOS parity)
  • Better onboarding setup checks on first launch
  • Improved logging/export for debugging sessions
  • Additional wake-word and voice profiles
  • Background detection pipeline for always-on wake monitoring
  • More productivity actions and advanced automation workflows in future updates

Troubleshooting

  • Chat not responding:

    • Make sure Ollama is running.
    • Verify llama3:8b is pulled.
  • Vision actions fail:

    • Pull llava:7b with ollama pull llava:7b.
  • No microphone input:

    • Check OS microphone permissions.
    • Verify PyAudio installation matches your Python build.
  • TTS issues:

    • Confirm piper/piper.exe exists.
    • Confirm models/en_US-lessac-medium.onnx exists.
  • Media/YouTube audio issues:

    • Confirm FFmpeg exists at ffmpeg-8.0.1-essentials_build/bin/ffmpeg.exe.
    • Optionally add that bin folder to your system PATH.
  • If none of the above helps, contact me at ahansardarvis@gmail.com.

Contributing

Contributions are welcome.

  1. Fork the repository.
  2. Create a feature branch.
  3. Make your changes with clear commit messages.
  4. Run basic checks before opening a PR.
  5. Open a pull request with a short summary and testing notes.

Safety Notice

Some actions can control keyboard/mouse, open apps/websites, and modify files. Use carefully on machines with important data.

License

This project is licensed under the MIT License. See LICENSE.

This project also depends on third-party tools/models (including FFmpeg and Ollama models).
Review their respective licenses before redistribution or commercial use.

About

SPARKY: A local-first desktop AI assistant with voice interaction, wake-word detection, automation, and vision. Powered by Ollama for real-time knowledge and productivity.
