Home voice assistant with an always-on Android overlay avatar. The system consists of:
- Android app — floating overlay with animated Lottie avatar, push-to-talk, WebSocket audio streaming
- Python server (`server.py`) — wake word detection (OpenWakeWord) + STT (Whisper) on a Mac mini
- Avatar pipeline (`tools/avatar_pipeline`) — extracts, isolates, and generates state animations from a Lottie character source
```
Android Device (overlay)              Mac mini (server.py)
┌─────────────────────────┐           ┌──────────────────────┐
│ Floating avatar overlay │   ws://   │ OpenWakeWord         │
│ Push-to-talk (PTT)      │◄─────────►│ Whisper STT          │
│ State animations        │  audio/   │ OpenClaw integration │
│ OTA update button       │  status   │ TTS response         │
└─────────────────────────┘           └──────────────────────┘
```
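The audio/status link can be illustrated with a minimal framing convention — a sketch under the assumption that binary WebSocket frames carry raw audio and text frames carry JSON status messages (the actual `server.py` wire format may differ):

```python
# Sketch of one plausible framing for the audio/status link: binary WebSocket
# frames are raw audio, text frames are JSON status. This convention is an
# assumption for illustration, not necessarily what server.py implements.
import json

def route_frame(frame):
    """Classify an incoming WebSocket frame as audio bytes or a status dict."""
    if isinstance(frame, (bytes, bytearray)):
        return ("audio", bytes(frame))
    return ("status", json.loads(frame))

print(route_frame(b"\x00\x01"))               # ('audio', b'\x00\x01')
print(route_frame('{"state": "listening"}'))  # ('status', {'state': 'listening'})
```

Keeping the two frame types disjoint means the receive loop never has to sniff payload contents.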
Run the server on the Mac mini:

```shell
source venv/bin/activate
python server.py
```

Build the Android app:

```shell
cd android-app
./gradlew assembleDebug
cp app/build/outputs/apk/debug/app-debug.apk ../OpenClawVoice.apk
```

Install OpenClawVoice.apk on the device. The overlay loads animations from a dev HTTP server for rapid iteration.
```shell
# Terminal 1: Serve staged assets + APK
python3 -m http.server 8011 &   # assets on :8011
python3 -m http.server 9090 &   # APK on :9090

# Terminal 2: Generate/iterate on avatar
python3 tools/avatar_pipeline generate --stage-only --no-build
```

On the device overlay:
- ↻ button — reload animation from server (cache-busted)
- ⬆ button — download and install latest APK from server
- Open button — open the main app
- Version label — shows current app version
The character is a Turkish-designed Lottie animation of a cartoon person sitting in a chair. The source file is `android-app/app/src/main/assets/avatar_source_base.json` (452×560, 24 fps, 272 frames).
Layer names are in Turkish. The anatomy map (`tools/avatar_anatomy_map.json`) defines which layers belong to which body parts and which are environment (dropped during character extraction).
| `ind` | Turkish Name | English | Part Group |
|---|---|---|---|
| 25 | koltuk | chair | environment (dropped) |
| 22 | kolçak | armrest | environment (dropped) |
| 12 | fiskos | side table | environment (dropped) |
| 4 | saat | clock | environment (dropped) |
| 23 | boddy | body/torso | torso (ROOT) |
| 18 | yüz Outlines | face | face |
| 3 | ağız | mouth (closed) | mouth |
| 1 | katman 20/zubulig...2 | open mouth (lips+teeth+interior) | mouth |
| 2 | katman 20/zubulig... | open mouth (teeth+interior) | mouth |
| 14 | göz1 Outlines | right eye outline | eye (td=1, matte source) |
| 16 | göz2 Outlines | left eye outline | eye (td=1, matte source) |
| 5 | göz kapağı 2 | left eyelid | eyelid (tp=16, tt=1) |
| 6 | göz kapağı 1 | right eyelid | eyelid (tp=14, tt=1) |
| 7 | pupil1 | right pupil | pupil (parent=14) |
| 8 | pupil2 | left pupil | pupil (parent=16) |
| 13 | göz1 Outlines 2 | right eye white | eye (parent=14) |
| 15 | göz2 Outlines 2 | left eye white | eye (parent=16) |
| 9 | kaş1 | right eyebrow | eyebrow |
| 10 | kaş2 | left eyebrow | eyebrow |
| 17 | allıklar Outlines | cheek blush | cheek |
| 11 | ışık Outlines | highlight | highlight (hidden) |
| 19 | saç Outlines | hair | hair |
| 20 | alt kol sol | lower left arm | arm |
| 21 | üst kol sol | upper left arm | arm |
| 24 | sağ kol | right arm | arm |
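The environment-drop step implied by the table can be sketched as a filter over the layer list. The map shape below (`{"drop": [...]}`) is an illustrative assumption about `tools/avatar_anatomy_map.json`, with indices taken from the table:

```python
# Sketch: drop environment layers using an anatomy map. The {"drop": [...]}
# shape is an assumption; the indices mirror the table above.

ANATOMY_MAP = {"drop": [25, 22, 12, 4]}  # koltuk, kolçak, fiskos, saat

def strip_environment(layers, anatomy_map):
    """Keep only character layers; environment indices are removed."""
    drop = set(anatomy_map["drop"])
    return [layer for layer in layers if layer["ind"] not in drop]

layers = [{"ind": 25}, {"ind": 23}, {"ind": 18}, {"ind": 4}]
print([l["ind"] for l in strip_environment(layers, ANATOMY_MAP)])  # [23, 18]
```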
Track mattes (eyelids): Eyelid layers (5, 6) sit at the root level with no parent, scaled to 791% on the x-axis and rotated −180°. They use `tp` (track parent) to reference the eye outline layers (14, 16) as alpha-matte sources. Without proper matte clipping, the eyelids render as huge blobs covering the face.
- `tp` — explicit matte source reference by layer index (takes precedence over positional adjacency)
- `td=1` — this layer is a track-matte source
- `tt=1` — this layer uses an alpha matte
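The matte wiring can be checked with a few lines over the layer dicts. This is a minimal sketch using the standard Lottie fields (`ind`, `tt`, `tp`, `td`); the sample data is hypothetical, mirroring the eyelid/eye pairs in the table:

```python
# Sketch: map alpha-matte consumers (tt=1) to their matte sources via tp.
# The sample layer dicts are hypothetical, modeled on the eyelid/eye pairs.

def matte_pairs(layers):
    """Return {consumer ind: source ind} for layers that use a track matte."""
    pairs = {}
    for layer in layers:
        if layer.get("tt") == 1 and "tp" in layer:  # alpha matte + explicit source
            pairs[layer["ind"]] = layer["tp"]
    return pairs

layers = [
    {"ind": 14, "td": 1},            # right eye outline (matte source)
    {"ind": 16, "td": 1},            # left eye outline (matte source)
    {"ind": 5, "tt": 1, "tp": 16},   # left eyelid, clipped by layer 16
    {"ind": 6, "tt": 1, "tp": 14},   # right eyelid, clipped by layer 14
]
print(matte_pairs(layers))  # {5: 16, 6: 14}
```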
Mouth layers: Layers 1 and 2 are parented to layer 3 (mouth). They contain the open-mouth animation (lips, teeth, mouth interior). Originally mislabeled as "environment" — they were incorrectly dropped. Their names contain "koltukta" (Turkish for "in the chair") which matched the chair keyword filter.
Shape-level animation: Some layers (e.g., cheek layer 17) have animation keyframes embedded inside shape group transforms, not just at the layer level. The pipeline must freeze both layer-level and shape-level animation for state loops.
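The freezing step can be sketched as a small transform over Lottie property dicts (animated properties have `"a": 1` and a keyframe list in `"k"`). A minimal sketch — the real pipeline also walks shape-group transforms:

```python
# Sketch: collapse an animated Lottie property ("a": 1, keyframes in "k")
# to a static first-frame value ("a": 0). The pipeline applies this to
# position/scale/rotation/opacity at both layer and shape-group level.

def freeze_property(prop):
    """Replace an animated property with its first keyframe's start value."""
    if isinstance(prop, dict) and prop.get("a") == 1:
        first = prop["k"][0]
        value = first.get("s", first) if isinstance(first, dict) else first
        return {"a": 0, "k": value}
    return prop  # already static

animated_scale = {"a": 1, "k": [{"t": 0, "s": [100, 100]}, {"t": 48, "s": [104, 104]}]}
print(freeze_property(animated_scale))  # {'a': 0, 'k': [100, 100]}
```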
`adapt_base_character()` in the pipeline:
- Strips environment layers (chair, table, armrest, clock)
- Keeps original 452×560 coordinate space (Android renderer handles display scaling)
- Freezes ALL animation to first-frame values (position, scale, rotation, opacity, shape-level transforms)
- Hides eyelid layers (opacity 0) — they drift from eye shapes during breathing since they have no parent
- Hides layers that originally started after frame 0 (ip > 0) — they're not part of the resting pose
- Applies subtle state-specific animation to the body root (breathing pulse, bobbing)
- Can apply part-specific animation (eye movement, arm rotation, etc.) per state profile
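The breathing pulse on the body root can be sketched as a sine-based scale loop whose first and last keyframes match, so the state animation loops seamlessly. Amplitude, fps, and duration here are illustrative, not the pipeline's actual values:

```python
# Sketch: generate a seamless breathing-pulse scale loop for the body root.
# Amplitude/fps/duration are illustrative, not the pipeline's actual values.
import math

def breathing_keyframes(fps=24, seconds=2.0, amplitude=1.5, steps=8):
    """Sine-based scale keyframes; first and last match so the loop is seamless."""
    frames = []
    total = int(fps * seconds)
    for i in range(steps + 1):
        t = i / steps
        scale = 100 + amplitude * math.sin(2 * math.pi * t)
        frames.append({"t": round(t * total), "s": [round(scale, 2)] * 2})
    return frames

kf = breathing_keyframes()
print(kf[0]["s"] == kf[-1]["s"])  # True — the loop closes cleanly
```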
States: idle, listening, processing, speaking, error (one-shot), flair (one-shot)
```shell
# Generate staged assets (no APK build)
python3 tools/avatar_pipeline generate --stage-only --no-build

# Validate staged assets
python3 tools/avatar_pipeline validate --stage

# Promote staged assets to app assets
python3 tools/avatar_pipeline promote

# Full generate + build
python3 tools/avatar_pipeline generate
```

Output files:
- `tools/avatar_pipeline_out/staging_assets/` — staged Lottie JSONs for all states + part/layer isolates
- `tools/avatar_pipeline_out/staging_assets/avatar_parts_manifest.json` — maps part names to files
- `tools/avatar_pipeline_out/staging_assets/avatar_layers_manifest.json` — maps layer IDs to files
- `tools/avatar_pipeline_out/avatar_motion_report.json` — animation amplitude analysis
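A staging check like the pipeline's validate step can be sketched as a walk over the parts manifest. The manifest shape (`{part: filename}` relative to the staging directory) is an assumption for illustration:

```python
# Sketch: verify every file referenced by a parts manifest exists in the
# staging directory. The {part: filename} manifest shape is an assumption.
import json
from pathlib import Path

def missing_assets(manifest_path):
    """Return part names whose staged file is missing."""
    staging = Path(manifest_path).parent
    manifest = json.loads(Path(manifest_path).read_text())
    return sorted(part for part, name in manifest.items()
                  if not (staging / name).exists())
```

An empty return value means every staged part is present; anything else names the parts to regenerate.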
- `AvatarOverlayView.kt` — FrameLayout with LottieAnimationView, drag ring, Open/Reload/Update buttons, version label
- `AvatarController.kt` — manages state transitions, loads animations from the dev HTTP server with cache busting, handles PTT touch, drift animation, OTA updates
- `AvatarState.kt` — enum: IDLE, LISTENING, PROCESSING, SPEAKING, ERROR, FLAIR
- `VoiceForegroundService.kt` — foreground service that creates the overlay, manages the WebSocket connection, handles PTT and state transitions
- `MainActivity.kt` — settings UI, Avatar Lab (debug viewer for state/part/layer animations with mode spinner), update button, connection controls
In the main app, the Avatar Lab lets you preview any animation from the staging server:
- Mode: state — preview idle/listening/processing/speaking/error/flair animations
- Mode: part — preview isolated body parts (face, mouth, eye, eyelid, etc.)
- Mode: layer — preview individual layers by ID
- Base URL input for the staging server
- Reload button with cache busting
- `AppUpdater.kt` — shared OTA update logic (download APK from server + install intent), used by both `MainActivity` and `AvatarController`
- `tools/avatar_pipeline.yaml` — style spec (dimensions, fps, durations, palette, quality gates)
- `tools/avatar_anatomy_map.json` — layer-to-part mapping, drop list, labels
- Dev server IPs hardcoded to `10.0.0.4` (update in `AvatarController.kt`, `MainActivity.kt`, `AppUpdater.kt` if the network changes)
- openwakeword, openai-whisper, websockets, numpy, pyaudio
- Lottie (`com.airbnb.android:lottie`)
- Java-WebSocket
- Kotlin Coroutines
- AndroidX AppCompat, Core