OpenClaw Voice Control

Home voice assistant with an always-on Android overlay avatar. The system consists of:

  • Android app — floating overlay with animated Lottie avatar, push-to-talk, WebSocket audio streaming
  • Python server (server.py) — wake word detection (OpenWakeWord) + STT (Whisper) on Mac mini
  • Avatar pipeline (tools/avatar_pipeline) — extracts, isolates, and generates state animations from a Lottie character source

Architecture

Android Device (overlay)             Mac mini (server.py)
┌─────────────────────────┐          ┌──────────────────────┐
│ Floating avatar overlay │  ws://   │ OpenWakeWord         │
│ Push-to-talk (PTT)      │◄────────►│ Whisper STT          │
│ State animations        │  audio/  │ OpenClaw integration │
│ OTA update button       │  status  │ TTS response         │
└─────────────────────────┘          └──────────────────────┘
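
The message flow over the WebSocket link can be sketched as a small router. This is a minimal illustration, not the actual server.py logic: the framing (binary frames for raw PCM audio, JSON text frames for control, and the ptt_start/ptt_stop message types) is an assumption, and the STT step is injected as a callback.

```python
import json

# Hypothetical message router for the overlay <-> server link. Binary
# frames are assumed to carry raw PCM audio; text frames JSON control
# messages. server.py's real framing may differ.
class SessionState:
    def __init__(self):
        self.audio = bytearray()
        self.listening = False

def handle_message(state, message, on_utterance):
    """Route one WebSocket message; return a status dict to send back."""
    if isinstance(message, (bytes, bytearray)):       # audio frame
        if state.listening:
            state.audio.extend(message)
        return {"state": "listening" if state.listening else "idle"}
    ctrl = json.loads(message)                        # JSON control frame
    if ctrl.get("type") == "ptt_start":
        state.listening = True
        state.audio.clear()
        return {"state": "listening"}
    if ctrl.get("type") == "ptt_stop":
        state.listening = False
        pcm = bytes(state.audio)
        text = on_utterance(pcm)                      # e.g. Whisper STT
        return {"state": "processing", "transcript": text}
    return {"state": "error", "reason": "unknown message type"}
```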

Quick Start

Server

source venv/bin/activate
python server.py

Android App

cd android-app
./gradlew assembleDebug
cp app/build/outputs/apk/debug/app-debug.apk ../OpenClawVoice.apk

Install OpenClawVoice.apk on the device. The overlay loads animations from a dev HTTP server for rapid iteration.

Dev Feedback Loop

# Terminal 1: Serve staged assets + APK
python3 -m http.server 8011 &   # assets on :8011
python3 -m http.server 9090 &   # APK on :9090

# Terminal 2: Generate/iterate on avatar
python3 tools/avatar_pipeline generate --stage-only --no-build

On the device overlay:

  • ↻ button — reload animation from server (cache-busted)
  • ⬆ button — download and install latest APK from server
  • Open button — open the main app
  • Version label — shows current app version
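
The cache-busted reload used by the ↻ button boils down to appending a changing query parameter so the HTTP server's cached copy is bypassed. A minimal sketch, assuming a timestamp parameter named `t` (the actual parameter name in AvatarController.kt is not documented here):

```python
import time
from urllib.parse import urlencode

def cache_busted_url(base_url, asset, now=None):
    """Build an asset URL with a timestamp query param to defeat caching."""
    ts = int(now if now is not None else time.time())
    return f"{base_url.rstrip('/')}/{asset}?{urlencode({'t': ts})}"
```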

Avatar System

Source Character

The character is a Turkish-designed Lottie animation of a cartoon person sitting in a chair. The source file is android-app/app/src/main/assets/avatar_source_base.json (452×560, 24fps, 272 frames).

Layer Anatomy

Layer names are in Turkish. The anatomy map (tools/avatar_anatomy_map.json) defines which layers belong to which body parts and which are environment (dropped from character extraction).

ind  Turkish name             English part                        Group
25   koltuk                   chair                               environment (dropped)
22   kolçak                   armrest                             environment (dropped)
12   fiskos                   side table                          environment (dropped)
4    saat                     clock                               environment (dropped)
23   boddy                    body/torso                          torso (ROOT)
18   yüz Outlines             face                                face
3    ağız                     mouth (closed)                      mouth
1    katman 20/zubulig...2    open mouth (lips+teeth+interior)    mouth
2    katman 20/zubulig...     open mouth (teeth+interior)         mouth
14   göz1 Outlines            right eye outline                   eye (td=1, matte source)
16   göz2 Outlines            left eye outline                    eye (td=1, matte source)
5    göz kapağı 2             left eyelid                         eyelid (tp=16, tt=1)
6    göz kapağı 1             right eyelid                        eyelid (tp=14, tt=1)
7    pupil1                   right pupil                         pupil (parent=14)
8    pupil2                   left pupil                          pupil (parent=16)
13   göz1 Outlines 2          right eye white                     eye (parent=14)
15   göz2 Outlines 2          left eye white                      eye (parent=16)
9    kaş1                     right eyebrow                       eyebrow
10   kaş2                     left eyebrow                        eyebrow
17   allıklar Outlines        cheek blush                         cheek
11   ışık Outlines            highlight                           highlight (hidden)
19   saç Outlines             hair                                hair
20   alt kol sol              lower left arm                      arm
21   üst kol sol              upper left arm                      arm
24   sağ kol                  right arm                           arm
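
Applying such a map amounts to partitioning layers into kept character parts and dropped environment. A sketch under an assumed schema (the real keys in tools/avatar_anatomy_map.json may differ):

```python
# Split Lottie layers into character vs environment using an anatomy map.
# The map schema ({layer ind -> {"group": ...}}) is an assumption.
def partition_layers(layers, anatomy):
    kept, dropped = [], []
    for layer in layers:
        group = anatomy.get(layer["ind"], {}).get("group", "")
        (dropped if group.startswith("environment") else kept).append(layer)
    return kept, dropped
```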

Critical Lottie Details

Track mattes (eyelids): Eyelid layers (5, 6) sit at root level with no parent, scaled to 791% on the x axis and rotated -180°. They use tp (track parent) to reference the eye outline layers (14, 16) as alpha matte sources. Without proper matte clipping, the eyelids render as huge blobs covering the face.

  • tp = explicit matte source reference by layer index (takes precedence over positional adjacency)
  • td=1 = this layer is a track matte source
  • tt=1 = this layer uses an alpha matte
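
The wiring above suggests a simple consistency check: every alpha-matted layer (tt=1) should, via tp, point at a layer flagged as a matte source (td=1). A sketch using the Lottie field names already described:

```python
# Return the inds of matted layers whose tp does not resolve to a
# valid matte source (td=1).
def check_mattes(layers):
    by_ind = {layer["ind"]: layer for layer in layers}
    problems = []
    for layer in layers:
        if layer.get("tt") == 1:
            src = by_ind.get(layer.get("tp"))
            if src is None or src.get("td") != 1:
                problems.append(layer["ind"])
    return problems
```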

Mouth layers: Layers 1 and 2 are parented to layer 3 (mouth) and contain the open-mouth animation (lips, teeth, mouth interior). They were originally mislabeled as "environment" and incorrectly dropped: their names contain "koltukta" (Turkish for "in the chair"), which matched the chair keyword filter.

Shape-level animation: Some layers (e.g., cheek layer 17) have animation keyframes embedded inside shape group transforms, not just at the layer level. The pipeline must freeze both layer-level and shape-level animation for state loops.
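
Freezing works the same way at both levels. In Lottie JSON an animatable property is `{"a": 1, "k": [keyframes]}`; freezing replaces it with the static form `{"a": 0, "k": first value}`. A minimal sketch of the per-property step (the pipeline's actual traversal of layer and shape transforms is more involved):

```python
# Freeze one animatable Lottie property to its first-frame value.
def freeze_property(prop):
    if not isinstance(prop, dict) or prop.get("a") != 1:
        return prop                               # already static
    first = prop["k"][0]
    # Animated keyframes usually hold the value under "s".
    value = first.get("s", first) if isinstance(first, dict) else first
    return {"a": 0, "k": value}
```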

State Animation System

adapt_base_character() in the pipeline:

  1. Strips environment layers (chair, table, armrest, clock)
  2. Keeps original 452×560 coordinate space (Android renderer handles display scaling)
  3. Freezes ALL animation to first-frame values (position, scale, rotation, opacity, shape-level transforms)
  4. Hides eyelid layers (opacity 0) — they drift from eye shapes during breathing since they have no parent
  5. Hides layers that originally started after frame 0 (ip > 0) — they're not part of the resting pose
  6. Applies subtle state-specific animation to the body root (breathing pulse, bobbing)
  7. Can apply part-specific animation (eye movement, arm rotation, etc.) per state profile

States: idle, listening, processing, speaking, error (one-shot), flair (one-shot)
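
The looping vs one-shot distinction can be sketched as a tiny transition rule: error and flair play once and fall back, while the other states loop until replaced. The fallback target (idle) is an assumption; the state names come from the list above.

```python
LOOPING = {"idle", "listening", "processing", "speaking"}
ONE_SHOT = {"error", "flair"}

def next_state(current, animation_finished):
    """One-shot states return to idle when done; looping states persist."""
    if current in ONE_SHOT and animation_finished:
        return "idle"
    return current
```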

Pipeline Commands

# Generate staged assets (no APK build)
python3 tools/avatar_pipeline generate --stage-only --no-build

# Validate staged assets
python3 tools/avatar_pipeline validate --stage

# Promote staged assets to app assets
python3 tools/avatar_pipeline promote

# Full generate + build
python3 tools/avatar_pipeline generate

Pipeline Outputs

  • tools/avatar_pipeline_out/staging_assets/ — staged Lottie JSONs for all states + part/layer isolates
  • tools/avatar_pipeline_out/staging_assets/avatar_parts_manifest.json — maps part names to files
  • tools/avatar_pipeline_out/staging_assets/avatar_layers_manifest.json — maps layer IDs to files
  • tools/avatar_pipeline_out/avatar_motion_report.json — animation amplitude analysis
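
As one illustration of what amplitude analysis can mean here, the peak-to-peak range of an animated position property is easy to compute from its keyframes. The real format of avatar_motion_report.json is not documented in this README; this is a hypothetical metric:

```python
# Peak-to-peak x/y range of a Lottie position property's keyframes
# (each keyframe assumed to hold its value under "s").
def position_amplitude(keyframes):
    xs = [k["s"][0] for k in keyframes if "s" in k]
    ys = [k["s"][1] for k in keyframes if "s" in k]
    return max(xs) - min(xs), max(ys) - min(ys)
```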

Android App Structure

Overlay

  • AvatarOverlayView.kt — FrameLayout with LottieAnimationView, drag ring, Open/Reload/Update buttons, version label
  • AvatarController.kt — manages state transitions, loads animations from dev HTTP server with cache busting, handles PTT touch, drift animation, OTA updates
  • AvatarState.kt — enum: IDLE, LISTENING, PROCESSING, SPEAKING, ERROR, FLAIR

Service

  • VoiceForegroundService.kt — foreground service that creates the overlay, manages WebSocket connection, handles PTT and state transitions

Main Activity

  • MainActivity.kt — settings UI, Avatar Lab (debug viewer for state/part/layer animations with mode spinner), update button, connection controls

Avatar Lab (Debug Viewer)

In the main app, the Avatar Lab lets you preview any animation from the staging server:

  • Mode: state — preview idle/listening/processing/speaking/error/flair animations
  • Mode: part — preview isolated body parts (face, mouth, eye, eyelid, etc.)
  • Mode: layer — preview individual layers by ID
  • Base URL input for the staging server
  • Reload button with cache busting

Shared Utilities

  • AppUpdater.kt — shared OTA update logic (download APK from server + install intent), used by both MainActivity and AvatarController

Configuration

  • tools/avatar_pipeline.yaml — style spec (dimensions, fps, durations, palette, quality gates)
  • tools/avatar_anatomy_map.json — layer-to-part mapping, drop list, labels
  • Dev server IP hardcoded to 10.0.0.4 (update AvatarController.kt, MainActivity.kt, and AppUpdater.kt if the network changes)
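
For orientation, a hypothetical shape for tools/avatar_pipeline.yaml is sketched below. Every key name is illustrative; only the categories (dimensions, fps, durations, palette, quality gates) come from the repository.

```yaml
# Hypothetical layout -- key names are illustrative, not taken from the repo.
canvas:
  width: 452        # matches avatar_source_base.json
  height: 560
  fps: 24
durations:          # seconds per state loop / one-shot
  idle: 4.0
  error: 1.5
palette:
  accent: "#FFB347"
quality_gates:
  max_asset_kb: 256
```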

Dependencies

Server (Python)

  • openwakeword, openai-whisper, websockets, numpy, pyaudio

Android

  • Lottie (com.airbnb.android:lottie)
  • Java-WebSocket
  • Kotlin Coroutines
  • AndroidX AppCompat, Core

About

Voice control system for OpenClaw with wake word detection, Hebrew STT, and avatar overlay
