Summary
Clicky is amazing as a screen-aware voice companion — the cursor overlay, push-to-talk, and POINT system are genuinely next-level UX. But right now it's a stateless Claude wrapper: no memory between sessions, no tools, no persistent context.
I'd like to propose adding OpenClaw (https://github.com/openclaw/openclaw) Gateway as an optional alternative backend — so Clicky can talk to a full personal AI agent instead of vanilla Claude.
What is OpenClaw?
OpenClaw is an open-source (MIT) self-hosted AI gateway that connects agents to messaging surfaces (WhatsApp, Telegram, Slack, Discord, etc.). It provides:
• Persistent memory — workspace files, conversation history across sessions
• Tool use — browser control, code execution, file system, git, cron jobs
• Multi-model — Claude, GPT, Gemini, and others via a single gateway
• Built-in TTS — ElevenLabs already integrated (reusable for Clicky)
• Skills system — extensible agent capabilities
It exposes a WebSocket Protocol (v3) that native apps already use (macOS menu bar app, iOS/Android nodes).
Proposed Architecture
Clicky (Swift) ←→ WS Protocol v3 ←→ OpenClaw Gateway ←→ Agent (memory + tools + skills)
The key idea: in OpenClaw mode, Clicky skips the Cloudflare Worker proxy and opens a direct WebSocket connection to the OpenClaw Gateway. The Worker stays in place for standalone mode, which remains the default.
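To make the handshake concrete, here is a minimal Swift sketch of the state a Gateway client could track. It assumes the connect → challenge/auth → ready sequence listed for OpenClawClient.swift under the proposed changes. GatewayState, GatewayEvent and GatewayConnection are hypothetical names, and the real Protocol v3 message shapes would come from the Gateway docs. No networking is shown here, only the state transitions.

```swift
import Foundation

// Hypothetical connection lifecycle for an OpenClaw Gateway client.
// State and event names are illustrative; the actual Protocol v3 handshake
// messages should be taken from the Gateway documentation.
enum GatewayState: Equatable {
    case disconnected
    case connecting
    case awaitingChallenge
    case authenticating(nonce: String)
    case ready
    case failed(reason: String)
}

enum GatewayEvent: Equatable {
    case connectRequested
    case socketOpened
    case challengeReceived(nonce: String)
    case authAccepted
    case authRejected(reason: String)
    case socketClosed
}

struct GatewayConnection {
    private(set) var state: GatewayState = .disconnected

    /// chat.send (and other RPCs) should only be issued once authenticated.
    var canSendChat: Bool { state == .ready }

    /// Applies an event. Returns false and leaves the state unchanged for
    /// transitions that are not valid in the current state.
    @discardableResult
    mutating func handle(_ event: GatewayEvent) -> Bool {
        switch (state, event) {
        case (.disconnected, .connectRequested), (.failed, .connectRequested):
            state = .connecting
        case (.connecting, .socketOpened):
            state = .awaitingChallenge
        case (.awaitingChallenge, .challengeReceived(let nonce)):
            // Here the client would answer the challenge using its auth token.
            state = .authenticating(nonce: nonce)
        case (.authenticating, .authAccepted):
            state = .ready
        case (.authenticating, .authRejected(let reason)):
            state = .failed(reason: reason)
        case (_, .socketClosed):
            state = .disconnected
        default:
            return false
        }
        return true
    }
}
```

Keeping the lifecycle as a plain value type means it can be unit-tested without a live Gateway; the WebSocket task would only feed it events.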
Changes in Clicky:
- New OpenClawClient.swift — WebSocket client implementing Gateway Protocol v3 (connect, challenge/auth, chat.send, event streaming)
- Refactor CompanionManager.swift — Provider pattern so users can choose between standalone mode (current Worker proxy, remains default) and OpenClaw mode (Gateway WebSocket)
- TTS via Gateway — tts.convert / talk.speak RPC instead of direct ElevenLabs proxy
- Settings panel — Gateway URL + auth token configuration
- Screenshots as attachments — sent via chat.send (Gateway already handles image attachments)
- POINT parsing unchanged — the [POINT:x,y:label] system stays exactly as-is
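A minimal sketch of the provider pattern for the CompanionManager refactor, assuming an async request/response shape. CompanionRequest, CompanionBackend, BackendConfig, makeBackend and the two backend types are hypothetical names, and the respond methods are placeholders where the existing Worker call and the chat.send round trip would go.

```swift
import Foundation

// Hypothetical provider abstraction for CompanionManager; none of these
// names exist in Clicky today.
struct CompanionRequest {
    let transcript: String      // push-to-talk text from AssemblyAI
    let screenshotPNG: Data?    // current screen capture, if any
}

protocol CompanionBackend {
    var displayName: String { get }
    func respond(to request: CompanionRequest) async throws -> String
}

/// Standalone mode: today's Cloudflare Worker proxy (stays the default).
struct StandaloneBackend: CompanionBackend {
    let workerURL: URL
    var displayName: String { "Standalone (Worker)" }

    func respond(to request: CompanionRequest) async throws -> String {
        // Placeholder: the real code would keep calling the Worker as today.
        "standalone: \(request.transcript)"
    }
}

/// OpenClaw mode: talks to the Gateway over WebSocket.
struct OpenClawBackend: CompanionBackend {
    let gatewayURL: URL
    let authToken: String
    var displayName: String { "OpenClaw Gateway" }

    func respond(to request: CompanionRequest) async throws -> String {
        // Placeholder: the real code would issue chat.send (with the screenshot
        // as an attachment) and assemble the streamed reply events.
        "openclaw: \(request.transcript)"
    }
}

enum BackendConfig {
    case standalone(workerURL: URL)
    case openClaw(gatewayURL: URL, authToken: String)
}

func makeBackend(_ config: BackendConfig) -> any CompanionBackend {
    switch config {
    case .standalone(let workerURL):
        return StandaloneBackend(workerURL: workerURL)
    case .openClaw(let gatewayURL, let authToken):
        return OpenClawBackend(gatewayURL: gatewayURL, authToken: authToken)
    }
}
```

With this shape, CompanionManager would hold an `any CompanionBackend` and never branch on the mode itself, so the overlay, POINT handling and push-to-talk flow stay identical in both modes.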
What stays the same:
All UI/UX (overlay, blue cursor, POINT animations), push-to-talk, AssemblyAI transcription, and the macOS-only scope. Standalone mode works exactly as it does today (zero breaking changes).
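Because both backends return plain text, the existing POINT handling can run on either provider's output unchanged. For illustration only, here is a sketch of extracting [POINT:x,y:label] tags, assuming numeric coordinates and labels that contain no closing bracket. PointTag and extractPointTags are hypothetical and are not Clicky's current parser.

```swift
import Foundation

// Illustrative extraction of [POINT:x,y:label] tags from a reply.
// Assumes numeric x/y and a label without "]"; Clicky's real parser may
// handle more cases.
struct PointTag: Equatable {
    let x: Double
    let y: Double
    let label: String
}

func extractPointTags(from text: String) -> (spokenText: String, points: [PointTag]) {
    let pattern = #"\[POINT:\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*:([^\]]*)\]"#
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return (text, [])
    }
    let ns = NSString(string: text)
    let range = NSRange(location: 0, length: ns.length)

    var points: [PointTag] = []
    for match in regex.matches(in: text, range: range) {
        guard let x = Double(ns.substring(with: match.range(at: 1))),
              let y = Double(ns.substring(with: match.range(at: 2))) else { continue }
        let label = ns.substring(with: match.range(at: 3))
            .trimmingCharacters(in: .whitespaces)
        points.append(PointTag(x: x, y: y, label: label))
    }

    // Remove the tags from the text, e.g. before it is handed to TTS.
    let spoken = regex.stringByReplacingMatches(in: text, range: range, withTemplate: "")
    return (spoken.trimmingCharacters(in: .whitespacesAndNewlines), points)
}
```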
What users get (today → with OpenClaw):
• Memory: in-memory only (lost between sessions) → persistent workspace
• Tools: none → browser, exec, git, cron, etc.
• Context: conversation only → full agent context
• Model: Claude only → any model via Gateway
• API keys: 3 keys in Worker → centralized in Gateway
• Cross-device: no → same session from phone/desktop/web
Implementation plan:
Happy to implement this as a PR. Rough size: ~500 lines for the WS client, ~100 lines for the CompanionManager refactor, and ~150 lines for the Settings UI. Standalone mode remains the default.
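For the Settings panel, a sketch of a settings model that keeps standalone as the default and only switches to OpenClaw once the configuration is usable. CompanionSettings, its fields, and the ws/wss validation rule are assumptions for illustration, not an agreed design.

```swift
import Foundation

// Hypothetical model behind the proposed Settings panel. Standalone is the
// default, and OpenClaw mode only takes effect once a ws:// or wss:// Gateway
// URL and a non-empty auth token are configured.
struct CompanionSettings {
    enum Mode: String {
        case standalone
        case openClaw
    }

    var mode: Mode = .standalone
    var gatewayURLString = ""
    // Kept as a plain field for the sketch; a real app would likely store the
    // token in the Keychain rather than UserDefaults.
    var authToken = ""

    /// Parsed Gateway URL, or nil if it is not a usable WebSocket URL.
    var gatewayURL: URL? {
        let trimmed = gatewayURLString.trimmingCharacters(in: .whitespaces)
        guard let url = URL(string: trimmed),
              let scheme = url.scheme?.lowercased(),
              scheme == "ws" || scheme == "wss",
              url.host != nil else { return nil }
        return url
    }

    /// Mode actually used at runtime: falls back to standalone if the
    /// OpenClaw configuration is incomplete.
    var effectiveMode: Mode {
        if mode == .openClaw, gatewayURL != nil, !authToken.isEmpty {
            return .openClaw
        }
        return .standalone
    }
}
```

Falling back to standalone on an incomplete setup is one way to keep the zero-breaking-changes promise even if a user flips the mode before configuring the Gateway; the same `mode` field could also sit behind a feature flag initially.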
Questions for maintainers:
- Is a provider/backend abstraction pattern welcome?
- Any preferences on Settings UI?
- Should this be behind a feature flag initially?
Happy to discuss before writing code. I have a detailed architecture doc ready if useful.