
OpenClaw Gateway as alternative backend (memory, tools, multi-model) #30

@kokoima

Description

Summary

Clicky is amazing as a screen-aware voice companion — the cursor overlay, push-to-talk, and POINT system are genuinely next-level UX. But right now it's a stateless Claude wrapper: no memory between sessions, no tools, no persistent context.

I'd like to propose adding OpenClaw (https://github.com/openclaw/openclaw) Gateway as an optional alternative backend — so Clicky can talk to a full personal AI agent instead of vanilla Claude.

What is OpenClaw?

OpenClaw is an open-source (MIT) self-hosted AI gateway that connects agents to messaging surfaces (WhatsApp, Telegram, Slack, Discord, etc.). It provides:
• Persistent memory — workspace files, conversation history across sessions
• Tool use — browser control, code execution, file system, git, cron jobs
• Multi-model — Claude, GPT, Gemini, and others via a single gateway
• Built-in TTS — ElevenLabs already integrated (reusable for Clicky)
• Skills system — extensible agent capabilities

It exposes a WebSocket Protocol (v3) that native apps already use (macOS menu bar app, iOS/Android nodes).

Proposed Architecture

Clicky (Swift) ←→ WS Protocol v3 ←→ OpenClaw Gateway ←→ Agent (memory + tools + skills)

The key idea: replace the Cloudflare Worker proxy with a direct WebSocket connection to OpenClaw Gateway. The Worker becomes optional (standalone fallback mode).
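To make this concrete, here is a minimal sketch of what the Gateway connection could look like from Swift, built on `URLSessionWebSocketTask`. The frame shapes shown (the auth frame, the `chat.send` envelope) are my assumptions for illustration, not the real Protocol v3 spec; they would be replaced by whatever handshake and envelopes the Gateway actually defines.

```swift
import Foundation

/// Sketch of a Gateway connection, assuming a JSON-over-WebSocket protocol.
/// The "auth" and "chat.send" message shapes are placeholders.
final class GatewayConnection {
    private var task: URLSessionWebSocketTask?

    func connect(to url: URL, token: String) {
        task = URLSession.shared.webSocketTask(with: url)
        task?.resume()
        receiveLoop()
        // Hypothetical auth frame; the real handshake is challenge/response.
        send(json: ["type": "auth", "token": token])
    }

    func sendChat(_ text: String) {
        send(json: ["type": "chat.send", "message": text])
    }

    private func send(json: [String: Any]) {
        guard let data = try? JSONSerialization.data(withJSONObject: json),
              let string = String(data: data, encoding: .utf8) else { return }
        task?.send(.string(string)) { error in
            if let error { print("send failed: \(error)") }
        }
    }

    private func receiveLoop() {
        task?.receive { [weak self] result in
            if case .success(let message) = result {
                if case .string(let text) = message {
                    print("event: \(text)")  // streamed agent events land here
                }
                self?.receiveLoop()  // re-arm: receive delivers one message per call
            }
        }
    }
}
```

Note the re-armed `receive` loop: `URLSessionWebSocketTask.receive` delivers a single message per call, so event streaming means calling it again after each message.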

Changes in Clicky:

  1. New OpenClawClient.swift — WebSocket client implementing Gateway Protocol v3 (connect, challenge/auth, chat.send, event streaming)
  2. Refactor CompanionManager.swift — Provider pattern so users can choose between standalone mode (current Worker proxy, remains default) and OpenClaw mode (Gateway WebSocket)
  3. TTS via Gateway — tts.convert / talk.speak RPC instead of direct ElevenLabs proxy
  4. Settings panel — Gateway URL + auth token configuration
  5. Screenshots as attachments — sent via chat.send (Gateway already handles image attachments)
  6. POINT parsing unchanged — the [POINT:x,y:label] system stays exactly as-is
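The provider split in (2) could be sketched roughly like this. All type names here (`CompanionProvider`, `StandaloneProvider`, `OpenClawProvider`) are hypothetical placeholders, not existing Clicky types:

```swift
import Foundation

/// Sketch of the proposed backend abstraction: one protocol, two providers.
protocol CompanionProvider {
    /// Sends a user turn (plus optional screenshot) and streams reply chunks.
    func send(_ text: String, screenshot: Data?, onChunk: @escaping (String) -> Void)
}

/// Current behavior: Cloudflare Worker proxy (remains the default).
struct StandaloneProvider: CompanionProvider {
    func send(_ text: String, screenshot: Data?, onChunk: @escaping (String) -> Void) {
        // POST to the Worker proxy exactly as today...
    }
}

/// New behavior: chat.send over the Gateway WebSocket.
struct OpenClawProvider: CompanionProvider {
    func send(_ text: String, screenshot: Data?, onChunk: @escaping (String) -> Void) {
        // chat.send over WS; the screenshot rides along as an attachment...
    }
}
```

`CompanionManager` would own a single `CompanionProvider` chosen in Settings, and the downstream `[POINT:x,y:label]` parser consumes the streamed chunks identically in both modes, which is what keeps item (6) true.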

What stays the same:
• All UI/UX — overlay, blue cursor, POINT animations, push-to-talk
• AssemblyAI transcription
• Standalone mode works exactly as today (zero breaking changes)
• macOS-only scope

What users get:
• Memory: in-memory → persistent workspace
• Tools: none → browser, exec, git, cron, etc.
• Context: conversation only → full agent context
• Model: Claude only → any model via Gateway
• API keys: 3 keys in Worker → centralized in Gateway
• Cross-device: no → same session from phone/desktop/web

Implementation plan:
Happy to implement this as a PR: roughly ~500 lines for the WS client, ~100 for the CompanionManager refactor, and ~150 for the Settings UI. Standalone mode remains the default.

Questions for maintainers:

  1. Is a provider/backend abstraction pattern welcome?
  2. Any preferences on Settings UI?
  3. Should this be behind a feature flag initially?

Happy to discuss before writing code. I have a detailed architecture doc ready if useful.
