Echo

An AI-powered workflow automation platform — create, record, and run desktop & browser workflows using voice, chat, or visual recording, powered by the EchoPrism vision-language agent.
Explore the docs »

View Demo · Report Bug · Request Feature

Table of Contents

About The Project
Getting Started
Usage
Roadmap
Contributing
License
Contact
Acknowledgments

About The Project

Echo is an AI-powered workflow automation platform. You create and edit desktop and browser workflows from recordings, voice, or chat, then run them with the EchoPrism vision-language agent, which executes each step (navigate, click, type, scroll) on your desktop. The web dashboard manages workflows and runs; the Electron desktop app provides voice-driven control and runs your workflows locally.

Architecture

[Figure: architecture-diagram.png]

Agent Diagram

[Figure: agent-diagram.png]

Built With

Getting Started

To run the full stack locally or deploy from scratch, follow the phases below.

Prerequisites

Install the following tools before proceeding:

Node.js 18+ — nodejs.org or nvm install 18
pnpm — npm install -g pnpm
Python 3.11+ — python.org or pyenv install 3.11
Docker — for building and deploying images

gcloud CLI — Install guide

gcloud auth login
gcloud auth application-default login

Firebase CLI

npm install -g firebase-tools
firebase login

Doppler (optional but recommended) — for secrets management
```
brew install dopplerhq/cli/doppler
doppler login
```
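Before moving on, it can help to confirm that the installed Node.js and Python versions meet the minimums listed above. The following is a minimal sketch, not part of the repository: `version_ge` and `check_tool` are hypothetical helper names, and the comparison only handles plain dotted numeric versions.

```shell
# version_ge A B: succeeds when dotted version A is >= dotted version B.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, "."); m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# check_tool NAME MIN_VERSION CURRENT_VERSION: prints a one-line status.
check_tool() {
  if version_ge "$3" "$2"; then
    echo "ok: $1 $3 (need >= $2)"
  else
    echo "too old: $1 $3 (need >= $2)"
    return 1
  fi
}

# Only query tools that are actually installed on this machine.
if command -v node >/dev/null 2>&1; then
  check_tool node 18 "$(node --version | sed 's/^v//')" || true
fi
if command -v python3 >/dev/null 2>&1; then
  check_tool python 3.11 "$(python3 --version | awk '{print $2}')" || true
fi
```

Comparing field by field avoids the common mistake of treating 3.9 as newer than 3.11 in a plain string comparison.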

Phase 1: GCP Setup

Go to Google Cloud Console and create or select a project with billing enabled.
In APIs & Services → Enable APIs, enable:
- Cloud Run API
- Cloud Scheduler API
- Firestore API
- Cloud Storage API
- Gemini API
Go to Cloud Storage → Buckets, create a bucket with Uniform bucket-level access, and note the name (e.g. echo-assets-prod).
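The console steps above can likely also be done from the command line. The sketch below is an assumption-laden alternative, not the project's own script: the project ID is a placeholder, the region is taken from the documented `us-central1` default, the service names are the standard identifiers for the listed APIs (with `generativelanguage.googleapis.com` assumed to be the one behind the Gemini API), and `run` and `setup_gcp` are hypothetical helpers. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
# Placeholder values: replace with your own project, bucket, and region.
PROJECT_ID="${PROJECT_ID:-your-gcp-project-id}"
BUCKET="${BUCKET:-echo-assets-prod}"
REGION="${REGION:-us-central1}"

# run CMD...: prints the command unless DRY_RUN=0, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_gcp() {
  # Enable the APIs listed in Phase 1.
  run gcloud services enable \
    run.googleapis.com \
    cloudscheduler.googleapis.com \
    firestore.googleapis.com \
    storage.googleapis.com \
    generativelanguage.googleapis.com \
    --project "$PROJECT_ID"
  # Create the asset bucket with uniform bucket-level access.
  run gcloud storage buckets create "gs://$BUCKET" \
    --project "$PROJECT_ID" \
    --location "$REGION" \
    --uniform-bucket-level-access
}

setup_gcp
```

Reviewing the printed commands first is a cheap way to catch a wrong project ID before anything is created.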

Phase 2: Firebase Setup

Go to Firebase Console and create a new project or link your existing GCP project.
Enable authentication: Authentication → Sign-in method → enable Email/Password and Google.
Create Firestore: Firestore Database → Create database → choose Native mode.
Register your web app: Project Settings → Your apps → Add web app (</>) and copy the config object.

Deploy Firestore rules from the project root:

cd firebase && firebase deploy --only firestore:rules

Phase 3: Service Accounts & IAM

Use the default compute service account for Cloud Run and ensure it has the following roles:

Firestore: Cloud Datastore User (or Firestore roles)
Storage: Storage Object Admin
Cloud Run Jobs: Run Jobs Executor
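As a rough sketch, the three roles above could be granted with `gcloud projects add-iam-policy-binding`. Several things here are assumptions rather than facts from this repository: the project ID and project number are placeholders, the role IDs are my mapping of the display names (`roles/run.jobsExecutor` in particular should be verified in the IAM console), and `run` and `grant_roles` are hypothetical helpers. Commands are printed unless `DRY_RUN=0`.

```shell
# Placeholder values: replace with your own project ID and project number.
PROJECT_ID="${PROJECT_ID:-your-gcp-project-id}"
PROJECT_NUMBER="${PROJECT_NUMBER:-123456789012}"

# run CMD...: prints the command unless DRY_RUN=0, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

grant_roles() {
  # The default compute service account is named after the project number.
  sa="${PROJECT_NUMBER}-compute@developer.gserviceaccount.com"
  # Assumed role IDs for: Cloud Datastore User, Storage Object Admin,
  # and Cloud Run Jobs Executor.
  for role in roles/datastore.user roles/storage.objectAdmin roles/run.jobsExecutor; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
      --member "serviceAccount:$sa" \
      --role "$role"
  done
}

grant_roles
```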

Phase 4: Gemini API Key

Go to Google AI Studio
Sign in, select your GCP project, and create an API key
Copy the key — you'll need it for GEMINI_API_KEY

Phase 5: Local Development

Clone and install:

git clone https://github.com/JasonMun7/echo.git
cd echo
pnpm install
pnpm run install:backend

Option A: Doppler (recommended)

doppler setup   # select project and dev config

Then run each service in a separate terminal:

# Terminal 1 – backend
pnpm run dev:backend

# Terminal 2 – frontend
pnpm run dev

# Terminal 3 – desktop app
pnpm run dev:desktop

# Terminal 4 – EchoPrism Agent
pnpm run dev:agent

# Terminal 5 – LiveKit voice agent
pnpm run dev:livekit-agent

Set NEXT_PUBLIC_ECHO_AGENT_URL (web) and VITE_ECHO_AGENT_URL (desktop) to http://localhost:8081 in Doppler for local agent access.
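If you prefer the CLI to the Doppler dashboard, the two agent URLs can probably be set in one `doppler secrets set` call against the config chosen during `doppler setup`. This is a sketch under that assumption; `run` and `set_agent_urls` are hypothetical helpers, and the command is only printed unless `DRY_RUN=0`.

```shell
# Local EchoPrism Agent URL, as given in the instructions above.
AGENT_URL="${AGENT_URL:-http://localhost:8081}"

# run CMD...: prints the command unless DRY_RUN=0, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

set_agent_urls() {
  # One variable for the web app, one for the desktop app.
  run doppler secrets set \
    "NEXT_PUBLIC_ECHO_AGENT_URL=$AGENT_URL" \
    "VITE_ECHO_AGENT_URL=$AGENT_URL"
}

set_agent_urls
```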

Option B: .env files

# Run both commands from the repository root.

# Web app
cp apps/web/.env.local.example apps/web/.env.local
# Edit apps/web/.env.local with Firebase config and NEXT_PUBLIC_API_URL=http://localhost:8000

# Backend
cp backend/.env.example backend/.env
# Edit backend/.env with ECHO_GCP_PROJECT_ID, ECHO_GCS_BUCKET, GEMINI_API_KEY

Local URLs: the backend runs at http://localhost:8000 and the EchoPrism Agent at http://localhost:8081.

Environment Variables Reference:

| Variable | Required | Description |
| --- | --- | --- |
| `ECHO_GCP_PROJECT_ID` | Yes | GCP project ID |
| `ECHO_GCS_BUCKET` | Yes | GCS bucket name |
| `GEMINI_API_KEY` | Yes | Gemini API key |
| `NEXT_PUBLIC_API_URL` | Yes | Backend URL (web) |
| `NEXT_PUBLIC_ECHO_AGENT_URL` | Yes | EchoPrism Agent URL (web) |
| `NEXT_PUBLIC_FIREBASE_*` | Yes | Firebase config (web) |
| `VITE_API_URL` | Yes | Backend URL (desktop) |
| `VITE_ECHO_AGENT_URL` | Yes | EchoPrism Agent URL (desktop) |
| `LIVEKIT_URL` | Voice only | LiveKit server URL |
| `LIVEKIT_API_KEY` | Voice only | LiveKit API key |
| `LIVEKIT_API_SECRET` | Voice only | LiveKit API secret |
| `ECHO_CLOUD_RUN_REGION` | No | Default `us-central1` |

See scripts/doppler-env-reference.md for the full reference.
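A quick check that the required variables are present can save a confusing startup failure. The sketch below is illustrative only: `missing_vars` is a hypothetical helper, and the list covers the variables marked "Yes" in the reference above, except the `NEXT_PUBLIC_FIREBASE_*` group, whose individual names are not listed here.

```shell
# missing_vars NAME...: prints each named variable that is unset or empty.
# Names must come from a trusted, fixed list because they are passed to eval.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

# Variables marked "Yes" in the environment variables reference.
REQUIRED_VARS="ECHO_GCP_PROJECT_ID ECHO_GCS_BUCKET GEMINI_API_KEY \
NEXT_PUBLIC_API_URL NEXT_PUBLIC_ECHO_AGENT_URL VITE_API_URL VITE_ECHO_AGENT_URL"

# REQUIRED_VARS is intentionally unquoted so it splits into separate names.
missing="$(missing_vars $REQUIRED_VARS)"
if [ -n "$missing" ]; then
  echo "Missing required variables:"
  echo "$missing"
else
  echo "All required variables are set."
fi
```

The same function can be called with `LIVEKIT_URL LIVEKIT_API_KEY LIVEKIT_API_SECRET` when you plan to use voice features.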

Phase 6: Deploy to Cloud Run

pnpm run deploy
# or with explicit env:
GEMINI_API_KEY=your-key ECHO_GCS_BUCKET=your-bucket \
  ./scripts/deploy.sh YOUR_GCP_PROJECT_ID us-central1

The script builds and pushes Docker images, deploys frontend and backend as Cloud Run services, and deploys the EchoPrism agent as a Cloud Run Job.

To deploy the LiveKit voice agent (optional):

pnpm run deploy:livekit-agent

Requires LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET, LIVEKIT_AGENT_SECRET, ECHOPRISM_AGENT_URL, and GEMINI_API_KEY.

Usage

Visit the live demo to try the web app. To run the desktop app, follow the instructions on the releases page.

Create a workflow — record a screen capture, describe steps via chat, or use voice on the desktop app
Edit steps — review and modify the auto-generated workflow steps in the dashboard
Run — trigger a run from the desktop app; EchoPrism executes each step via vision-language grounding
Monitor — watch the execution and press Ctrl + Shift + V to interrupt and steer the run

Roadmap

Mobile app automation — Allow Echo to automate tasks on phones as well
Fine tuning — Improve model accuracy by training on user data with Vertex AI
Expanded integrations — Add third-party app connectors like Slack, Notion, and G-Suite
Workflow marketplace — Create a library of community-shared automations users can install and customize
Schedule workflows — Allow users to schedule workflows to run at specific times
Reduce costs — Optimize calls to OmniParser service to reduce time and monetary costs

See the open issues for a full list of proposed features and known issues.

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also open an issue with the tag "enhancement". Don't forget to give the project a star!

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

Jason Mun — jason.mun484@gmail.com · LinkedIn

Andrew Cheung — andrewcheung360@gmail.com · LinkedIn

Project Link: https://github.com/JasonMun7/echo

Acknowledgments

OmniParser — UI element grounding for vision-based automation
LiveKit — Real-time voice and video infrastructure
Gemini — Vision-language model powering EchoPrism
UI-TARS — GUI agent model for automated UI interaction
Best-README-Template
