A toolkit for remote human-robot interaction through Zoom video conferencing, powered by Large Language Models. This system serves as a core component of the Robi Butler project (ICRA 2025 paper).
This is a minimal implementation enabling remote human-robot interaction through Zoom meetings with:
- Zoom Interface: Bidirectional communication via Zoom chat and voice (to output the robot's voice, generate speech with ElevenLabs and route it into the meeting through a virtual microphone)
- Robi Butler: Intelligent agent for instruction processing and robot control
- LLM Planner: GPT-4 powered natural language understanding
- Fetch Agent: Robot control interface with manipulation primitives
- Pointing Server (optional): Web-based pointing/gesture input
User → Robot:
- User speaks/types "Robi, pick up the cup from the table" in Zoom
- `zoom_interface.py` detects the wake word → publishes to `/user_instruction`
- `robi_butler.py` receives the message → calls `ChatAgent.get_response()`
- LLM generates: `[move("table"), pick("cup")]`
- Butler executes via `fetch_agent` → actions sent to the robot
- Butler publishes the result to `/robot_feedback`
- `zoom_interface.py` writes the response to Zoom chat
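The wake-word detection and the bracketed action plan above can be sketched in plain Python. The function names and the `WAKE_WORD` constant here are illustrative, not the actual identifiers in `zoom_interface.py` or `llm_planner.py`:

```python
import re
from typing import List, Optional, Tuple

WAKE_WORD = "robi"  # assumed wake word, per the examples in this README

def extract_instruction(message: str) -> Optional[str]:
    """Return the instruction text if the message starts with the wake word."""
    text = message.strip()
    if text.lower().startswith(WAKE_WORD):
        instruction = text[len(WAKE_WORD):].lstrip(" ,:")
        return instruction or None
    return None

def parse_plan(plan: str) -> List[Tuple[str, str]]:
    """Parse an LLM plan like '[move("table"), pick("cup")]' into (action, arg) pairs."""
    return re.findall(r'(\w+)\("([^"]*)"\)', plan)
```

Anything `extract_instruction` returns would be published on `/user_instruction`; the `(action, arg)` pairs are what the butler hands to `fetch_agent`.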
Pointing (optional):
- User clicks on the web interface (via the ngrok URL)
- `server.py` publishes `Point32(x, y)` to `/user_point`
- `robi_butler.py` stores the point in a buffer
- User says "Robi, pick this"
- LLM generates: `[pick("*")]`
- Butler uses the stored point coordinates for the action
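The point-buffering step can be sketched as follows; the class and method names are illustrative assumptions, and `robi_butler.py` may implement this differently:

```python
from typing import Optional, Tuple, Union

Point = Tuple[float, float]

class PointBuffer:
    """Holds the most recent click from /user_point; '*' arguments resolve to it."""

    def __init__(self) -> None:
        self._point: Optional[Point] = None

    def update(self, x: float, y: float) -> None:
        # Called when a new Point32 arrives; only the latest click is kept.
        self._point = (x, y)

    def resolve(self, arg: str) -> Union[str, Point]:
        # Substitute the buffered coordinates for the '*' placeholder;
        # ordinary arguments like "cup" pass through unchanged.
        if arg == "*" and self._point is not None:
            return self._point
        return arg
```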
```
zoom_communication/
├── robi_butler.py        # Main agent: instruction processing & execution
├── zoom_interface.py     # Zoom bridge: chat/voice ↔ ROS topics
├── llm_planner.py        # ChatAgent: LLM-based planning
├── fetch_agent.py        # Robot interface: action primitives
├── robots/
│   └── fake_robot.py     # Simulated robot with scene graph
├── webgesture/server/
│   └── server.py         # Pointing server (Flask)
├── requirements.txt      # Python dependencies
├── .env.example          # Environment template
├── .gitignore
├── LICENSE
└── geckodriver           # Firefox WebDriver
```
| Topic | Type | Publisher | Subscriber | Description |
|---|---|---|---|---|
| `/user_instruction` | `String` | `zoom_interface.py` | `robi_butler.py` | User commands |
| `/robot_feedback` | `String` | `robi_butler.py` | `zoom_interface.py` | Robot responses |
| `/user_point` | `Point32` | `server.py` | `robi_butler.py` | Pointing coordinates |
- Python 3.8+
- ROS Noetic (or later)
- Firefox browser
- OpenAI API key (set the `OPENAI_API_KEY` environment variable, or put it in the `.env` file)
- Geckodriver (already included in the repository, or download it from the geckodriver releases page)
- Install system dependencies:

  ```bash
  sudo apt-get install portaudio19-dev python3-pyaudio
  ```

- Install Python dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Configure environment variables:

  ```bash
  cp .env.example .env
  nano .env
  ```

  Edit `.env` (or export `OPENAI_API_KEY` in your shell instead):

  ```bash
  OPENAI_API_KEY=sk-your-openai-api-key-here
  # Full Zoom meeting URL, taken from the meeting invite email
  ZOOM_MEETING_URL=https://zoom.us/j/YOUR_MEETING_ID?pwd=YOUR_PASSWORD
  ```

- Make geckodriver executable:

  ```bash
  chmod +x geckodriver
  ```
Terminal 1 - ROS Core:

```bash
roscore
```

Terminal 2 - Zoom Interface:

```bash
python3 zoom_interface.py
```

- Opens Firefox and joins the Zoom meeting (join from browser)
- Listens for chat messages and voice commands (mute the robot's microphone, open the chat box)
- Publishes to `/user_instruction`
- Writes responses from `/robot_feedback` to Zoom
Terminal 3 - Robi Butler:

```bash
python3 robi_butler.py
```

- Subscribes to `/user_instruction` and `/user_point`
- Processes commands through the LLM
- Executes robot actions
- Publishes to `/robot_feedback`
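The plan–execute–report cycle this terminal runs can be sketched without the ROS plumbing. The callables below stand in for the real `ChatAgent` and `fetch_agent`, and the returned string is what would be published on `/robot_feedback`:

```python
from typing import Callable, List, Tuple

Action = Tuple[str, str]  # (action name, argument), e.g. ("pick", "cup")

def butler_step(instruction: str,
                plan: Callable[[str], List[Action]],
                execute: Callable[[str, str], bool]) -> str:
    """One instruction -> plan -> execute -> feedback cycle.

    `plan` maps an instruction to (action, argument) pairs and `execute`
    runs one action, returning True on success.
    """
    for name, arg in plan(instruction):
        if not execute(name, arg):
            # Stop at the first failed primitive and report it.
            return f"Failed: {name}({arg!r})"
    return "Done: " + instruction
```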
Terminal 4 (Optional) - Pointing Server:

```bash
cd webgesture/server
python3 server.py
# In another terminal, expose with ngrok
ngrok http 8888
```

- Provides a web interface for pointing
- Share the ngrok URL with users
- Publishes clicks to `/user_point`
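A click arriving at the server has to be mapped from browser pixels to the coordinates published on `/user_point`. A minimal sketch of that mapping follows; the [0, 1] normalization convention is an assumption, and the real `server.py` may publish raw pixel coordinates instead:

```python
from typing import Tuple

def normalize_click(px: float, py: float, width: int, height: int) -> Tuple[float, float]:
    """Map a pixel click on the video frame to normalized [0, 1] coordinates."""
    if width <= 0 or height <= 0:
        raise ValueError("frame dimensions must be positive")
    # Clamp so clicks on the frame border stay in range.
    x = min(max(px / width, 0.0), 1.0)
    y = min(max(py / height, 0.0), 1.0)
    return x, y
```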
Voice/Chat (say "Robi" to activate):

```
Robi, move to the kitchen
Robi, pick up the cup from the table
Robi, open the fridge and check if there is pizza
Robi, what do you see?
```

With Pointing (use "this"):

```
Robi, pick this       [user clicks on object]
Robi, what is this?   [user clicks on object]
Robi, move to this    [user clicks on location]
```
Zoom not connecting:
- Verify `ZOOM_MEETING_URL` in `.env`
- Check that Firefox is installed
- Wait ~35 seconds for the page to load
- Check that geckodriver is executable: `chmod +x geckodriver`
No microphone input:

```bash
# List devices
python3 -c "import speech_recognition as sr; print(sr.Microphone.list_microphone_names())"
```

```python
# Set the device in zoom_interface.py
self.mic_device = 1  # your device index
```

OpenAI API errors:

```bash
# Verify the key
echo $OPENAI_API_KEY
# Or check the .env file
grep OPENAI_API_KEY .env
```

ROS topics not working:

```bash
# Check topics
rostopic list
# Check connections
rostopic info /user_instruction
# Test manually
rostopic pub /user_instruction std_msgs/String "test"
```

ModuleNotFoundError:

```bash
pip install -r requirements.txt
# ROS packages (if missing)
sudo apt-get install ros-noetic-cv-bridge ros-noetic-geometry-msgs
```

MIT License - see LICENSE file for details
If you use this system in your research, please cite:
```bibtex
@inproceedings{xiao2025robi,
  title={Robi Butler: Multimodal Remote Interaction with a Household Robot Assistant},
  author={Xiao, Anxing and Janaka, Nuwan and Hu, Tianrun and Gupta, Anshul and Li, Kaixin and Yu, Cunjun and Hsu, David},
  booktitle={2025 IEEE International Conference on Robotics and Automation (ICRA)},
  pages={4337--4344},
  year={2025},
  organization={IEEE}
}
```
