
Multi-tower inference: remote command delegation - Windows and cuda support - end to end installation fixes#315

Open
joelteply wants to merge 39 commits into main from feature/multi-tower-inference

Conversation

@joelteply
Contributor

Summary

  • Automated Continuum install for all platforms (macOS, Ubuntu/Debian, WSL2)
  • GPU detection (CUDA, Metal, CPU-only) with appropriate ML package selection
  • Tower-aware command routing (upcoming): Commands.execute() transparently delegates to remote towers
  • Tower URI scheme: tower://5090 local → mesh address long-term

Test plan

  • Install script works on 5090 WSL2
  • npm start succeeds on 5090
  • ./jtag ping responds from 5090
  • Academy benchmark routes through remote tower

🤖 Generated with Claude Code

Cross-platform install (macOS, Ubuntu/Debian, WSL2):
- GPU detection (CUDA, Metal, CPU-only)
- System deps, Node.js 22, Rust via rustup
- Python venv with ML packages (torch, transformers, peft) when GPU present
- npm install + TypeScript build + Rust worker build
- Default config.env creation
- Idempotent (safe to re-run)

Usage: git clone ... && cd continuum/src && bash scripts/install.sh
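The GPU detection step in the summary can be sketched roughly as follows. This is a hypothetical reconstruction, not the actual install.sh code: detect_gpu_features is an assumed name, and the feature strings mirror the cargo features mentioned later in this PR (metal,accelerate on macOS, cuda on Linux/WSL via nvidia-smi, CPU-only fallback).

```shell
# Hypothetical sketch of the GPU detection described above.
# detect_gpu_features is an assumed name, not the script's actual function.
detect_gpu_features() {
  if [ "$(uname -s)" = "Darwin" ]; then
    # Apple Silicon: Metal plus Accelerate.framework
    echo "metal,accelerate"
  elif command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
    # NVIDIA driver present and responding (Linux or WSL2)
    echo "cuda"
  else
    # CPU-only fallback: no extra cargo features
    echo ""
  fi
}

FEATURES="$(detect_gpu_features)"
echo "GPU features: ${FEATURES:-cpu-only}"
```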
Copilot AI review requested due to automatic review settings March 20, 2026 22:35

Copilot AI left a comment


Pull request overview

Adds a new one-shot installer script intended to bootstrap a “Continuum Tower” environment across macOS/Linux/WSL by installing toolchains, optionally setting up a Python ML venv based on GPU detection, building the project, and writing a default user config.

Changes:

  • Introduces src/scripts/install.sh to install system dependencies, Node.js, Rust, and (optionally) a Python ML venv.
  • Adds basic GPU detection (CUDA vs Metal vs CPU-only) to decide which PyTorch wheel index to use.
  • Runs the repo build (npm install, npm run build:ts, scripts/setup-rust.sh) and creates a default ~/.continuum/config.env if missing.


Comment on lines +234 to +240
npm install --silent 2>&1 | tail -3

echo -e " Building TypeScript..."
npm run build:ts 2>&1 | tail -1

echo -e " Building Rust workers..."
bash scripts/setup-rust.sh 2>&1 | tail -5

Copilot AI Mar 20, 2026


These build steps pipe output to tail, which will mask failures unless pipefail is enabled (and even then truncates the logs needed to debug). Consider removing the | tail ... truncation or capturing logs to a file while still letting failures surface clearly.

Suggested change:

-npm install --silent 2>&1 | tail -3
+npm install --silent > npm_install.log 2>&1
+tail -3 npm_install.log
 echo -e " Building TypeScript..."
-npm run build:ts 2>&1 | tail -1
+npm run build:ts > build_ts.log 2>&1
+tail -1 build_ts.log
 echo -e " Building Rust workers..."
-bash scripts/setup-rust.sh 2>&1 | tail -5
+bash scripts/setup-rust.sh > setup_rust.log 2>&1
+tail -5 setup_rust.log
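The masking behavior this comment describes can be verified with a tiny bash repro: under plain set -e a pipeline's exit status is the last command's (tail's), so the failure vanishes, while set -o pipefail propagates it.

```shell
# Minimal repro of the pipeline-masking behavior described above (bash).
set -e

# Without pipefail the pipeline reports tail's (successful) exit status.
if false | tail -n 1; then
  echo "without pipefail: failure masked"
fi

# With pipefail the pipeline reports the first failing command's status.
set -o pipefail
if ! false | tail -n 1; then
  echo "with pipefail: failure caught"
fi
```

Both branches print, which is exactly why the review asks for pipefail or log files: the first pipeline "succeeds" despite the failing command.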

Comment on lines +150 to +151
curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
sudo apt-get install -y nodejs

Copilot AI Mar 20, 2026


On Linux/WSL the script uses sudo unconditionally for NodeSource setup + apt install. This will fail when running as root (or in minimal environments without sudo). You already compute a SUDO variable in install_system_deps; reuse that pattern here or detect id -u before calling sudo.

Suggested change:

-curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
-sudo apt-get install -y nodejs
+# Determine whether to use sudo (for non-root users with sudo installed)
+local SUDO_CMD="sudo"
+if [ "$(id -u)" -eq 0 ] || ! command -v sudo >/dev/null 2>&1; then
+    SUDO_CMD=""
+fi
+if [ -n "$SUDO_CMD" ]; then
+    curl -fsSL https://deb.nodesource.com/setup_22.x | sudo -E bash -
+    sudo apt-get install -y nodejs
+else
+    curl -fsSL https://deb.nodesource.com/setup_22.x | bash -
+    apt-get install -y nodejs
+fi
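The SUDO-variable pattern the review points at can be sketched as below. Hedged: install_system_deps' actual implementation may differ, and SUDO is an assumed variable name.

```shell
# Sketch of the sudo-detection pattern: elevate only when we are not root
# and sudo actually exists; SUDO stays empty otherwise, so prefixing a
# command with $SUDO is a no-op for root.
SUDO=""
if [ "$(id -u)" -ne 0 ] && command -v sudo >/dev/null 2>&1; then
  SUDO="sudo"
fi
echo "elevation prefix: ${SUDO:-<none>}"

# Illustrative usage (not executed here): $SUDO apt-get install -y nodejs
```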

echo -e "${BLUE}━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━${NC}"
echo ""

PLATFORM=$(preflight_detect_platform)

Copilot AI Mar 20, 2026


preflight_detect_platform can return unknown, but most case "$PLATFORM" statements have no default branch and still print success messages. This can lead to a confusing partial install on unsupported systems; consider explicitly erroring out when PLATFORM is unknown, or adding *) branches that exit 1 with a clear message.

Suggested change:

 PLATFORM=$(preflight_detect_platform)
+if [ "$PLATFORM" = "unknown" ]; then
+    echo -e " Platform: ${RED}unknown${NC}"
+    echo ""
+    echo -e "${RED}Error:${NC} Unsupported platform detected by preflight_detect_platform."
+    echo "This installer currently supports: macOS (Apple Silicon), Ubuntu/Debian (x86_64), and WSL2."
+    echo "If you're on a supported system and see this message, please update your scripts or open an issue."
+    exit 1
+fi
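The fail-fast default branch can be demonstrated with a small standalone sketch. check_platform and the platform names are hypothetical stand-ins for the script's actual case statements.

```shell
# Hypothetical sketch: a *) branch makes unsupported platforms fail fast
# instead of continuing with a partial install.
check_platform() {
  case "$1" in
    macos|linux|wsl2)
      echo "platform ok: $1"
      ;;
    *)
      echo "Error: unsupported platform '$1'" >&2
      return 1
      ;;
  esac
}

check_platform macos
check_platform freebsd || echo "installer would exit 1 here"
```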

# Installs: Node.js, Rust, Python venv (with ML packages if GPU detected), system deps
# Idempotent: safe to run multiple times (skips what's already installed)

set -e

Copilot AI Mar 20, 2026


The script pipes several critical build/install commands into tail later on, but only set -e is enabled. In bash, a failing command in a pipeline won't fail the script unless set -o pipefail is set, so npm install / npm run build:ts / setup-rust.sh failures could be silently ignored.

Suggested change:

 set -e
+set -o pipefail

Comment on lines +107 to +111
# Python venv support
if ! python3 -m venv --help &>/dev/null 2>&1; then
# Detect python version for correct package name
local pyver=$(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')" 2>/dev/null || echo "3.12")
needed+=("python${pyver}-venv")

Copilot AI Mar 20, 2026


Linux/WSL system deps don't ensure python3 is installed, but later steps call python3 -m venv unconditionally. On minimal Debian/Ubuntu images this will fail even if pythonX.Y-venv is installed; consider adding python3 (and usually python3-pip) to the needed list when python3 isn't present.

Suggested change:

-# Python venv support
-if ! python3 -m venv --help &>/dev/null 2>&1; then
-    # Detect python version for correct package name
-    local pyver=$(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')" 2>/dev/null || echo "3.12")
-    needed+=("python${pyver}-venv")
+# Python + venv support
+if command -v python3 &>/dev/null; then
+    # Ensure venv module is available for the installed python3
+    if ! python3 -m venv --help &>/dev/null 2>&1; then
+        # Detect python version for correct package name
+        local pyver
+        pyver=$(python3 -c "import sys; print(f'{sys.version_info.major}.{sys.version_info.minor}')" 2>/dev/null || echo "3.12")
+        needed+=("python${pyver}-venv")
+    fi
+else
+    # No python3 installed: install interpreter, pip, and generic venv package
+    needed+=("python3" "python3-pip" "python3-venv")
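The suggestion amounts to a three-way check. A compact self-contained sketch (python_apt_packages is an assumed name; the Debian package names follow the convention the review describes):

```shell
# Hypothetical sketch of the three-way python3/venv check suggested above:
# no interpreter -> full set; interpreter without venv -> versioned venv
# package; otherwise nothing extra.
python_apt_packages() {
  if ! command -v python3 >/dev/null 2>&1; then
    # No interpreter at all: install python3, pip, and the generic venv package
    echo "python3 python3-pip python3-venv"
  elif ! python3 -m venv --help >/dev/null 2>&1; then
    # Interpreter present but venv module missing: versioned venv package
    local pyver
    pyver=$(python3 -c 'import sys; print(f"{sys.version_info.major}.{sys.version_info.minor}")' 2>/dev/null || echo "3.12")
    echo "python${pyver}-venv"
  else
    echo ""
  fi
}

echo "extra apt packages needed: $(python_apt_packages)"
```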

Comment on lines +84 to +92
macos)
preflight_check_build_tools
# Homebrew packages
for pkg in jq git; do
if ! command -v "$pkg" &>/dev/null; then
echo -e " Installing $pkg..."
brew install "$pkg"
fi
done

Copilot AI Mar 20, 2026


On macOS this assumes Homebrew is available (brew install ...) but there's no check or guidance if brew is missing. Consider checking command -v brew and either installing it or printing a clear instruction before attempting installs.

Comment on lines +216 to +217
$pip install -q transformers peft accelerate datasets trl bitsandbytes pytest


Copilot AI Mar 20, 2026


bitsandbytes is installed for both CUDA and Metal paths, but it is generally CUDA-specific and commonly fails to install/import on macOS/MPS environments. This will likely break the install on Apple Silicon; consider only installing bitsandbytes when HAS_CUDA=true (and skipping it for Metal/CPU).

Suggested change:

-$pip install -q transformers peft accelerate datasets trl bitsandbytes pytest
+$pip install -q transformers peft accelerate datasets trl pytest
+if $HAS_CUDA; then
+    echo -e " Installing CUDA-specific packages..."
+    $pip install -q bitsandbytes
+fi

Comment on lines +2 to +12
# Continuum Tower Install — One command to get a tower running.
#
# Usage:
# git clone https://github.com/CambrianTech/continuum.git
# cd continuum/src
# bash scripts/install.sh
#
# Works on: macOS (Apple Silicon), Ubuntu/Debian (x86_64), WSL2
# Installs: Node.js, Rust, Python venv (with ML packages if GPU detected), system deps
# Idempotent: safe to run multiple times (skips what's already installed)


Copilot AI Mar 20, 2026


The PR title/description focus on multi-tower inference and remote command delegation, but the only change in this PR is adding an install script. Either the PR metadata should be updated to reflect the installer work, or the missing multi-tower changes should be included so the PR matches its stated purpose.

Comment on lines +251 to +265
echo -e "\n${YELLOW}Creating default config at $CONFIG_FILE${NC}"
cat > "$CONFIG_FILE" << 'ENVEOF'
# Continuum Tower Configuration
# Add API keys here for cloud provider access.
# Uncomment and fill in the ones you need:

# ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-proj-...
# DEEPSEEK_API_KEY=sk-...
# HF_TOKEN=hf_...

# Storage path for models, adapters, datasets (default: ~/.continuum)
# CONTINUUM_STORAGE_PATH=/path/to/storage
ENVEOF
echo -e " ${YELLOW}Edit $CONFIG_FILE to add your API keys${NC}"

Copilot AI Mar 20, 2026


The installer writes its own minimal ~/.continuum/config.env template, but the repo already has scripts/ensure-config.ts as the canonical config generator (with required defaults like HTTP_PORT/WS_PORT, LOG_LEVEL, etc.). Creating a partial file here will prevent ensure-config.ts from generating the full default config and can leave the system missing expected keys; consider invoking the existing ensure-config script (or reusing its template) instead of writing a second format.

Suggested change:

-echo -e "\n${YELLOW}Creating default config at $CONFIG_FILE${NC}"
-cat > "$CONFIG_FILE" << 'ENVEOF'
-# Continuum Tower Configuration
-# Add API keys here for cloud provider access.
-# Uncomment and fill in the ones you need:
-# ANTHROPIC_API_KEY=sk-ant-...
-# OPENAI_API_KEY=sk-proj-...
-# DEEPSEEK_API_KEY=sk-...
-# HF_TOKEN=hf_...
-# Storage path for models, adapters, datasets (default: ~/.continuum)
-# CONTINUUM_STORAGE_PATH=/path/to/storage
-ENVEOF
-echo -e " ${YELLOW}Edit $CONFIG_FILE to add your API keys${NC}"
+echo -e "\n${YELLOW}No config found at $CONFIG_FILE — generating default config via ensure-config${NC}"
+if command -v npm >/dev/null 2>&1; then
+    (
+        cd "$PROJECT_DIR"
+        # Use the canonical config generator to create a full default config.env
+        if ! npm run ensure-config >/dev/null 2>&1; then
+            echo -e " ${YELLOW}Warning:${NC} 'npm run ensure-config' failed. You may need to run it manually after installation."
+        fi
+    )
+else
+    echo -e " ${YELLOW}Warning:${NC} npm is not available. Please run 'npm run ensure-config' from the src directory to create $CONFIG_FILE."
+fi
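Whichever generator is used, the idempotency contract from the PR summary ("safe to re-run") reduces to write-only-if-missing. A minimal sketch, using a temp directory in place of ~/.continuum so it is safe to run anywhere:

```shell
# Idempotency sketch: create the config only when it does not exist,
# so re-running the installer never clobbers user edits.
# (mktemp stands in for ~/.continuum here; ensure_config is hypothetical.)
CONFIG_DIR="$(mktemp -d)"
CONFIG_FILE="$CONFIG_DIR/config.env"

ensure_config() {
  if [ ! -f "$CONFIG_FILE" ]; then
    printf '# Continuum Tower Configuration\n' > "$CONFIG_FILE"
    echo "created $CONFIG_FILE"
  else
    echo "kept existing $CONFIG_FILE"
  fi
}

ensure_config   # first run: creates the file
ensure_config   # second run: leaves it alone
```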

echo -e " Python ML: $VENV_DIR"
fi
echo ""
echo -e " ${YELLOW}Start:${NC} cd src && npm start"

Copilot AI Mar 20, 2026


The post-install instruction cd src && npm start conflicts with the usage instructions at the top (cd continuum/src before running this script). If the user is already in src/, cd src will land in src/src and fail; consider printing either just npm start or an unambiguous path from the repo root.

Suggested change:

-echo -e " ${YELLOW}Start:${NC} cd src && npm start"
+echo -e " ${YELLOW}Start:${NC} npm start"

- Guard against running as root (sudo puts config under /root)
- Node.js via nvm instead of nodesource apt repo (no sudo needed)
- Only apt-get install uses sudo elevation
- cargo-features.sh: shared helper detects GPU (Metal on macOS, CUDA on
  Linux/WSL via nvidia-smi, CPU-only fallback). Future: ROCm for AMD.
- parallel-start.sh: uses --no-default-features + detected GPU features
  instead of hardcoded default=["metal"] (which fails on Linux with objc2)
- setup-rust.sh: same feature detection for per-worker builds
- Source ~/.cargo/env and nvm in parallel-start for non-interactive shells
  (SSH sessions don't load .bashrc)
accelerate-src links Apple's Accelerate.framework — fails on Linux.
Now: metal+accelerate on macOS, cuda on Linux, cpu-only fallback.
Workspace deps no longer hardcode accelerate feature.
Removes --no-default-features (caused rusqlite/regex unresolved on Linux).
default=[] means cargo build alone gets CPU-only, scripts always pass
--features metal,accelerate (macOS) or --features cuda (Linux).
The models/ pattern in .gitignore was not root-relative, so it matched ALL models/ directories, including continuum-core/src/models/ (Rust source). Changed to /models/ (root-relative) so it only matches the top-level ML model downloads.

The models module (352 lines, model discovery via provider APIs) was
never committed — only existed locally on macOS.
…ly target

These deps were accidentally placed under [target.'cfg(target_os = "macos")'.dependencies]
instead of [dependencies]. Worked on macOS, broke on every other platform with 69 errors.
… GpuConvertPlugin

- gpu_bridge: cross-platform stub (has_bridge → false on non-macOS)
- release_unused_buffers: #[cfg(target_os = "macos")] gate
- GpuConvertPlugin: platform-conditional plugin (no-op on non-macOS)
  GPU→GPU rendering still active on macOS, falls back to CPU readback on Linux
Mirrors cargo-features.sh logic in TypeScript — passes --features
metal,accelerate (macOS), cuda (Linux+NVIDIA), or nothing (CPU-only)
to cargo test. Without this, ts-rs export tests fail on non-macOS
because they try to compile with default (empty) features.

Also reverted accidental commit of 262 generated .ts files — these
must be generated by the build pipeline, not checked in.
Fixes protobuf descriptor crash on Linux — webrtc-sys and ort-sys
both statically linked protobuf, causing duplicate registration at
runtime. load-dynamic loads libonnxruntime.so at runtime via dlopen,
avoiding the static protobuf conflict entirely.

Removed --allow-multiple-definition linker hack (no longer needed).
@joelteply joelteply changed the title Multi-tower inference: remote command delegation Multi-tower inference: remote command delegation - Windows and cuda support Mar 20, 2026
@joelteply joelteply changed the title Multi-tower inference: remote command delegation - Windows and cuda support Multi-tower inference: remote command delegation - Windows and cuda support - end to end installation fixes Mar 20, 2026
Changed build/ to /build/ (root-relative) so command subdirectories
named "build" are tracked. Same pattern as the models/ fix.
Linux strictly enforces ulimit -v (the process is killed when it exceeds the limit). The CUDA runtime + WebRTC + ONNX need 8-16GB of virtual address space. macOS doesn't enforce the limit anyway, so keep it as a soft hint there only.
- SystemPaths.ts: use $USER env var for Postgres URL fallback
- parallel-start.sh: use $USER for psql health checks
- More hardcoded references remain in seed data/tests (separate cleanup)
- ZERO "joel" references remaining (was 59 across 36 files)
- ZERO "FlashGordon" drive paths
- Database configs use $USER env var, not hardcoded username
- Seed data uses "owner" as generic primary human identifier
- Test fixtures use "test-user" / "user-owner-00001"
- Shell scripts use ${USER} for Postgres operations
- Rust binaries use /tmp as HOME fallback
- Shared code does NOT use process.env (browser-safe)
- Fixed per-package GPU features (accelerate only for continuum-core)

Runtime identity should come from onboarding, not source code.
load-dynamic broke TTS/STT on macOS (ORT panic — no libonnxruntime.dylib).
macOS uses download-binaries (static linking, works with macOS linker).
Linux uses load-dynamic-ort feature flag to avoid protobuf conflict with webrtc-sys.
…bient light

- Reverted gpu_convert_plugin() abstraction — caused Bevy animation freeze
  on macOS. Root cause unclear but reverting fixed it immediately.
- gpu_bridge stub and Metal cfg gates kept for Linux compilation.
- Added AmbientLight (brightness 500) for base face illumination.
- Key light moved to top-right-front, fill boosted to 15k from left.
GPU readback returns corrupted data (green/noise artifacts) before the
Metal render pipeline is fully warmed up. Skip first 30 ReadbackComplete
callbacks (~2s at 15fps) per slot before publishing frames to WebRTC.
Zero-cost check (integer comparison only, no analysis).
