Unified compute fabric across ACTIVATE resources using Ray. Deploys a Ray head node on any resource, connects workers from one or more additional sites via SSH tunnels, and provides a live dashboard showing cluster topology and task placement.
```
        ACTIVATE Workflow
                |
    +-----------+-----------+------------ ...
    v           v           v
 Site 1      Site 2      Site N
 Ray HEAD   Ray WORKER  Ray WORKER
Dashboard  (SSH/SLURM) (SSH/SLURM)
    |           |           |
    +----- Ray Cluster -----+
                |
```
Workload Options:
- Fractal rendering (visual)
- Mathematical benchmark (charts)
- Cluster only (bring your own workload)
- Add this workflow to your ACTIVATE account
- Configure:
- Head Node: Select any resource — runs the Ray coordinator + dashboard (no compute)
- Compute Workers: Add one or more worker sites (SSH, SLURM, or PBS resources)
- Workload: Choose fractal rendering, benchmark, or cluster-only mode
- Click Execute
- Open the session link to view the live dashboard
Distributes Mandelbrot tile rendering across all sites. Tiles appear live on a canvas, color-coded by which site rendered them.
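A per-tile kernel along these lines is what gets distributed; this is a minimal sketch (function name and parameters are illustrative, not taken from `benchmark.py`), and in the workflow the function would be wrapped with `@ray.remote` so Ray can place each tile on any site:

```python
import numpy as np

# Sketch of a per-tile Mandelbrot kernel. In the workflow this would be
# decorated with @ray.remote so each tile can be scheduled on any site;
# names and parameters here are illustrative.
def render_tile(x0, x1, y0, y1, width, height, max_iter=100):
    """Return a (height, width) array of escape iteration counts."""
    xs = np.linspace(x0, x1, width)
    ys = np.linspace(y0, y1, height)
    c = xs[np.newaxis, :] + 1j * ys[:, np.newaxis]
    z = np.zeros_like(c)
    counts = np.full(c.shape, max_iter, dtype=np.int64)
    for i in range(max_iter):
        bounded = np.abs(z) <= 2.0
        z[bounded] = z[bounded] ** 2 + c[bounded]
        # Record the iteration at which a point first escaped
        counts[(~bounded) & (counts == max_iter)] = i
    return counts
```

Because each tile is an independent task, the canvas fills in whatever order sites finish, which is what makes the color-coded placement visible.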
Three phases:
- Task Throughput — Bursts many small tasks to measure scheduling rate and placement distribution
- CPU Compute — Matrix multiplications (NumPy) measuring GFLOPS per node
- Scaling Test — Compares multi-site vs single-site throughput
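The CPU Compute phase boils down to timing dense matrix multiplies. A standalone sketch of that measurement (sizes and names are assumptions; in the workflow each node would run this inside an `@ray.remote` task and report its own number):

```python
import time
import numpy as np

# Sketch of the per-node GFLOPS measurement: time repeated dense matmuls.
# Matrix size and repeat count are illustrative, not from run_benchmark.sh.
def matmul_gflops(n=512, repeats=3):
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    a @ b  # warm-up so BLAS thread-pool startup is not timed
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    elapsed = time.perf_counter() - start
    flops = 2.0 * n**3 * repeats  # ~2*n^3 floating-point ops per matmul
    return flops / elapsed / 1e9
```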
Deploys the Ray cluster with no built-in workload. The dashboard shows connection instructions with copy-paste commands for SSH tunnels, Ray job submission, and direct head node access. Use this mode to run your own Ray jobs, training scripts, or interactive workloads.
Each worker node registers 1 task slot with Ray. Ray handles placement across nodes; tasks use internal parallelism (OpenMP, MPI, PyTorch, etc.) for multi-core/GPU work within each node.
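A minimal sketch of that split, assuming an OpenMP-based BLAS build of NumPy (the exact knobs the workflow's scripts set may differ): the Ray slot is one scheduling unit, while the math library fans out over the node's cores on its own.

```python
import os

# OpenMP reads OMP_NUM_THREADS when its runtime initializes, so set it
# before the first NumPy/BLAS call. Illustrative only; the workflow's
# setup scripts may configure threading differently.
os.environ.setdefault("OMP_NUM_THREADS", str(os.cpu_count() or 1))

import numpy as np

def node_local_work(n=256):
    """Occupies one Ray task slot, but the matmul itself can use every
    core on the node through the BLAS thread pool."""
    a = np.random.rand(n, n)
    return float(np.linalg.norm(a @ a))
```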
Worker dispatch modes:
- SSH: Direct connection to the remote host (single node per site)
- SLURM: Submit via `srun` with configurable partition, account, QoS, nodes, and walltime
- PBS: Submit via `qsub` with configurable directives
Cross-site workers connect through SSH tunnels with unique loopback IPs (127.0.X.Y) for multi-node support.
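One way to hand out those addresses is a simple index-to-IP mapping so no two tunneled workers collide on 127.0.0.1. This is a hypothetical sketch of the allocation logic, not code from `dispatch_workers.sh`:

```python
# Map worker index 0, 1, 2, ... to a unique loopback IP 127.0.X.Y.
# Skips .0 in both octets; purely illustrative of the addressing scheme.
def loopback_ip(worker_index):
    x, y = divmod(worker_index, 254)
    return f"127.0.{x + 1}.{y + 1}"
```

On Linux the whole 127.0.0.0/8 block answers locally, so each worker's SSH tunnel can bind its own 127.0.X.Y address and Ray sees distinct node IPs.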
- Cluster tab — Node topology grouped by site, task placement bar chart, throughput over time
- Connect tab — SSH tunnel commands, Python examples, Ray Jobs CLI, cluster info table (cluster_only mode)
- Ray Dashboard tab — Proxied Ray native dashboard (port 8265)
```
ray-cluster/
├── workflow.yaml             # Multi-site workflow definition
├── README.md
├── ROADMAP.md                # Future improvements and priorities
├── thumbnail.png
└── scripts/
    ├── setup.sh              # Install Ray + NumPy via uv/pip (handles old Python)
    ├── start_ray_head.sh     # Start Ray head (--num-cpus=0) + dashboard
    ├── dispatch_workers.sh   # Connect workers from all sites (SSH/SLURM/PBS)
    ├── run_benchmark.sh      # Run benchmark, POST results to dashboard
    ├── benchmark.py          # Ray distributed benchmark + fractal tasks
    ├── dashboard.py          # FastAPI live dashboard server (WebSocket)
    ├── diagnose.sh           # Cluster diagnostic tool
    ├── diagnose_cluster.py   # Ray cluster health checker
    └── templates/
        └── index.html        # Dashboard UI
```