Describe the issue:
When creating a LocalCluster with more than 5 workers, all workers are successfully created and initially register with the scheduler, but client.scheduler_info()['workers'] only returns 5 workers. The remaining workers fail with CommClosedError immediately after registration and are removed from the scheduler.
This issue occurs regardless of the total number of workers requested (tested with 6, 8, 10+ workers) - only 5 workers ever appear in client.scheduler_info()['workers']. The issue occurs with both processes=True and processes=False.
Minimal Complete Verifiable Example:
```python
import time
from distributed import Client, LocalCluster

if __name__ == "__main__":
    # Request 8 threaded workers
    cluster = LocalCluster(
        n_workers=8,
        threads_per_worker=1,
        processes=False,
        silence_logs=False,
    )
    client = Client(cluster)
    print("Requested workers: 8")
    print(f"Scheduler address: {client.scheduler_info()['address']}")

    # Wait for workers to start
    time.sleep(3)

    workers = client.scheduler_info()['workers']
    print(f"Actual workers connected: {len(workers)}")
    print(f"Worker IDs: {sorted(w['id'] for w in workers.values())}")
    for worker_info in workers.values():
        print(f"  - Worker ID: {worker_info['id']} | Threads: {worker_info['nthreads']}")

    client.close()
    cluster.close()
```

Expected output:

```
Requested workers: 8
Actual workers connected: 8
Worker IDs: [0, 1, 2, 3, 4, 5, 6, 7]
```

Actual output:

```
Requested workers: 8
Actual workers connected: 5
Worker IDs: [only 5 workers appear; the specific IDs vary]
  - (5 workers shown, each with 1 thread)
```
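To capture the discrepancy in one place, a small diagnostic helper along these lines can compare the workers the `LocalCluster` object spawned against those the scheduler still reports as connected (the helper name and structure here are illustrative, not part of the original report):

```python
def compare_worker_views(cluster, client):
    """Return (spawned, connected): the worker names the cluster object
    spawned vs. the worker IDs the scheduler still reports."""
    spawned = set(cluster.workers)  # LocalCluster keeps a dict of spawned workers
    info = client.scheduler_info()
    connected = {w["id"] for w in info["workers"].values()}
    return spawned, connected
```

Against the example above, `spawned` would be expected to hold 8 entries while `connected` shrinks to 5 once the failing workers are removed.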
Anything else we need to know?:
- When examining logs with `silence_logs=False`, all 8 workers are created and register with the scheduler, but 3 workers immediately encounter `CommClosedError` exceptions and are removed from the scheduler
- The specific workers that fail vary between `processes=True` and `processes=False`, but exactly 3 workers always fail, leaving 5 connected
- `client.wait_for_workers(n_workers=8)` returns successfully, claiming all 8 workers are ready, but `client.scheduler_info()['workers']` only shows 5 workers
- The issue occurs with both `processes=True` and `processes=False` configurations
- Tested on two completely different systems (macOS ARM and Linux x86_64) with identical results
- The limitation appears to be hardcoded at 5 workers: requesting 6, 8, 10, or 20 workers always results in exactly 5 workers being reported by `scheduler_info()`
- Also tested with Dask version 2025.4.1 on both Python 3.13.3 and 3.12.12; the issue persists in that version as well
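One way to pinpoint when the extra workers disappear is to sample `client.scheduler_info()` repeatedly right after cluster startup. A minimal polling sketch (the helper name is ours; it takes the info-returning callable as a parameter so it can be exercised in isolation):

```python
import time

def poll_worker_counts(get_info, attempts=10, delay=0.5):
    """Sample a scheduler-info dict `attempts` times, `delay` seconds
    apart, and return the worker count seen at each sample."""
    counts = []
    for _ in range(attempts):
        counts.append(len(get_info().get("workers", {})))
        time.sleep(delay)
    return counts

# Against a live cluster (using the `client` from the example above):
#   counts = poll_worker_counts(client.scheduler_info)
#   print(counts)  # a drop from 8 to 5 would show when workers are removed
```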
Environment:
macOS System:
- Dask version: 2026.1.2
- Python version: 3.13.3 and 3.12.12 (both exhibit the same issue)
- Operating System: macOS (Apple M2 Pro, 10 cores)
- Install method: `pip install "dask[complete]"`
Linux System:
- Dask version: 2026.1.2
- Python version: 3.13.3 and 3.12.12 (both exhibit the same issue)
- Operating System: Linux (AMD EPYC 7702 64-Core Processor, 256 CPUs)
- Install method: `pip install "dask[complete]"`
Key dependencies installed:
bokeh 3.8.2
cloudpickle 3.1.2
distributed 2026.1.2
fsspec 2026.1.0
lz4 4.4.5
msgpack 1.1.2
numpy 2.4.2
pandas 3.0.0
psutil 7.2.2
tornado 6.5.4