Control idle time on SLURMCluster.adapt() to control when workers are released. #701

The documentation gives the following example for using `adapt()` while managing the lifetime of the workers:

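```python
from dask_jobqueue import SLURMCluster

# The docs use a generic `Cluster`; SLURMCluster is the class used in this workflow.
cluster = SLURMCluster(
    walltime="01:00:00",
    cores=4,
    memory="16gb",
    # Workers retire themselves after ~55 minutes (staggered by up to 4 minutes),
    # so they close before SLURM's one-hour walltime kills the job.
    worker_extra_args=["--lifetime", "55m", "--lifetime-stagger", "4m"],
)

# Let the cluster scale between 0 and 200 workers depending on load.
cluster.adapt(minimum=0, maximum=200)
```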
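The `--lifetime` flags only bound how long each worker lives; they are not what decides when an idle worker is released. As a quick sanity check of the numbers in the example, the sketch below computes the window in which a worker retires itself. It assumes the stagger is applied symmetrically (a worker's actual lifetime falls somewhere in `lifetime ± stagger`); `lifetime_window` and `fits_in_walltime` are hypothetical helper names.

```python
from datetime import timedelta

# Assumption: the worker's lifetime is randomly shifted by up to
# +/- lifetime_stagger, so it retires somewhere in
# [lifetime - stagger, lifetime + stagger].


def lifetime_window(lifetime, stagger):
    """Return the (earliest, latest) time at which a worker retires itself."""
    return lifetime - stagger, lifetime + stagger


def fits_in_walltime(lifetime, stagger, walltime):
    """True if every worker retires before SLURM would kill its job."""
    _, latest = lifetime_window(lifetime, stagger)
    return latest < walltime


walltime = timedelta(hours=1)      # walltime="01:00:00"
lifetime = timedelta(minutes=55)   # --lifetime 55m
stagger = timedelta(minutes=4)     # --lifetime-stagger 4m

earliest, latest = lifetime_window(lifetime, stagger)
print(earliest, latest)                               # 0:51:00 0:59:00
print(fits_in_walltime(lifetime, stagger, walltime))  # True
```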

However, when I use this on a SLURM HPC system, all the workers appear to be released as soon as the current task finishes. Subsequent tasks then submit new jobs to the SLURM queue, so my workflow ends up constantly queuing and releasing workers.
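This behavior is likely explained by how adaptive scaling in `distributed` retires workers: the `Adaptive` object re-evaluates the cluster every `interval`, and a worker is only removed after it has been recommended for removal on `wait_count` consecutive checks (the defaults in distributed's config are `1s` and `3`). The sketch below is a simplified model of that rule, not distributed's actual code; `ticks_until_retired` and `idle_grace_seconds` are hypothetical names.

```python
# Simplified model of the Adaptive scale-down rule (assumption: a worker is
# retired once it has been recommended for removal on `wait_count`
# consecutive adaptive checks, spaced `interval` seconds apart).


def ticks_until_retired(idle_flags, wait_count):
    """Return the 1-based check at which the worker is retired, or None.

    idle_flags: one bool per adaptive check; True means the worker was
    recommended for removal (e.g. it sat idle) on that check.
    """
    consecutive = 0
    for tick, idle in enumerate(idle_flags, start=1):
        # Any check where the worker is still wanted resets the counter.
        consecutive = consecutive + 1 if idle else 0
        if consecutive >= wait_count:
            return tick
    return None


def idle_grace_seconds(interval_s, wait_count):
    """Approximate how long a worker may sit idle before being released."""
    return interval_s * wait_count


# With the defaults (interval=1s, wait_count=3) an idle worker is released
# roughly three seconds after its last task finishes.
print(idle_grace_seconds(1.0, 3))                                # 3.0
print(ticks_until_retired([True] * 5, 3))                        # 3
print(ticks_until_retired([True, True, False, True, True], 3))  # None
```

A gap of a few seconds between two serial computations is therefore enough for the workers to be handed back to SLURM.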

I am hoping to accomplish something like the following:

```
complex xarray computation 1
complex xarray computation 2
complex xarray computation 3
...
complex xarray computation N
```
but without releasing all the workers between these serial computations. Is there currently a mechanism to run a sequence of Dask delayed tasks without releasing the workers acquired through `adapt()`?
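One possible approach, sketched here under the assumption that the adaptive `interval` and `wait_count` are the right knobs: `cluster.adapt()` forwards extra keyword arguments to distributed's `Adaptive`, so both can be set per cluster. The hypothetical helper `adapt_idle_kwargs` below turns a desired idle grace period into those two values.

```python
import math


def adapt_idle_kwargs(idle_grace_s, interval_s=10.0):
    """Hypothetical helper: keyword arguments for cluster.adapt() so that an
    idle worker survives roughly `idle_grace_s` seconds.

    Assumes Adaptive retires a worker after `wait_count` consecutive removal
    recommendations, evaluated every `interval` seconds.
    """
    if idle_grace_s <= 0 or interval_s <= 0:
        raise ValueError("idle_grace_s and interval_s must be positive")
    # Round up so the grace period is at least what was requested.
    wait_count = max(1, math.ceil(idle_grace_s / interval_s))
    # Adaptive accepts the interval as a time string such as "10s".
    return {"interval": f"{interval_s:g}s", "wait_count": wait_count}


kwargs = adapt_idle_kwargs(idle_grace_s=600, interval_s=10)
print(kwargs)  # {'interval': '10s', 'wait_count': 60}
```

These would then be passed as `cluster.adapt(minimum=0, maximum=200, **kwargs)`; the same settings can also be applied globally through the `distributed.adaptive.interval` and `distributed.adaptive.wait-count` config keys. A longer `interval` also slows down how quickly adapt reacts to new work, which is why the sketch keeps it modest and raises `wait_count` instead. Alternatively, raising `minimum` keeps a floor of workers alive between computations, at the cost of holding those SLURM jobs while they sit idle.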
