Sprint about Dask computing summary #5

@QianqianHan96


Procedure for using Dask on Snellius:

  1. Configuration: https://github.com/RS-DAT/JupyterDaskOnSLURM
  2. Dask script preparation: write the computation around the map_blocks() function.
  3. Run the Dask script and monitor its progress in the Dask dashboard.
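
As an illustration of step 2, here is a minimal sketch of a script built around `xarray.map_blocks`. The function `scale_block`, the `factor` argument, and the synthetic array are hypothetical stand-ins, not the code used in the sprint. The sketch runs on Dask's default local scheduler; on Snellius one would first connect a `Client` to the cluster set up in step 1, so that the work shows up in the dashboard.

```python
import numpy as np
import xarray as xr


def scale_block(block: xr.DataArray, factor: float) -> xr.DataArray:
    """Hypothetical per-chunk computation; receives one in-memory chunk."""
    return block * factor


# Synthetic stand-in for the real input (time, y, x), chunked in time and space.
data = xr.DataArray(
    np.arange(4 * 6 * 6, dtype="float64").reshape(4, 6, 6),
    dims=("time", "y", "x"),
).chunk({"time": 2, "y": 3, "x": 3})

# Build the lazy task graph; no chunk is processed yet.
lazy = xr.map_blocks(scale_block, data, kwargs={"factor": 2.0})

# Trigger the computation, one task per chunk.
result = lazy.compute()
```

The function passed to `map_blocks` should return an object with the same dimensions as the block it receives, unless a `template` describing the output is supplied.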

Things that might be helpful when using Dask:

  1. Put the reprojection and other preprocessing into a separate script. This makes your main script run faster and look cleaner.
  2. Chunk the data by space and time when you load it, and make sure the chunk size stays the same in every step. It is better to chunk the data as early as possible.
  3. If you load a trained model, such as a machine learning or deep learning model, make sure it is not too large. My first trained model was 15 GB because I did not set max_depth when I trained the Random Forest. If the model is too big, map_blocks() cannot handle it and you will get unexpected errors. My updated model is 245 MB, and I pass the model path to the function used with map_blocks(). If I load the model outside map_blocks() and then pass the model itself in, the unmanaged memory becomes extremely high. Loading the model outside map_blocks() is faster because it is loaded only once, but it raises an "unmanaged memory too high" error, so the model can only be loaded inside the function given to map_blocks(), which means it is loaded once for every chunk.
  4. When exporting to NetCDF, use the NetCDF4 format, not NetCDF3. Use xarray to export.
  5. Client(n_workers=4, threads_per_worker=1). More workers and more threads might make your script run faster. But if your data is too big and you set too many workers or threads_per_worker, the webpage might crash. I am still experimenting with this.
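
Tips 2 to 4 can be sketched together as follows. This is a minimal example under assumptions: the "model" is a small pickled dictionary standing in for a trained Random Forest, and `predict_block`, `export_netcdf4`, the file names, and the chunk sizes are all hypothetical. The point it shows is that only the model path is passed to `map_blocks`, and the model is loaded inside the mapped function.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
import xarray as xr


def predict_block(block: xr.DataArray, model_path: str) -> xr.DataArray:
    """Load the model inside the mapped function, so it is loaded per chunk."""
    with open(model_path, "rb") as f:
        model = pickle.load(f)  # stand-in for loading a trained Random Forest
    return block * model["coef"] + model["intercept"]


def export_netcdf4(da: xr.DataArray, path: str) -> None:
    """Write with xarray in NetCDF4 format (needs netCDF4 or h5netcdf installed)."""
    da.to_dataset(name="prediction").to_netcdf(path, format="NETCDF4")


# Hypothetical tiny "model" saved to disk; a real one would come from training.
model_path = str(Path(tempfile.mkdtemp()) / "model.pkl")
with open(model_path, "wb") as f:
    pickle.dump({"coef": 0.5, "intercept": 1.0}, f)

# Chunk once, as early as possible, in time and space.
chunks = {"time": 2, "y": 3, "x": 3}
features = xr.DataArray(np.ones((4, 6, 6)), dims=("time", "y", "x")).chunk(chunks)

# Only the short path string travels through the task graph, not the model.
prediction = xr.map_blocks(
    predict_block, features, kwargs={"model_path": model_path}
)
result = prediction.compute()
```

Calling `export_netcdf4(result, "prediction.nc")` would then write the output file. The trade-off is the one described in tip 3: the model is read from disk once per chunk, which costs time but keeps the model object out of the task graph.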
