Add a variable for explicitly defining pool-size in bake_recipes #1989

Draft
Scott Wales (ScottWales) wants to merge 4 commits into MetOffice:main from ScottWales:specify-cpus

Conversation

@ScottWales

Sites can define the environment variable $BUNCH_POOL_SIZE in the bake_recipes task to explicitly set the number of parallel jobs to run. This avoids relying on the output of nprocs when it does not report the correct count.
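The override-with-fallback behaviour described above can be sketched in Python. This is only an illustration, not the code in this pull request: the function name `resolve_pool_size` is hypothetical, and `os.cpu_count()` stands in for the detected CPU count. Note that `os.cpu_count()` reports the logical CPUs of the whole machine rather than the CPUs a batch scheduler has allocated, which is one way a detected count can be wrong.

```python
import os


def resolve_pool_size(environ=None):
    """Return the number of parallel jobs to run (hypothetical helper).

    Prefers an explicit BUNCH_POOL_SIZE, otherwise falls back to the
    CPU count reported by the operating system.
    """
    if environ is None:
        environ = os.environ
    explicit = environ.get("BUNCH_POOL_SIZE", "").strip()
    if explicit:
        pool_size = int(explicit)  # Raises ValueError on non-numeric input.
        if pool_size < 1:
            raise ValueError("BUNCH_POOL_SIZE must be a positive integer")
        return pool_size
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


# With the variable set, the explicit value wins over detection.
print(resolve_pool_size({"BUNCH_POOL_SIZE": "10"}))  # 10
```

Passing the environment in as a mapping keeps the sketch easy to test without modifying the real process environment.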

Fixes #1988

Contribution checklist

Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.

  • Documentation has been updated to reflect change.
  • New code has tests, and affected old tests have been updated.
  • All tests and CI checks pass.
  • Ensured the pull request title is descriptive.
  • Ensure rose-suite.conf.example has been updated if new diagnostic added.
  • Conda lock files have been updated if dependencies have changed.
  • Attributed any Generative AI, such as GitHub Copilot, used in this PR.
  • Marked the PR as ready to review.

Member

@jfrost-mo James Frost (jfrost-mo) left a comment

Looks sensible. I feel we should document this somewhere, but off the top of my head we don't really have anywhere suitable. Maybe we should pass it through the rose-edit GUI, so we can have help text there?

@ScottWales
Author

I don't think the rose metadata is the place to document it. Probably a site would define this in the site file based on the number of CPUs being allocated, e.g.

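[[bake_recipes]]
    [[[directives]]]
        # Request 10 CPUs from the batch scheduler.
        -l ncpus = 10
    [[[environment]]]
        # Match the number of parallel jobs to the allocated CPUs.
        BUNCH_POOL_SIZE = 10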

Maybe best to describe this in the sample localhost site, or in a README in the site directory explaining how to set up CSET at new sites?

@jfrost-mo
Member

Hmm, perhaps in the "adding a new site" documentation then? https://metoffice.github.io/CSET/usage/add-site.html#add-site-file

@jfrost-mo
Member

I've added a bit of documentation to the add a site page:

Recipes are baked in parallel with a parallel job per detected CPU. If this is detected incorrectly, or you want to undersubscribe nodes for additional memory headroom, you may set the number of parallel jobs with the BUNCH_POOL_SIZE environment variable.
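As a rough illustration of the undersubscription case, a site could pick a value for BUNCH_POOL_SIZE by capping the job count at whatever fits in memory. The helper name `pool_size_for_memory` and all of the numbers below are assumptions made for this sketch; real per-recipe memory use would have to be measured at the site.

```python
def pool_size_for_memory(allocated_cpus, node_memory_gib, per_job_gib):
    """Largest pool size that fits both the CPU and memory budgets.

    Hypothetical helper: the result is a candidate BUNCH_POOL_SIZE value.
    """
    if allocated_cpus < 1 or per_job_gib <= 0:
        raise ValueError("need at least one CPU and a positive per-job memory")
    jobs_that_fit_in_memory = int(node_memory_gib // per_job_gib)
    # Never go below one job, and never exceed the allocated CPUs.
    return max(1, min(allocated_cpus, jobs_that_fit_in_memory))


# Illustrative numbers only: 48 CPUs, 192 GiB of memory, 8 GiB per recipe.
print(pool_size_for_memory(48, 192, 8))  # 24
```

Here memory is the limiting factor, so only half of the CPUs would be used.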

If you're happy with this please feel free to merge this pull request.

Development

Successfully merging this pull request may close these issues.

bake_recipes not recognising parallel cpus at NCI

2 participants