
Refactor: thread-safe ErrorHint for GPU parallelism #1278

Draft
sunt05 wants to merge 7 commits into master from sunt05/gpu-accel-fortran

Conversation


@sunt05 sunt05 commented Apr 3, 2026

Summary

  • Route all ErrorHint warning calls through modState%errorstate (thread-safe) instead of module-level SAVE variables
  • Remove supy_warning_count and supy_last_warning_message SAVE variables from module_ctrl_error_state
  • Add optional modState parameter to AerodynamicResistance, SurfaceResistance, and psyc_const; update all callers
  • Retain add_supy_warning as no-op stub for 10 call sites that don't yet have modState in scope

Test plan

  • make dev builds cleanly
  • make test-smoke passes (9/9)
  • make test passes (695/695)
  • Audit confirms all active ErrorHint calls (excluding dead code in bluews) pass modState

🤖 Generated with Claude Code

Route all ErrorHint warning calls through modState%errorstate (thread-safe)
instead of module-level SAVE variables. This is a prerequisite for multi-grid
parallelism via Rayon or GPU offloading.

Changes:
- Add optional modState parameter to AerodynamicResistance, SurfaceResistance,
  and psyc_const; pass through to ErrorHint calls
- Fix 4 ErrorHint calls in sat_vap_press_x/sat_vap_pressIce that had modState
  in scope but were not passing it
- Update all callers in suews_ctrl_driver to pass modState
- Remove supy_warning_count and supy_last_warning_message SAVE variables
- Remove module-level warning fallback in ErrorHint
- Convert 2 RSLProfile add_supy_warning calls to modState%errorstate%report
- Retain add_supy_warning as no-op stub for 10 call sites without modState

Remaining SAVE: supy_error_flag/code/message (fatal path only — acceptable
because fatal errors terminate the run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 3, 2026

CI Build Plan

Changed Files

Fortran source (7 files)

  • src/suews/src/suews_ctrl_driver.f95
  • src/suews/src/suews_ctrl_error.f95
  • src/suews/src/suews_ctrl_type.f95
  • src/suews/src/suews_phys_lumps.f95
  • src/suews/src/suews_phys_resist.f95
  • src/suews/src/suews_phys_rslprof.f95
  • src/suews/src/suews_util_meteo.f95

Rust bridge (4 files)

  • src/suews_bridge/Cargo.lock
  • src/suews_bridge/Cargo.toml
  • src/suews_bridge/build.rs
  • src/suews_bridge/src/lib.rs

Python source (2 files)

  • src/supy/_run_rust.py
  • src/supy/_supy_module.py

Build Configuration

  • Platforms: Linux x86_64, macOS ARM64, Windows x64
  • Python: 3.9, 3.14
  • Test tier: core (physics + smoke)
  • QGIS3 UMEP build: Yes (compiled extension may differ)
  • PR status: Draft (reduced matrix)

Rationale

  • Fortran source changed -> multiplatform build required
  • Rust bridge changed -> multiplatform build required
  • Python source changed -> single-platform build
  • Compiled extension ABI may differ -> QGIS3 UMEP (NumPy 1.x) build included

Updated by CI on each push. See path-filters.yml for category definitions.

@sunt05 sunt05 marked this pull request as draft April 3, 2026 19:01
sunt05 and others added 6 commits April 3, 2026 20:05
Eliminate per-grid overhead in run_suews_rust_multi:
- Serialise config dict once, patch sites[] per grid (no deep copy)
- Prepare forcing block once (shared across all grids)
- Use json.dumps instead of yaml.dump (~30x faster serialisation;
  valid JSON parses as valid YAML via serde_yaml)

Benchmark (20 grids x 576 timesteps):
  Before: 5.75s (0.287s/grid, 2005 grid-timesteps/s)
  After:  3.69s (0.184s/grid, 3126 grid-timesteps/s)

Also add scripts/profile_multi_grid.py for profiling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
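The serialise-once optimisation described above can be sketched as follows. The config shape and key names here are hypothetical, not the SUEWS schema; the point is that the invariant part of the document is serialised a single time and only the per-grid `sites` fragment is spliced in per grid.

```python
import copy
import json

# Hypothetical config shape; key names are illustrative only.
base_config = {"model": {"tstep": 300}, "sites": [None]}
sites = [{"grid_id": i, "lat": 51 + i} for i in range(20)]

def per_grid_naive():
    # Deep-copy and re-serialise the entire config for every grid.
    out = []
    for site in sites:
        cfg = copy.deepcopy(base_config)
        cfg["sites"] = [site]
        out.append(json.dumps(cfg))
    return out

def per_grid_patched():
    # Serialise the invariant part once, then splice each grid's sites
    # fragment into the prepared string. Emitting JSON also lets a YAML
    # parser on the other side keep working, since JSON is valid YAML.
    template = json.dumps(base_config)              # done once
    prefix, suffix = template.split('"sites": [null]')
    return [prefix + '"sites": [' + json.dumps(s) + "]" + suffix
            for s in sites]

assert per_grid_naive() == per_grid_patched()      # same payloads, less work
```

The two functions produce byte-identical payloads, but the patched version avoids a deep copy and a full re-serialisation per grid, which is where the reported ~1.5x end-to-end gain comes from.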
Add process-based parallelism to run_suews_rust_multi using
multiprocessing.Pool with spawn context (safe for Fortran SAVE).
Thread serial_mode through run_suews_rust_chunked and _run_supy.

Parallel mode is available but currently has high spawn overhead
for short simulations. Best suited for long runs with many grids
where per-grid compute time dominates process creation cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
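The spawn-based parallel path can be sketched like this. `run_one_grid` is a stand-in for the real bridge call, and the function names are illustrative; what matters is the use of the `spawn` start method, which gives each worker a fresh interpreter so Fortran SAVE state in the compiled extension is never inherited across grids.

```python
import json
from multiprocessing import get_context

def run_one_grid(config_json: str) -> dict:
    # Stand-in for the real per-grid bridge call into the extension.
    cfg = json.loads(config_json)
    return {"grid_id": cfg["grid_id"], "ok": True}

def run_multi(configs, serial_mode=True):
    if serial_mode:
        return [run_one_grid(c) for c in configs]
    # spawn (not fork) starts each worker as a fresh process, so module
    # state in the Fortran/Rust extension is never shared between grids.
    # Process creation is the overhead the commit message warns about:
    # it only pays off when per-grid compute dominates spawn cost.
    with get_context("spawn").Pool(processes=2) as pool:
        return pool.map(run_one_grid, configs)

configs = [json.dumps({"grid_id": i}) for i in range(4)]
print(run_multi(configs, serial_mode=True))
```

The serial path is the default here, matching the commit's observation that short simulations are better off avoiding spawn overhead entirely.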
The thread-safe ErrorHint refactor (e560428) routed warnings through
modState%errorstate%report(), which appends to a dynamically-growing
array. Over a year-long simulation with frequent boundary-condition
warnings, this caused unbounded memory growth and allocation overhead,
timing out the Windows CI UMEP build.

Cap non-fatal entries at 512; fatal entries always stored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
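The capping behaviour described above can be sketched as a bounded store: non-fatal entries stop accumulating at 512 (but are still counted), while fatal entries are always appended. Names and structure are illustrative, not the Fortran derived type.

```python
MAX_NONFATAL = 512  # cap from the commit; fatal entries bypass it

class ErrorState:
    """Bounded warning store: non-fatal entries stop accumulating at the
    cap, so a year-long run with frequent warnings cannot grow unbounded."""
    def __init__(self):
        self.entries = []
        self.nonfatal_count = 0
        self.dropped = 0

    def report(self, message: str, fatal: bool = False) -> None:
        if not fatal:
            if self.nonfatal_count >= MAX_NONFATAL:
                self.dropped += 1          # discard, but keep a count
                return
            self.nonfatal_count += 1
        self.entries.append((message, fatal))  # fatal always stored

state = ErrorState()
for step in range(20000):                  # e.g. boundary-condition warnings
    state.report(f"warning at step {step}")
state.report("fatal: bad config", fatal=True)
print(len(state.entries), state.dropped)   # 513 19488
```

With the cap in place, a simulation emitting tens of thousands of warnings stores a fixed 512 non-fatal entries plus every fatal one, eliminating both the unbounded growth and the repeated reallocation that timed out the Windows CI build.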
M1 Max has 10 cores; 4 grids is sufficient for quick iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add run_suews_multi Rust function that uses Rayon par_iter to execute
grid cells concurrently in shared memory (no IPC serialisation overhead).

Changes:
- Add rayon dependency to suews_bridge Cargo.toml
- Add run_suews_multi PyO3 function: takes list of config JSONs + shared
  forcing, returns results from all grids in parallel
- Add -frecursive to gfortran flags (Makefile.gfortran + build.rs) so
  concurrent Fortran calls each get their own stack frame
- Python auto-detects run_suews_multi and uses it when serial_mode=False

Benchmark (4 grids x 17520 timesteps, full year, M1 Max):
  Serial:   59.0s (14.75s/grid, 7146 grid-timesteps/s)
  Rayon:    33.6s (8.39s/grid, 12560 grid-timesteps/s)
  Speedup:  1.76x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
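The Python-side auto-detection mentioned in the last bullet can be sketched as a simple capability check: prefer the Rayon-backed batch entry point when the compiled bridge exports `run_suews_multi` and the caller asked for parallel mode. The `run_grids` wrapper and `FakeBridge` stand-in are hypothetical; only `run_suews_multi` and `serial_mode` come from the commit.

```python
def run_grids(bridge, config_jsons, forcing, serial_mode=False):
    # Prefer the batch entry point when the extension exports it.
    multi = getattr(bridge, "run_suews_multi", None)
    if not serial_mode and callable(multi):
        # One call crosses into Rust; Rayon fans grids out across
        # threads in shared memory (no per-grid spawn or IPC).
        return multi(config_jsons, forcing)
    # Fallback: one bridge call per grid, in order.
    return [bridge.run_suews(cfg, forcing) for cfg in config_jsons]

class FakeBridge:
    """Stand-in for the PyO3 extension module, for illustration."""
    def run_suews(self, cfg, forcing):
        return ("serial", cfg)
    def run_suews_multi(self, cfgs, forcing):
        return [("parallel", c) for c in cfgs]

bridge = FakeBridge()
print(run_grids(bridge, ["a", "b"], forcing=None)[0][0])         # parallel path
print(run_grids(bridge, ["a"], forcing=None, serial_mode=True))  # serial path
```

Dispatching on `getattr` keeps older builds of the bridge working unchanged: if the batch symbol is absent, every call silently takes the serial path.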
