
Refactor: thread-safe ErrorHint for GPU parallelism #1278

Draft
sunt05 wants to merge 7 commits into master from sunt05/gpu-accel-fortran

Conversation


@sunt05 sunt05 commented Apr 3, 2026

Summary

  • Route all ErrorHint warning calls through modState%errorstate (thread-safe) instead of module-level SAVE variables
  • Remove supy_warning_count and supy_last_warning_message SAVE variables from module_ctrl_error_state
  • Add optional modState parameter to AerodynamicResistance, SurfaceResistance, and psyc_const; update all callers
  • Retain add_supy_warning as no-op stub for 10 call sites that don't yet have modState in scope

Test plan

  • make dev builds cleanly
  • make test-smoke passes (9/9)
  • make test passes (695/695)
  • Audit confirms all active ErrorHint calls (excluding dead code in bluews) pass modState

🤖 Generated with Claude Code

Route all ErrorHint warning calls through modState%errorstate (thread-safe)
instead of module-level SAVE variables. This is a prerequisite for multi-grid
parallelism via Rayon or GPU offloading.

Changes:
- Add optional modState parameter to AerodynamicResistance, SurfaceResistance,
  and psyc_const; pass through to ErrorHint calls
- Fix 4 ErrorHint calls in sat_vap_press_x/sat_vap_pressIce that had modState
  in scope but were not passing it
- Update all callers in suews_ctrl_driver to pass modState
- Remove supy_warning_count and supy_last_warning_message SAVE variables
- Remove module-level warning fallback in ErrorHint
- Convert 2 RSLProfile add_supy_warning calls to modState%errorstate%report
- Retain add_supy_warning as no-op stub for 10 call sites without modState

Remaining SAVE: supy_error_flag/code/message (fatal path only — acceptable
because fatal errors terminate the run).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

github-actions bot commented Apr 3, 2026

CI Build Plan

Changed Files

Fortran source (7 files)

  • src/suews/src/suews_ctrl_driver.f95
  • src/suews/src/suews_ctrl_error.f95
  • src/suews/src/suews_ctrl_type.f95
  • src/suews/src/suews_phys_lumps.f95
  • src/suews/src/suews_phys_resist.f95
  • src/suews/src/suews_phys_rslprof.f95
  • src/suews/src/suews_util_meteo.f95

Rust bridge (4 files)

  • src/suews_bridge/Cargo.lock
  • src/suews_bridge/Cargo.toml
  • src/suews_bridge/build.rs
  • src/suews_bridge/src/lib.rs

Python source (2 files)

  • src/supy/_run_rust.py
  • src/supy/_supy_module.py

Build Configuration

  • Platforms: Linux x86_64, macOS ARM64, Windows x64
  • Python: 3.9, 3.14
  • Test tier: core (physics + smoke)
  • QGIS3 UMEP build: Yes (compiled extension may differ)
  • PR status: Draft (reduced matrix)

Rationale

  • Fortran source changed -> multiplatform build required
  • Rust bridge changed -> multiplatform build required
  • Python source changed -> single-platform build
  • Compiled extension ABI may differ -> QGIS3 UMEP (NumPy 1.x) build included

Updated by CI on each push. See path-filters.yml for category definitions.

@sunt05 sunt05 marked this pull request as draft April 3, 2026 19:01
sunt05 and others added 6 commits April 3, 2026 20:05
Eliminate per-grid overhead in run_suews_rust_multi:
- Serialise config dict once, patch sites[] per grid (no deep copy)
- Prepare forcing block once (shared across all grids)
- Use json.dumps instead of yaml.dump (~30x faster serialisation;
  valid JSON parses as valid YAML via serde_yaml)

Benchmark (20 grids x 576 timesteps):
  Before: 5.75s (0.287s/grid, 2005 grid-timesteps/s)
  After:  3.69s (0.184s/grid, 3126 grid-timesteps/s)

Also add scripts/profile_multi_grid.py for profiling.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
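The serialise-once optimisation described above can be sketched as follows. The config shape and key names here are hypothetical, not the SUEWS schema; the point is that the invariant part of the document is serialised a single time and only the per-grid `sites` fragment is spliced in per grid.

```python
import copy
import json

# Hypothetical config shape; key names are illustrative only.
base_config = {"model": {"tstep": 300}, "sites": [None]}
sites = [{"grid_id": i, "lat": 51 + i} for i in range(20)]

def per_grid_naive():
    # Deep-copy and re-serialise the entire config for every grid.
    out = []
    for site in sites:
        cfg = copy.deepcopy(base_config)
        cfg["sites"] = [site]
        out.append(json.dumps(cfg))
    return out

def per_grid_patched():
    # Serialise the invariant part once, then splice each grid's sites
    # fragment into the prepared string. Emitting JSON also lets a YAML
    # parser on the other side keep working, since JSON is valid YAML.
    template = json.dumps(base_config)              # done once
    prefix, suffix = template.split('"sites": [null]')
    return [prefix + '"sites": [' + json.dumps(s) + "]" + suffix
            for s in sites]

assert per_grid_naive() == per_grid_patched()      # same payloads, less work
```

The two functions produce byte-identical payloads, but the patched version avoids a deep copy and a full re-serialisation per grid, which is where the reported ~1.5x end-to-end gain comes from.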
Add process-based parallelism to run_suews_rust_multi using
multiprocessing.Pool with spawn context (safe for Fortran SAVE).
Thread serial_mode through run_suews_rust_chunked and _run_supy.

Parallel mode is available but currently has high spawn overhead
for short simulations. Best suited for long runs with many grids
where per-grid compute time dominates process creation cost.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
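The spawn-based parallel path can be sketched like this. `run_one_grid` is a stand-in for the real bridge call, and the function names are illustrative; what matters is the use of the `spawn` start method, which gives each worker a fresh interpreter so Fortran SAVE state in the compiled extension is never inherited across grids.

```python
import json
from multiprocessing import get_context

def run_one_grid(config_json: str) -> dict:
    # Stand-in for the real per-grid bridge call into the extension.
    cfg = json.loads(config_json)
    return {"grid_id": cfg["grid_id"], "ok": True}

def run_multi(configs, serial_mode=True):
    if serial_mode:
        return [run_one_grid(c) for c in configs]
    # spawn (not fork) starts each worker as a fresh process, so module
    # state in the Fortran/Rust extension is never shared between grids.
    # Process creation is the overhead the commit message warns about:
    # it only pays off when per-grid compute dominates spawn cost.
    with get_context("spawn").Pool(processes=2) as pool:
        return pool.map(run_one_grid, configs)

configs = [json.dumps({"grid_id": i}) for i in range(4)]
print(run_multi(configs, serial_mode=True))
```

The serial path is the default here, matching the commit's observation that short simulations are better off avoiding spawn overhead entirely.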
The thread-safe ErrorHint refactor (e560428) routed warnings through
modState%errorstate%report(), which appends to a dynamically-growing
array. Over a year-long simulation with frequent boundary-condition
warnings, this caused unbounded memory growth and allocation overhead,
timing out the Windows CI UMEP build.

Cap non-fatal entries at 512; fatal entries always stored.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
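The capping behaviour described above can be sketched as a bounded store: non-fatal entries stop accumulating at 512 (but are still counted), while fatal entries are always appended. Names and structure are illustrative, not the Fortran derived type.

```python
MAX_NONFATAL = 512  # cap from the commit; fatal entries bypass it

class ErrorState:
    """Bounded warning store: non-fatal entries stop accumulating at the
    cap, so a year-long run with frequent warnings cannot grow unbounded."""
    def __init__(self):
        self.entries = []
        self.nonfatal_count = 0
        self.dropped = 0

    def report(self, message: str, fatal: bool = False) -> None:
        if not fatal:
            if self.nonfatal_count >= MAX_NONFATAL:
                self.dropped += 1          # discard, but keep a count
                return
            self.nonfatal_count += 1
        self.entries.append((message, fatal))  # fatal always stored

state = ErrorState()
for step in range(20000):                  # e.g. boundary-condition warnings
    state.report(f"warning at step {step}")
state.report("fatal: bad config", fatal=True)
print(len(state.entries), state.dropped)   # 513 19488
```

With the cap in place, a simulation emitting tens of thousands of warnings stores a fixed 512 non-fatal entries plus every fatal one, eliminating both the unbounded growth and the repeated reallocation that timed out the Windows CI build.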
M1 Max has 10 cores; 4 grids is sufficient for quick iteration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add run_suews_multi Rust function that uses Rayon par_iter to execute
grid cells concurrently in shared memory (no IPC serialisation overhead).

Changes:
- Add rayon dependency to suews_bridge Cargo.toml
- Add run_suews_multi PyO3 function: takes list of config JSONs + shared
  forcing, returns results from all grids in parallel
- Add -frecursive to gfortran flags (Makefile.gfortran + build.rs) so
  concurrent Fortran calls each get their own stack frame
- Python auto-detects run_suews_multi and uses it when serial_mode=False

Benchmark (4 grids x 17520 timesteps, full year, M1 Max):
  Serial:   59.0s (14.75s/grid, 7146 grid-timesteps/s)
  Rayon:    33.6s (8.39s/grid, 12560 grid-timesteps/s)
  Speedup:  1.76x

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
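The Python-side auto-detection mentioned in the last bullet can be sketched as a simple capability check: prefer the Rayon-backed batch entry point when the compiled bridge exports `run_suews_multi` and the caller asked for parallel mode. The `run_grids` wrapper and `FakeBridge` stand-in are hypothetical; only `run_suews_multi` and `serial_mode` come from the commit.

```python
def run_grids(bridge, config_jsons, forcing, serial_mode=False):
    # Prefer the batch entry point when the extension exports it.
    multi = getattr(bridge, "run_suews_multi", None)
    if not serial_mode and callable(multi):
        # One call crosses into Rust; Rayon fans grids out across
        # threads in shared memory (no per-grid spawn or IPC).
        return multi(config_jsons, forcing)
    # Fallback: one bridge call per grid, in order.
    return [bridge.run_suews(cfg, forcing) for cfg in config_jsons]

class FakeBridge:
    """Stand-in for the PyO3 extension module, for illustration."""
    def run_suews(self, cfg, forcing):
        return ("serial", cfg)
    def run_suews_multi(self, cfgs, forcing):
        return [("parallel", c) for c in cfgs]

bridge = FakeBridge()
print(run_grids(bridge, ["a", "b"], forcing=None)[0][0])         # parallel path
print(run_grids(bridge, ["a"], forcing=None, serial_mode=True))  # serial path
```

Dispatching on `getattr` keeps older builds of the bridge working unchanged: if the batch symbol is absent, every call silently takes the serial path.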
