Draft
Conversation
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rences Remove 11 tests that merely exercised numpy/scipy library behavior (exponential decay, monotonicity, shape checks). Keep 3 tests for non-obvious correctness: Matérn κ=0.5 mathematical identity and picklability of resolved correlation callables. Fix test_compute_at_locations to reference modified_foster_2019 resource files matching the benchmark CSV. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The modified_foster_2019 and foster_2019 categorical files were identical. Point both configs (and test fixtures) to the foster_2019 versions and delete the duplicate modified_foster_2019 categorical CSVs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file was already referenced by the modified_foster_2019 config but was untracked. Contains 459 independent observations including 47 Q3 broadband stations retained by the modified model. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Systematic experiment matrix (200-point comparison against Wellington test subgrid) determined the actual parameters Jaehwi used for V1.0_26Mar.tif: - Bayesian update from priors (NOT Foster 2019 posteriors despite being the code default — Foster posteriors give 199 m/s error) - Ratio 1:1 combination (50/50 geometric mean, not stdv weighting) - No coastal distance modification (code was commented out) - GID 4 slope interpolation skipped (mod6=True default) - 671 reconstructed observations (McGann 276 + Wotherspoon 36 + Kaiser 359) Add skip_alluvium_slope parameter to decouple GID 4 slope skip from coastal distance flag (Jaehwi had these independent: mod6=True while coastal distance was disabled separately). Thread through pipeline, parallel, and CLI config loading. Update jaehwi_v1p0.yaml: combination_method ratio, combine_ratio 1.0, apply_coastal_distance_mod false, skip_alluvium_slope true. Current match: 76.5% of points within 1 m/s, 18.03 m/s mean abs diff (residual from MVN spatial adjustment sensitivity in high-observation categories). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ion comparison Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
compute_grid → grid_pipeline compute_at_locations → points_pipeline run_in_memory_pipeline_for_model_type → compute_model_grid Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…26Mar.tif either Even with the same codebase and nproc=1, 15% of pixels (MVN-affected near observations) differ by ~2.8% due to float32/caching artifacts that depend on processing order. Confirms our pipeline reproduces the intended model correctly. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…duction artifacts demonstrate_grid_artifacts.py: runnable tests showing float32 precision loss, distance caching, and combined effects on MVN spatial adjustment. why_we_cannot_reproduce_v1p0_grid_exactly.md: non-technical explanation for colleagues unfamiliar with the codebase. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ran both pipelines in points mode on 200 random NZ locations. Geology and terrain IDs match 100%. Median Vs30 diff is 0.0002 m/s. The 14.5% of points with >1% diff are near observations where the reconstructed 671-station set diverges slightly from Jaehwi's internal loader. Also discovered and documented a terrain raster pixel boundary artifact: V1.0_26Mar.tif pixel centres coincide with IwahashiPike.tif pixel boundaries (50 m grid offset), so the NZTM-WGS84-NZTM coordinate roundtrip (~5 um shift) can flip terrain IDs at those exact locations. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solution The identical IwahashiPike.tif raster (MD5 match) produced 35% terrain ID disagreement due to a 50 m grid offset between V1.0_26Mar.tif and the terrain raster. Sample points at reference TIF pixel centres fall exactly on terrain raster pixel boundaries, where a 3-5 um NZTM-WGS84-NZTM coordinate roundtrip shift flips the floor() pixel assignment. Shifting by 50 m resolves it completely (200/200 terrain IDs match). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The previous 608-station file was a filtered reconstruction. This replaces it with the exact 671-station output of sites_load_NSHM2022.load_vs() from /home/arr65/src/Vs30_2026/vs30/, ensuring our pipeline uses identical observation data to Jaehwi's code. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The mod6 flag conflated two independent modifications: alluvium slope interpolation and coastal distance modification. When skip_alluvium_slope was True and apply_coastal_distance_mod was False, mod6 evaluated to True, causing coastal distance modification to run with dummy zero distances and clamping all GID 4 (alluvium) pixels to vs30_min=240.0. Fix: replace mod6/mod13 booleans with independent apply_alluvium_slope_mod and apply_coastal_distance_mod controls. Rename skip_alluvium_slope to apply_alluvium_slope_mod (positive logic, consistent _mod suffix). Remove defaults so configs must explicitly specify both parameters. 200-point comparison improved from 82% to 99.5% within 0.01% of Jaehwi's output (mean diff 8.9 → 0.032 m/s). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shorten jaehwi_v1p0_reproduction_investigation.md to conclusions, confirmed config, and 200-point comparison results. Condense legacy_python_code_bug_analysis.md to a summary table and brief descriptions of each bug (useful context for developers comparing against Viktor's Python port). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…test config loading
- Consolidate mp.get_context("spawn") into parallel.py as single source
- Extract _default_correlation_functions() in pipeline.py
- Extract _collect_observation_csvs() in pipeline.py
- Move shared test config loading to conftest.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR is a major refactor of the Vs30 package to improve readability, modularity, and testability. The mathematical expressions are equivalent to the original version, but use clearer variable names and simplified forms. Regression tests confirm that the output from the refactored codebase is consistent with the original version.
Design changes
Centralized configuration
config.yamlfile, rather than being hardcoded across various source files as in the original codebase.Modular CLI
vs30) exposes the full pipeline as well as individual stages (update-categorical-vs30-models,make-initial-vs30-raster,spatial-fit, etc.), making it possible to run or re-run specific steps independently.compute-at-locationscommand computes Vs30 at specific latitude/longitude points without generating full raster grids, which is efficient for querying a small number of sites.Input Vs30 data
"original"combined three hardcoded sources with dataset-specific filtering and downsampling;"cpt"loaded CPT data), with different processing and Bayesian update paths for each mode.Vs30 map generation
max_pointsparameter caps the number of nearest observations used for the per-pixel MVN update, limiting the cost of the matrix inversion at each pixel.Multiprocessing
compute-at-locationssupport multiprocessing. The spatial adjustment work is divided into chunks of affected pixels distributed across worker processes.threadpoolctl, rather than requiring environment variables to be set before import.Test suite