Add post-processing step for a grid and generate grid plot #423
EmmaPostolec wants to merge 133 commits into main
… need to work on log scale for x axis
Agent-Logs-Url: https://github.com/FormingWorlds/PROTEUS/sessions/cbf3c143-d265-4db4-a889-74d0c610eaab Co-authored-by: EmmaPostolec <122358811+EmmaPostolec@users.noreply.github.com>
Tests have been added in |
…OTEUS into ep/post_processing_grid
I think it's ready now. I implemented @copilot feedback. I guess the remaining task is to update the |

    # Extract plot_format from cfg
    plot_format = cfg.get('plot_format')

    if 'status' not in df.columns:
        raise ValueError("CSV must contain a 'status' column")

plot_format is read with cfg.get('plot_format') and then used directly in the output filename. If the key is missing or misspelled, this will create files ending in .None (or another invalid extension). It would be safer to default to a known format (e.g. png) and/or validate plot_format is one of the supported values before saving.
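A minimal sketch of that suggestion, assuming a small whitelist of formats. The helper name `resolve_plot_format` and the `SUPPORTED_PLOT_FORMATS` tuple are hypothetical and not taken from the PR; the formats the module actually supports may differ.

```python
# Hypothetical whitelist; the formats really supported by the module may differ.
SUPPORTED_PLOT_FORMATS = ('png', 'pdf', 'svg')


def resolve_plot_format(cfg: dict, default: str = 'png') -> str:
    """Return a validated plot format, falling back to `default` when unset."""
    plot_format = cfg.get('plot_format')
    if plot_format is None:
        # Key missing (or explicitly null): use the known-good default
        plot_format = default

    # Tolerate values such as '.PDF' by normalising case and a leading dot
    plot_format = str(plot_format).lower().lstrip('.')

    if plot_format not in SUPPORTED_PLOT_FORMATS:
        raise ValueError(
            f"Unsupported plot_format '{plot_format}'; "
            f'expected one of {SUPPORTED_PLOT_FORMATS}'
        )
    return plot_format
```

Resolving the format once, before any filename is built, should keep a misspelled key from ever producing a `.None` extension.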

    # --- ECDF plots ---
    if cfg.get('plot_ecdf', True):
        completed_simulations_data_csv = pd.read_csv(summary_csv_completed, sep='\t')
        columns_output = validate_output_variables(
            completed_simulations_data_csv, cfg['output_variables']
        )

main() accesses cfg['output_variables'] unconditionally when plot_ecdf is enabled. If a user enables ECDF plotting but forgets to provide output_variables, this will raise a KeyError rather than a helpful message. Consider using cfg.get('output_variables') and raising a clear ValueError describing the required config key.
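One way this could look, as a sketch only. The helper name `require_output_variables` is hypothetical, and treating an empty list the same as a missing key is an assumption about the desired behaviour.

```python
def require_output_variables(cfg: dict) -> list:
    """Return cfg['output_variables'], or raise a descriptive error if absent."""
    output_variables = cfg.get('output_variables')
    if not output_variables:
        # Covers both a missing key and an empty list (assumed to be equally invalid)
        raise ValueError(
            "ECDF plotting is enabled ('plot_ecdf') but the required config key "
            "'output_variables' is missing or empty"
        )
    return list(output_variables)
```

The caller would then pass the returned list to `validate_output_variables` in place of `cfg['output_variables']`.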

    elif method == 'arange':
        arr = list(np.arange(value['start'], value['stop'], value['step']))
        # Ensure endpoint is included
        if not np.isclose(arr[-1], value['stop']):
            arr.append(value['stop'])
        tested_params[key] = np.array(arr, dtype=float)

In the arange method, arr[-1] is accessed without guarding against np.arange(...) returning an empty array (e.g. start == stop or an incompatible step). This can raise IndexError and abort post-processing. Add an empty-check before reading arr[-1], and decide how to handle the degenerate case (e.g. treat it as a single-value grid).
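A sketch of the guard, assuming the degenerate case is treated as a single-value grid containing `start`. The function name `arange_with_endpoint` is hypothetical, and raising an error instead would be an equally valid choice.

```python
import numpy as np


def arange_with_endpoint(start: float, stop: float, step: float) -> np.ndarray:
    """np.arange that always includes `stop` and tolerates an empty result."""
    arr = np.arange(start, stop, step, dtype=float)

    if arr.size == 0:
        # e.g. start == stop, or a step pointing away from stop.
        # Assumption: fall back to a single-value axis rather than failing.
        return np.array([start], dtype=float)

    # Ensure the endpoint is included
    if not np.isclose(arr[-1], stop):
        arr = np.append(arr, stop)
    return arr
```

Checking `arr.size` before indexing removes the `IndexError` path, so only the fallback policy is left to decide.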

    def extract_solidification_time(cases_data: list, grid_dir: str | Path):
        """
        Extract solidification time for each simulation of the grid for
        the condition Phi_global < phi_crit at last time step.

The extract_solidification_time() docstring says the condition Phi_global < phi_crit is checked “at last time step”, but the implementation finds the first timestep where the condition becomes true (idxmax() on the boolean condition). Please update the docstring to match the actual behavior (solidification time = first crossing).
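To illustrate the difference between the two readings, here is a small sketch of the "first crossing" behaviour described above. The function name `first_crossing_time` is hypothetical, the column names `Phi_global` and `Time` are those mentioned elsewhere in this PR, and returning NaN when the threshold is never crossed is an assumption.

```python
import pandas as pd


def first_crossing_time(df: pd.DataFrame, phi_crit: float) -> float:
    """Time of the first row where Phi_global drops below phi_crit."""
    below = df['Phi_global'] < phi_crit
    if not below.any():
        # Never solidified; without this check idxmax() would return the first row
        return float('nan')
    # idxmax() on a boolean Series gives the index label of the first True
    return float(df.loc[below.idxmax(), 'Time'])
```

For a run whose melt fraction dips below the threshold, recovers, and dips again, this returns the time of the first dip, which is not what "at last time step" suggests.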

    def flatten_input_parameters(d: dict, parent_key: str = '') -> dict:
        """
        Flattens a nested input-parameter dictionary from a TOML configuration
        into a flat mapping of dot-separated parameter paths to their plotting
        configuration.

flatten_input_parameters() is defined but not called anywhere in this module (or elsewhere in the repo). Keeping unused helpers in a new ~1000-line module makes the code harder to maintain; consider removing it or wiring it into the ECDF settings loading if it’s intended for future config-driven labels/scales.

    - `{grid_name}_final_extracted_data_completed.csv` which contains only successful runs (used for ECDF plots)
    - `{grid_name}_final_extracted_data_running_error.csv` for only failed simulations with status `Running` or `Error`.

    - `grid_plots/` This directory contains a status summary plot and a ECDF plot

Grammar: “a ECDF plot” should be “an ECDF plot”.

    reservoir = "outgas" # Escaping reservoir: "bulk", "outgas", "pxuv".

    [escape.zephyrus]
    Pxuv = 1e-2 # Pressure at which XUV radiation become opaque in the planetary atmosphere [bar]

Grammar in the inline comment: “XUV radiation become opaque” → “XUV radiation becomes opaque”.
Agent-Logs-Url: https://github.com/FormingWorlds/PROTEUS/sessions/43d242cd-ebd2-458b-9dde-201556591aef Co-authored-by: EmmaPostolec <122358811+EmmaPostolec@users.noreply.github.com>
The CI checks are passing on the latest commits. I've now added 26
Warning: Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules:
Test the imports work
from proteus.grid.post_processing import (
nichollsh left a comment
Thanks for this, @EmmaPostolec, it's great to have some utilities for easily analysing the results of PROTEUS grids.
However, I have some concerns about some of the code that need clarifying and/or actioning. I am worried that a lot of the code is sprawling and some of it is redundant. This makes it hard to review, but also will make it harder to maintain in the future (tech debt) and could lead to some inconsistencies in how grids are analysed.
These comments are quite minor overall, and should not be too difficult to address. Thanks again for adding these! Looking forward to seeing it used frequently :)
Please do not use print() statements. PROTEUS uses logging throughout its codebase, so all print statements in this file should be replaced with log.info or log.warning as appropriate.
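For example, the missing-column warning could be emitted through a module-level logger along these lines. This is a sketch: the logger name used here is an assumption and should follow whatever convention the rest of PROTEUS uses, and `report_missing_columns` is a hypothetical helper.

```python
import logging

# Assumption: the logger name should match the convention used elsewhere in PROTEUS
log = logging.getLogger(__name__)


def report_missing_columns(columns, required=('Phi_global', 'Time')) -> list:
    """Log a warning for each required column that is absent; return the missing ones."""
    missing = [name for name in required if name not in columns]
    if missing:
        # Replaces: print('Warning: Missing Phi_global or Time column.')
        log.warning('Missing column(s): %s', ', '.join(missing))
    return missing
```

Routing messages through `log` means they respect the configured log level and end up in the same log file as the rest of the run.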

    if method == 'direct':
        tested_params[key] = value['values']

    elif method == 'linspace':
        tested_params[key] = np.linspace(value['start'], value['stop'], value['count'])

    elif method == 'logspace':
        tested_params[key] = np.logspace(
            np.log10(value['start']), np.log10(value['stop']), value['count']
        )

    elif method == 'arange':
        arr = list(np.arange(value['start'], value['stop'], value['step']))
        # Ensure endpoint is included
        if not np.isclose(arr[-1], value['stop']):
            arr.append(value['stop'])
        tested_params[key] = np.array(arr, dtype=float)

This code is a reflection of how the grid axes are constructed in manage.py. However, if the code in manage.py is updated without also updating this code, the analysis will become inconsistent with the grid output. See here:
PROTEUS/src/proteus/grid/manage.py
Lines 204 to 227 in 6e6ead5
Perhaps you could generalise the three functions in manage.py by moving them outside the Grid object, which would allow them to be called by both Grid.__init__() and get_tested_grid_parameters, without duplicating any code.
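Roughly what I have in mind, as a sketch only: a single module-level function that both `Grid.__init__()` and `get_tested_grid_parameters` could call. The name `build_axis` and the `'method'` key are hypothetical; the real functions in manage.py may be organised differently, and the empty-`arange` fallback is an assumption.

```python
import numpy as np


def build_axis(spec: dict) -> list:
    """Expand one grid-axis specification into its list of values."""
    method = spec.get('method')

    if method == 'direct':
        return list(spec['values'])

    if method == 'linspace':
        return list(np.linspace(spec['start'], spec['stop'], spec['count']))

    if method == 'logspace':
        return list(
            np.logspace(np.log10(spec['start']), np.log10(spec['stop']), spec['count'])
        )

    if method == 'arange':
        arr = [float(v) for v in np.arange(spec['start'], spec['stop'], spec['step'])]
        if not arr:
            # Assumption: an empty range collapses to a single-value axis
            return [float(spec['start'])]
        # Ensure endpoint is included
        if not np.isclose(arr[-1], spec['stop']):
            arr.append(float(spec['stop']))
        return arr

    # Fail loudly instead of silently skipping the axis
    raise ValueError(f'Unknown axis method: {method!r}')
```

With one shared function, a change to how axes are generated would automatically apply to both the grid construction and its analysis.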

    else:
        print(f'⚠️ Unknown method for {key}: {method}')
        continue

Should probably error here, rather than continuing.

    def extract_solidification_time(cases_data: list, grid_dir: str | Path):
        """
        Extract solidification time for each simulation of the grid for
        the condition Phi_global < phi_crit at last time step.

    else:
        if not columns_printed:
            print('Warning: Missing Phi_global or Time column.')

        return grouped


    def latex(label: str) -> str:

This function is a duplicate of latexify here:
PROTEUS/src/proteus/utils/plot.py
Line 332 in 6e6ead5

    import matplotlib.cm as cm

    cfg = {
        'colormap': 'plasma',

    def load_ecdf_plot_settings(cfg, tested_params=None):
        """
        Load ECDF plotting settings for both input parameters and output variables
        from a configuration dictionary loaded from TOML.

        Parameters

This function is huge and only used once. Is it actually needed? Seems like a lot of redundant code. Currently, the settings are loaded by the pathway:
_preset dict -> get_ function -> this function -> settings dict
This is quite cumbersome and therefore prone to more errors. Instead, perhaps the ECDF plotting function could call the get_label (etc.) functions directly, and then this whole load_ecdf_plot_settings function and the output_settings and param_settings would not be needed at all.
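As a rough sketch of the direct approach: the plotting code looks up label and scale at the point of use, with no intermediate settings dictionaries. The preset dicts below are stand-ins with illustrative contents, the signatures of `get_label` and `get_log_scale` are assumptions about the helpers in `proteus.utils.plot`, and `style_x_axis` is a hypothetical name.

```python
# Stand-in presets with illustrative contents; the real dicts live in proteus.utils.plot
_preset_labels = {'P_surf': 'Surface pressure [bar]'}
_preset_log_scales = {'P_surf': True}


def get_label(name: str) -> str:
    """Axis label for a quantity, falling back to the raw name (assumed signature)."""
    return _preset_labels.get(name, name)


def get_log_scale(name: str) -> bool:
    """Whether a quantity should be drawn on a log axis (assumed signature)."""
    return _preset_log_scales.get(name, False)


def style_x_axis(ax, name: str) -> None:
    """Apply label and scale to a matplotlib-like Axes directly from the presets."""
    ax.set_xlabel(get_label(name))
    if get_log_scale(name):
        ax.set_xscale('log')
```

The ECDF plotting loop would call something like `style_x_axis(ax, out_name)` per panel, so the preset dicts remain the single source of truth.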

    # List of parameter names (rows) and output names (columns)
    param_names = list(param_settings.keys())
    out_names = list(output_settings.keys())

See comment above about these dictionaries being redundant.

    # ---------------------------------------------------------


    def main(grid_analyse_toml_file: str | Path):

Please do not call this function main. This implies that it is the main function of PROTEUS, which is confusing and hard to maintain.
Description
Adds a `proteus grid-analyse` CLI command to post-process PROTEUS simulation grids: extract final-timestep outputs to a summary CSV and generate status summary + ECDF grid plots. Driven entirely by the existing `.grid.toml` config; no separate analysis config file is needed.

Key changes:
- `src/proteus/grid/post_processing.py`: CSV extraction and ECDF/status plotting. Colormap, output variables, plot format all read from top-level grid config keys.
- `src/proteus/utils/plot.py`: `_preset_labels`, `_preset_scales`, `_preset_log_scales` dicts for automatic axis labelling, unit scaling, and log-scale detection by quantity name.
- `src/proteus/cli.py`: `grid-analyse` command using `@config_option` (consistent with other commands).
- `input/ensembles/example.grid.toml`: Extended with post-processing keys (`update_csv`, `plot_status`, `plot_ecdf`, `colormap`, `output_variables`, `plot_format`).
- `tests/grid/test_post_processing.py`: 26 `@pytest.mark.unit` tests for pure helper functions (`get_label`, `clean_series`, `validate_output_variables`, `flatten_input_parameters`, `load_ecdf_plot_settings`, `group_output_by_parameter`, etc.).
- `tests/grid/test_grid.py`: `test_grid_post_process` integration test runs post-processing on the dummy grid and asserts CSV + plots are created.
- `load_ecdf_plot_settings` was reading `colormap` from `cfg['input_parameters']['colormap']` instead of top-level `cfg['colormap']`, causing user-provided colormaps to be silently ignored.

Usage:
Output layout:
Validation of changes
Tested on multiple grids on Habrok HPC cluster. Unit tests cover all pure helper functions; integration test (`test_grid_post_process`) runs the full pipeline on the dummy 2-case grid.
Checklist
Relevant people
@nichollsh @IKisvardai @Emeline0110 @planetmariana @timlichtenberg