Add post-processing step for a grid and generate grid plot by EmmaPostolec · Pull Request #423 · FormingWorlds/PROTEUS

EmmaPostolec · 2025-06-05T15:44:57Z

Description

Adds proteus grid-analyse CLI command to post-process PROTEUS simulation grids: extract final-timestep outputs to a summary CSV and generate status summary + ECDF grid plots. Driven entirely by existing .grid.toml config — no separate analysis config file needed.

Key changes:

src/proteus/grid/post_processing.py: CSV extraction and ECDF/status plotting. Colormap, output variables, plot format all read from top-level grid config keys.
src/proteus/utils/plot.py: _preset_labels, _preset_scales, _preset_log_scales dicts for automatic axis labelling, unit scaling, and log-scale detection by quantity name.
src/proteus/cli.py: grid-analyse command using @config_option (consistent with other commands).
input/ensembles/example.grid.toml: Extended with post-processing keys (update_csv, plot_status, plot_ecdf, colormap, output_variables, plot_format).
tests/grid/test_post_processing.py: 26 @pytest.mark.unit tests for pure helper functions (get_label, clean_series, validate_output_variables, flatten_input_parameters, load_ecdf_plot_settings, group_output_by_parameter, etc.).
tests/grid/test_grid.py: test_grid_post_process integration test runs post-processing on the dummy grid and asserts CSV + plots are created.
Bug fix: load_ecdf_plot_settings was reading colormap from cfg['input_parameters']['colormap'] instead of top-level cfg['colormap'], causing user-provided colormaps to be silently ignored.

Usage:

proteus grid-analyse --config input/ensembles/example.grid.toml

Output layout:

your_grid/
 └─ post_processing/
     ├─ extracted_data/
     │   └─ your_grid_final_extracted_data_{all,completed,running_error}.csv
     └─ grid_plots/
         ├─ summary_grid_statuses_your_grid.{png,pdf}
         ├─ ecdf_grid_plot_your_grid.{png,pdf}
         └─ single_plots_ecdf/

Validation of changes

Tested on multiple grids on Habrok HPC cluster. Unit tests cover all pure helper functions; integration test (test_grid_post_process) runs the full pipeline on the dummy 2-case grid.

Checklist

I have followed the contributing guidelines
My code follows the style guidelines of this project
I have performed a self-review of my code
My changes generate no new warnings or errors
I have checked that the tests still pass on my computer
I have updated the docs, as appropriate
I have added tests for these changes, as appropriate
I have checked that all dependencies have been updated, as required

Relevant people

@nichollsh @IKisvardai @Emeline0110 @planetmariana @timlichtenberg

…t it

… need to work on log scale fro x axis

…orking

Copilot

Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 8 comments.

src/proteus/grid/post_processing.py

src/proteus/utils/plot.py

src/proteus/config/_escape.py

docs/How-to/usage.md

input/planets/toi561b.toml

Agent-Logs-Url: https://github.com/FormingWorlds/PROTEUS/sessions/cbf3c143-d265-4db4-a889-74d0c610eaab Co-authored-by: EmmaPostolec <122358811+EmmaPostolec@users.noreply.github.com>

Copilot · 2026-04-08T18:08:40Z

This is a useful tool that will probably see a lot of use in the future, since running grids of models is such a common approach.

I have tried it with a dummy grid, and it works generally as descr...

Tests have been added in tests/grid/test_grid.py as test_grid_post_process — it calls gpostprocess(GRID_CONFIG) on the dummy grid and asserts that the summary CSV, status summary plot, and ECDF plot are all generated correctly.

…OTEUS into ep/post_processing_grid

EmmaPostolec · 2026-04-08T19:09:16Z

Thanks, @EmmaPostolec. Please let me know when you'd like this to be reviewed.

I think it's ready now. I implemented @copilot feedback. I guess the remaining task is to update the pytest to increase the coverage and to pass the cheks for merging ?

Copilot

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.

Copilot · 2026-04-08T19:12:17Z

src/proteus/grid/post_processing.py

+    # Extract plot_format from cfg
+    plot_format = cfg.get('plot_format')
+
+    if 'status' not in df.columns:
+        raise ValueError("CSV must contain a 'status' column")


plot_format is read with cfg.get('plot_format') and then used directly in the output filename. If the key is missing or misspelled, this will create files ending in .None (or another invalid extension). It would be safer to default to a known format (e.g. png) and/or validate plot_format is one of the supported values before saving.

Copilot · 2026-04-08T19:12:18Z

src/proteus/grid/post_processing.py

+    # --- ECDF plots ---
+    if cfg.get('plot_ecdf', True):
+        completed_simulations_data_csv = pd.read_csv(summary_csv_completed, sep='\t')
+        columns_output = validate_output_variables(
+            completed_simulations_data_csv, cfg['output_variables']
+        )


main() accesses cfg['output_variables'] unconditionally when plot_ecdf is enabled. If a user enables ECDF plotting but forgets to provide output_variables, this will raise a KeyError rather than a helpful message. Consider using cfg.get('output_variables') and raising a clear ValueError describing the required config key.

Copilot · 2026-04-08T19:12:18Z

src/proteus/grid/post_processing.py

+            elif method == 'arange':
+                arr = list(np.arange(value['start'], value['stop'], value['step']))
+                # Ensure endpoint is included
+                if not np.isclose(arr[-1], value['stop']):
+                    arr.append(value['stop'])
+                tested_params[key] = np.array(arr, dtype=float)


In the arange method, arr[-1] is accessed without guarding against np.arange(...) returning an empty array (e.g. start == stop or an incompatible step). This can raise IndexError and abort post-processing. Add an empty-check before reading arr[-1], and decide how to handle the degenerate case (e.g. treat it as a single-value grid).

Copilot · 2026-04-08T19:12:18Z

src/proteus/grid/post_processing.py

+def extract_solidification_time(cases_data: list, grid_dir: str | Path):
+    """
+    Extract solidification time for each simulation of the grid for
+    the condition Phi_global < phi_crit at last time step.
+


The extract_solidification_time() docstring says the condition Phi_global < phi_crit is checked “at last time step”, but the implementation finds the first timestep where the condition becomes true (idxmax() on the boolean condition). Please update the docstring to match the actual behavior (solidification time = first crossing).

@EmmaPostolec please can you check this?

Copilot · 2026-04-08T19:12:19Z

src/proteus/grid/post_processing.py

+def flatten_input_parameters(d: dict, parent_key: str = '') -> dict:
+    """
+    Flattens a nested input-parameter dictionary from a TOML configuration
+    into a flat mapping of dot-separated parameter paths to their plotting
+    configuration.
+


flatten_input_parameters() is defined but not called anywhere in this module (or elsewhere in the repo). Keeping unused helpers in a new ~1000-line module makes the code harder to maintain; consider removing it or wiring it into the ECDF settings loading if it’s intended for future config-driven labels/scales.

src/proteus/grid/post_processing.py

Copilot · 2026-04-08T19:12:19Z

docs/How-to/usage.md

+    - `{grid_name}_final_extracted_data_completed.csv` which contains only successful runs (used for ECDF plots)
+    - `{grid_name}_final_extracted_data_running_error.csv` for only failed simulations with status `Running` or `Error`.
+
+- `grid_plots/` This directory contains a status summary plot and a ECDF plot


Grammar: “a ECDF plot” should be “an ECDF plot”.

Copilot · 2026-04-08T19:12:19Z

input/planets/toi561b.toml

+    reservoir = "outgas"      # Escaping reservoir: "bulk", "outgas", "pxuv".
+
+    [escape.zephyrus]
+        Pxuv        = 1e-2          # Pressure at which XUV radiation become opaque in the planetary atmosphere [bar]


Grammar in the inline comment: “XUV radiation become opaque” → “XUV radiation becomes opaque”.

Agent-Logs-Url: https://github.com/FormingWorlds/PROTEUS/sessions/43d242cd-ebd2-458b-9dde-201556591aef Co-authored-by: EmmaPostolec <122358811+EmmaPostolec@users.noreply.github.com>

Copilot · 2026-04-08T19:25:00Z

Thanks, @EmmaPostolec. Please let me know when you'd like this to be reviewed.

I think it's ready now. I implemented @copilot feedback. I guess the remaining task is to update the pytest to inc...

The CI checks are passing on the latest commits. I've now added 26 @pytest.mark.unit tests in a new file tests/grid/test_post_processing.py (commit e65c564) that cover the pure helper functions in post_processing.py: get_grid_name, get_label, get_scale, get_log_scale, latex, clean_series, validate_output_variables, flatten_input_parameters, load_ecdf_plot_settings, and group_output_by_parameter. These are fast (<100ms each), require no running grid or compiled binaries, and run as part of the fast CI gate.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

julialang-s3.julialang.org
- Triggering command: /usr/bin/python python -m pytest tests/grid/test_post_processing.py -v -m unit (dns block)
- Triggering command: `/usr/bin/python python -c
  import sys
  sys.path.insert(0, 'src')

Test the imports work

from proteus.grid.post_processing import (
clean_series, flatten_input_parameters, get_grid_name,
get_label, get_log_scale, get_scale, group_output_by_parameter,
latex, load_ecdf_` (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled

Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

nichollsh

Thanks for this, @EmmaPostolec, it's great to have some utilities for easily analysing the results of PROTEUS grids.

However, I have some concerns about some of the code that need clarifying and/or actioning. I am worried that a lot of the code is sprawling and some of it is redundant. This makes it hard to review, but also will make it harder to maintain in the future (tech debt) and could lead to some inconsistencies in how grids are analysed.

These comments are quite minor overall, and should not be too difficult to address. Thanks again for adding these! Looking forward to seeing it used frequently :)

nichollsh · 2026-04-09T07:15:29Z

src/proteus/grid/post_processing.py

Please do not use print() statements. PROTEUS uses logging throughout its codebase, so all print statements in this file should be replaced with log.info or log.warning as appropriate.

nichollsh · 2026-04-09T07:19:08Z

src/proteus/grid/post_processing.py

+            if method == 'direct':
+                tested_params[key] = value['values']
+
+            elif method == 'linspace':
+                tested_params[key] = np.linspace(value['start'], value['stop'], value['count'])
+
+            elif method == 'logspace':
+                tested_params[key] = np.logspace(
+                    np.log10(value['start']), np.log10(value['stop']), value['count']
+                )
+
+            elif method == 'arange':
+                arr = list(np.arange(value['start'], value['stop'], value['step']))
+                # Ensure endpoint is included
+                if not np.isclose(arr[-1], value['stop']):
+                    arr.append(value['stop'])
+                tested_params[key] = np.array(arr, dtype=float)


This code is a reflection of how the grid axes are constructed in manage.py. However, if the code in manage.py is updated without also updating this code, the analysis will become inconsistent with the grid output. See here:

PROTEUS/src/proteus/grid/manage.py

Lines 204 to 227 in 6e6ead5

# Set a dimension by linspace

def set_dimension_linspace(self, name: str, start: float, stop: float, count: int):

self.dim_avars[name] = list(np.linspace(start, stop, count))

self.dim_avars[name] = [float(v) for v in self.dim_avars[name]]

# Set a dimension by arange (inclusive of endpoint)

def set_dimension_arange(self, name: str, start: float, stop: float, step: float):

self.dim_avars[name] = list(np.arange(start, stop, step))

if not np.isclose(self.dim_avars[name][-1], stop):

self.dim_avars[name].append(stop)

self.dim_avars[name] = [float(v) for v in self.dim_avars[name]]

# Set a dimension by logspace

def set_dimension_logspace(self, name: str, start: float, stop: float, count: int):

self.dim_avars[name] = list(np.logspace(np.log10(start), np.log10(stop), count))

self.dim_avars[name] = [float(v) for v in self.dim_avars[name]]

# Set a dimension directly

def set_dimension_direct(self, name: str, values: list, sort: bool = True):

the_list = list(set(values)) # remove duplicates

if (not isinstance(values[0], str)) and sort: # sort if numeric

the_list = sorted(the_list)

self.dim_avars[name] = the_list

Perhaps you could generalise the three functions in manage.py by moving them outside the Grid object, which would allow them to be called by both Grid.__init__() and get_tested_grid_parameters, without duplicating any code.

nichollsh · 2026-04-09T07:20:02Z

src/proteus/grid/post_processing.py

+
+            else:
+                print(f'⚠️ Unknown method for {key}: {method}')
+                continue


Should probably error here, rather than continuing.

nichollsh · 2026-04-09T07:22:11Z

src/proteus/grid/post_processing.py

+def extract_solidification_time(cases_data: list, grid_dir: str | Path):
+    """
+    Extract solidification time for each simulation of the grid for
+    the condition Phi_global < phi_crit at last time step.
+


@EmmaPostolec please can you check this?

nichollsh · 2026-04-09T07:22:37Z

src/proteus/grid/post_processing.py

+
+        else:
+            if not columns_printed:
+                print('Warning: Missing Phi_global or Time column.')


Should be log.warning

nichollsh · 2026-04-09T07:34:34Z

src/proteus/grid/post_processing.py

+    return grouped
+
+
+def latex(label: str) -> str:


This function is a duplicate of latexify here:

PROTEUS/src/proteus/utils/plot.py

Line 332 in 6e6ead5

def latexify(gas: str):

nichollsh · 2026-04-09T07:37:02Z

tests/grid/test_post_processing.py

+    import matplotlib.cm as cm
+
+    cfg = {
+        'colormap': 'plasma',


Use scientific colour maps

nichollsh · 2026-04-09T07:41:07Z

src/proteus/grid/post_processing.py

+def load_ecdf_plot_settings(cfg, tested_params=None):
+    """
+    Load ECDF plotting settings for both input parameters and output variables
+    from a configuration dictionary loaded from TOML.
+
+    Parameters


This function is huge and only used once. Is it actually needed? Seems like a lot of redundant code. Currently, the settings are loaded by the pathway:

_preset dict -> get_ function -> this function -> settings dict

This is quite cumbersome and therefore prone to more errors. Instead, perhaps the ECDF plotting function could call the get_label (etc.) functions directly, and then this whole load_ecdf_plot_settings function and the output_settings and param_settings would not be needed at all.

nichollsh · 2026-04-09T07:41:42Z

src/proteus/grid/post_processing.py

+
+    # List of parameter names (rows) and output names (columns)
+    param_names = list(param_settings.keys())
+    out_names = list(output_settings.keys())


See comment above about these dictionaries being redundant.

nichollsh · 2026-04-09T07:42:37Z

src/proteus/grid/post_processing.py

+# ---------------------------------------------------------
+
+
+def main(grid_analyse_toml_file: str | Path):


Please do not call this function main. This implies that it is the main function of PROTEUS, which is confusing and hard to maintain.

E.N. Postolec and others added 30 commits April 8, 2025 14:08

Change in escape/wrapper.py to put a warning if M_vols too low

093a257

Merge branch 'main' into ep/bug_escape

1da9cb2

Minor doc update

13d9272

Add toi561b toml file in input/planets

ff6bae5

toi561b toml file

3ce1066

update

7112343

Add the post_processing_grid.py script and create a new folder to tes…

f426496

…t it

Add histo plot

22e7fd9

Move post_processing_grid folder to tools

cc4cac9

try to do plot function : fail

e6c59ed

Try to save extracted data in a csv file : still working on it

8607942

Save into csv ok now need to clean script

c944fcb

post processing grid : generate csv file and then single plots. still…

c453f46

… need to work on log scale fro x axis

try for grid plot. need to fix log single plots

2bacac7

Merge branch 'main' into ep/post_processing_grid

06a68e8

Clean and clear comment in post_processing_grid.py

36858ca

clean plot_grid.py

f3dc067

Minor update to make it work fro another grid

62057ea

Merge branch 'main' into ep/post_processing_grid

abec185

Update toi561b.toml with johanna updated value in green in the table

bc38fe1

maybe i fixed the log scale problem youhou

493e68a

Almost there with log plots still needs to fix pb

63b3022

Merge branch 'main' into ep/post_processing_grid

d768f6b

Merge branch 'main' into ep/post_processing_grid

a04154d

test vs code commit

f35b076

minor changes

ed4b02e

Merge branch 'main' into ep/post_processing_grid

a361a7a

Merge branch 'main' into ep/post_processing_grid

0d44975

small test to have consistent plot

c000b1a

try to fix error log : now using ecdfplot for a single plot and its w…

2abe169

…orking

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Copilot started work on behalf of EmmaPostolec April 8, 2026 18:03 View session

EmmaPostolec and others added 2 commits April 8, 2026 20:06

fix colormap problem

04c654a

fix: read colormap from top-level cfg key and fix ruff lint in platon.py

1c51c2a

Agent-Logs-Url: https://github.com/FormingWorlds/PROTEUS/sessions/cbf3c143-d265-4db4-a889-74d0c610eaab Co-authored-by: EmmaPostolec <122358811+EmmaPostolec@users.noreply.github.com>

Copilot AI changed the title ~~Add post processing step for a grid and generate grid plot~~ Add post-processing step for a grid and generate grid plot Apr 8, 2026

Copilot finished work on behalf of EmmaPostolec April 8, 2026 18:10

EmmaPostolec added 10 commits April 8, 2026 20:11

fix removing 0 when it shouldnt with clean_series

a398c59

fix mistake unit rho_obs in plot.py

10b92cd

update the grammar in _escape.py

4d8fc82

update doc mistake

c635379

doc typo

77356bf

comment for copilot on no argument in main

ace2dac

to avoid copilot warning about not passing argument to main

3d84a2b

Merge branch 'ep/post_processing_grid' of github.com:FormingWorlds/PR…

5b6ce81

…OTEUS into ep/post_processing_grid

typo _escape.py

5738d61

ruff format on src/tests/ to pas test

6e6ead5

EmmaPostolec requested a review from Copilot April 8, 2026 19:06

Copilot started reviewing on behalf of EmmaPostolec April 8, 2026 19:06 View session

EmmaPostolec marked this pull request as ready for review April 8, 2026 19:07

EmmaPostolec requested a review from a team as a code owner April 8, 2026 19:07

Copilot started work on behalf of EmmaPostolec April 8, 2026 19:09 View session

Copilot AI reviewed Apr 8, 2026

View reviewed changes

Copilot AI and others added 2 commits April 8, 2026 19:22

test: add unit tests for post_processing helper functions

1d8978a

Agent-Logs-Url: https://github.com/FormingWorlds/PROTEUS/sessions/43d242cd-ebd2-458b-9dde-201556591aef Co-authored-by: EmmaPostolec <122358811+EmmaPostolec@users.noreply.github.com>

test: address code review feedback on unit tests

e65c564

Agent-Logs-Url: https://github.com/FormingWorlds/PROTEUS/sessions/43d242cd-ebd2-458b-9dde-201556591aef Co-authored-by: EmmaPostolec <122358811+EmmaPostolec@users.noreply.github.com>

Copilot finished work on behalf of EmmaPostolec April 8, 2026 19:26

nichollsh self-requested a review April 9, 2026 07:09

nichollsh requested changes Apr 9, 2026

View reviewed changes

	# Set a dimension by linspace
	def set_dimension_linspace(self, name: str, start: float, stop: float, count: int):
	self.dim_avars[name] = list(np.linspace(start, stop, count))
	self.dim_avars[name] = [float(v) for v in self.dim_avars[name]]

	# Set a dimension by arange (inclusive of endpoint)
	def set_dimension_arange(self, name: str, start: float, stop: float, step: float):
	self.dim_avars[name] = list(np.arange(start, stop, step))
	if not np.isclose(self.dim_avars[name][-1], stop):
	self.dim_avars[name].append(stop)
	self.dim_avars[name] = [float(v) for v in self.dim_avars[name]]

	# Set a dimension by logspace
	def set_dimension_logspace(self, name: str, start: float, stop: float, count: int):
	self.dim_avars[name] = list(np.logspace(np.log10(start), np.log10(stop), count))
	self.dim_avars[name] = [float(v) for v in self.dim_avars[name]]

	# Set a dimension directly
	def set_dimension_direct(self, name: str, values: list, sort: bool = True):
	the_list = list(set(values)) # remove duplicates
	if (not isinstance(values[0], str)) and sort: # sort if numeric
	the_list = sorted(the_list)
	self.dim_avars[name] = the_list

		# ---------------------------------------------------------


		def main(grid_analyse_toml_file: str \| Path):

Conversation

EmmaPostolec commented Jun 5, 2025 • edited by Copilot AI Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Validation of changes

Checklist

Relevant people

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI commented Apr 8, 2026

Uh oh!

EmmaPostolec commented Apr 8, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

I tried to connect to the following addresses, but was blocked by firewall rules:

Test the imports work

Uh oh!

nichollsh left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

EmmaPostolec commented Jun 5, 2025 •

edited by Copilot AI

Loading

Copilot AI commented Apr 8, 2026 •

edited

Loading