The pipeline can be summarized in five main stages:

- Simulation: generate simulation data with Dedalus (`01_simu.py` script)
- Sampling: create a training dataset by sampling Dedalus simulation data (`02_sample.py` script)
- Training: train an FNO model on a given dataset (`03_train.py` script)
- Evaluation: evaluate the trained FNO with some metrics (`04_eval.py` script)
- SDC Run: run SDC with FNO initialization (`05_runSDC.py` script)
Each script can be run separately with command-line arguments, all of which have default values (use `-h` for more details), e.g.:

```bash
$ ./01_simu.py -h
```

Each script can also use a `config.yaml` file storing all script arguments; any argument provided in the config file overwrites the value passed on the command line, e.g.:

```bash
$ ./01_simu.py --config pathToConfig.yaml
```
⚠️ By default, pipeline scripts don't use a config file if none is provided, except for `03_train.py`, which requires one; if none is provided it falls back to a default `config.yaml` in the current directory (see the base `config.yaml` for the pipeline's default configuration).
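The config-overwrites-CLI behavior can be sketched as follows. This is a minimal stand-in, not the actual implementation: the helper names are invented, and a flat `key: value` parser replaces the real YAML reader to keep the sketch dependency-free.

```python
import argparse

def parseFlatConfig(path):
    """Parse a flat 'key: value' file into a dict (simplified YAML stand-in)."""
    config = {}
    with open(path) as f:
        for line in f:
            if ":" in line:
                key, value = line.split(":", 1)
                config[key.strip()] = value.strip()
    return config

def readArgs(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--nSimu", type=int, default=1)
    parser.add_argument("--config", default=None)
    args = parser.parse_args(argv)
    if args.config is not None:
        for key, value in parseFlatConfig(args.config).items():
            if hasattr(args, key):
                # any argument present in the config file wins over the CLI value
                setattr(args, key, type(getattr(args, key))(value))
    return args
```

The key point is the order: command-line arguments are parsed first, then any matching entry from the config file overwrites them.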
In fact, you can run the whole pipeline using default values (from the scripts and the base `config.yaml` file), to check that everything works fine:
```bash
$ ./01_simu.py     # this may take a very long time, better to run with MPI
$ ./02_sample.py   # run once, can be used for many trainings
$ ./03_train.py    # faster when run on GPU
$ ./04_eval.py     # can be run separately from training, without a config file
```

There are also some companion scripts that can be used alongside the pipeline scripts (they don't require any config file):
- `10_viewDataset.py`: print information about a dataset, and plot some contours of its inputs/outputs
- `11_modelOutput.py`: plot solution (or update) contours of a model on a given sample of a dataset
- `12_inspectModel.py`: print model configuration and status from a checkpoint file
- `13_plotLoss.py`: plot the loss evolution from a file storing it
📜 Examples of slurm scripts using those scripts are provided in the `slurm` folder.
Use the `01_simu.py` script to run `nSimu` simulations. Each simulation starts accumulating data after `tInit` seconds, then writes a solution field every `dtData` seconds until `tEnd` seconds of data accumulation are done.
All simulation files are stored in a `dataDir` folder.
⚠️ `tEnd` doesn't take `tInit` into account, so the total simulation time (initial run + data accumulation) is `tInit + tEnd`.
This script can be run in parallel using MPI, and can use the arguments provided in the `simu` section of a config file.
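A tiny sketch of the timing convention just described (the function name and the rounding choice are assumptions; only the `tInit + tEnd` total is stated by the text):

```python
def simuTiming(tInit, tEnd, dtData):
    """Timing of one simulation: data accumulation starts after tInit seconds,
    one field is written every dtData seconds, for tEnd seconds of accumulation."""
    totalTime = tInit + tEnd        # tEnd does not include the initial tInit run
    nFields = round(tEnd / dtData)  # fields written during accumulation (approx.)
    return totalTime, nFields
```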
Use the `02_sample.py` script to create a dataset stored in `dataFile`, from simulation data stored in `dataDir`.
It uses three main parameters for sampling:

- `inSize`: number of time-steps contained in one input (only `inSize=1` is implemented for now)
- `outStep`: number of `dtData` between an input and its output (i.e. the time-step size of the update)
- `inStep`: number of `dtData` to jump before taking the next input (and its associated output)
In addition, there are two additional parameters:

- `outType`:
    - if `"solution"`, each output is built by simply taking the time-stepper solution
    - if `"update"`, each output is built by taking the time-stepper update, multiplied by a given scaling factor
- `outScaling`: if `outType="update"`, the scaling factor used to build the output
If a config file is given, the script will use any parameters provided in its `simu`, `sample` and `data` sections; see the base `config.yaml` for reference.
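To illustrate how `inStep`, `outStep`, `outType` and `outScaling` interact, here is a hedged sketch of the sampling logic for `inSize=1` (function names and boundary handling are assumptions, not the actual implementation):

```python
def samplePairs(nFields, inStep, outStep):
    """Build (input, output) time indices for inSize=1.

    Each input at index i is paired with the output at i + outStep;
    successive inputs are inStep indices apart."""
    pairs = []
    i = 0
    while i + outStep < nFields:
        pairs.append((i, i + outStep))
        i += inStep
    return pairs

def buildOutput(uIn, uOut, outType="solution", outScaling=1.0):
    """Output is either the solution itself, or the scaled update."""
    if outType == "solution":
        return uOut
    elif outType == "update":
        return (uOut - uIn) * outScaling
    raise ValueError(f"unknown outType: {outType}")
```

With `inStep=2` and `outStep=1`, for example, every second stored field becomes an input, each paired with the field one `dtData` later.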
Use the `03_train.py` script to train a model using a training dataset stored in `dataFile` (section `data` in the config file).
Most of the training settings have to be specified in a config file; see the base `config.yaml` for reference.
Training runs on GPU if one is available, otherwise on CPU. Specifying `seed: null` simply splits the data using `trainRatio`,
without shuffling: for instance, if 10 simulations were run to generate data, `trainRatio=0.8` takes the data of the first 8 simulations
for training, and the data of the last 2 simulations for validation.
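The unshuffled `seed: null` split described above amounts to something like this (an illustrative sketch, assuming samples are ordered by simulation; the helper name is invented):

```python
def splitByRatio(samples, trainRatio=0.8):
    """Split without shuffling: first fraction for training, rest for validation."""
    nTrain = int(trainRatio * len(samples))
    return samples[:nTrain], samples[nTrain:]
```

Because nothing is shuffled, samples from one simulation never leak across the train/validation boundary (as long as samples are stored grouped by simulation).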
Use the `04_eval.py` script to evaluate a given model stored in a `checkpoint` file,
on one simulation of a dataset stored in `dataFile`.

📣 This script doesn't necessarily require a config file: the whole model can be instantiated from the
`checkpoint` file, since all model settings are stored in there, which is sufficient for inference only.
Evaluation metrics are (for now):

- averaged spectrum for $u_x$ and $u_z$