Calculate power spectra using a separate operator power_spectrum#1872
Calculate power spectra using a separate operator power_spectrum#1872Carol Halliwell (cehalliwell) wants to merge 58 commits intomainfrom
Conversation
…ating to power spectra in postage stamps
|
As part of the restructuring, a wrapper is used for the power spectra code within power_spectrum operator. The field for which the power spectrum is to be calculated is split up into a cube for each model and time and power spectrum is then calculated for each before combining into one cube ahead of plotting. This is done to retain the model_name attribute for different models. The generic function to plot a line other than a time series is called _plot_and_save_line_1D. It has been tested for power spectrum (called from plot_line_series with series_coordinate "frequency") but has not be tested for any other series_coordinate. Microsoft Copilot was used to design the wrapper and tidy up _power_spectrum operator. |
|
The tests from the original power spectrum code have been assessed and moved to test_power_spectrum, or within the plot_line_series code in test_plot or deleted as necessary. Extra tests have also been added for options in plot_line_series to check the options when series_coordinate is not time. |
expand on explanatory text.
clarify description of internal function.
changing the code to calculate power_spectral_density instead of power spectrum. This makes it comparable between models with different resolutions and correctly reflects distribution of power per unit frequency
|
I have suggested to calculate power_spectral_density instead of a power spectrum and modified the code in power_spectrum.py in def _DCT_ps(y_3d):. So this does not aligned completely with your reference anymore then. The reason I am suggesting this is: we intend to compare models vs observations including models vs model with differing resolutions. Not at the moment, but the sampling frequency might be different when comparing different CSET runs with each other i.e. differing number of data points (different length of data). A Power Spectrum's magnitude scales with length of the time series (N), sampling frequency (fs) and preprocessing choices i.e. It is not invariant to regridding (low pass filtering) or different grid resolutions. To make them comparable I suggest to plot power per spectral coefficient or power per wavenumber instead of total power per bin which would increase with more datapoints. |
updated name from power_spectrum to power_spectral_density in line with changes moving from power spectrum calculation to power_spectral_density calculation
adding further info to the recipe and website output about how power spectral density enables comparison between models with different resolutions etc.
adding further explanation on power spectral density and how to interpret figures. Pulled into website.
Sylvia Bohnenstengel (Sylviabohnenstengel)
left a comment
There was a problem hiding this comment.
having added a few changes to make results comparable between different resoltuions, different number of data points, different time lengths if we were to compare results from different CSET runs. Happy to approve once James is happy with tests. Have you tested the runs with 2 ensembles?
src/CSET/operators/power_spectrum.py
Outdated
| ) | ||
|
|
||
| # Ensure cube has a realisation coordinate by creating and adding to cube | ||
| realization_coord = iris.coords.AuxCoord(0, standard_name="realization", units="1") |
There was a problem hiding this comment.
For disucssion with James Frost (@jfrost-mo) Should the realisation component you add always be 0 or would this need to be dynamic? I assume that it is ok as the only case we might not have one is when loading a deterministic model and would want it 0 in that case.
src/CSET/operators/power_spectrum.py
Outdated
| # Sum up elements matching k | ||
| mask_k = np.where((alpha_matrix >= alpha) & (alpha_matrix < alpha_p1)) | ||
| # Divide by number of coefficients in bin to get power spectral density insetad of power spectrum | ||
| ps_array[t, k - 1] = np.sum(sigma_2[mask_k]) / len(mask_k[0]) |
There was a problem hiding this comment.
I have suggested to calculate power_spectral_density instead of a power spectrum. So this does not aligned completely with your reference anymore then. The reason I am suggesting this is: 1. we intend to compare models vs observations including models vs model with differing resolutions. Not at the moment, but the sampling frequency might be different when comparing different CSET runs with each other i.e. differing number of data points (different length of data). A Power Spectrum's magnitude scales with length of the time series (N), sampling frequency (fs) and preprocessing choices i.e. It is not invariant to regridding (low pass filtering) or different grid resolutions. To make them comparable I suggest to plot power per spectral coefficient or power per wavenumber instead of total power per bin which would increase with more datapoints.
|
Carol Halliwell (@cehalliwell) A further minor change as we are currently calculating power spectral density over a domain and not a timeseries we are plotting as function of wavenumbe rand not frequency. So in power_spectrum.py in def _power_spectrum it might be clearer to rename freq_coord = iris.coords.DimCoord(freqs, long_name="frequency", units="1") to freq_coord = iris.coords.DimCoord(freqs, long_name="wavenumber", units="1") |
|
Sorry, messed up my github commands - reapplied your changes - need to work out why tests are failing - I may have got rid of something in the merge that we needed to keep, or you may need to double check the new tests (as some have been added) to see if they are still relevant. |
| # Do the actual plotting. | ||
| _plot_and_save_line_series(cubes, coords, "realization", plot_filename, plot_title) | ||
| # Check if this is a spectral plot by looking for spectral coordinates | ||
| is_spectral_plot = series_coordinate in [ | ||
| "frequency", | ||
| "physical_wavenumber", | ||
| "wavelength", | ||
| ] | ||
|
|
||
| # Add list of plots to plot metadata. | ||
| plot_index = _append_to_plot_index([plot_filename]) | ||
| if is_spectral_plot: | ||
| # If series coordinate is frequency, physical_wavenumber or wavelength, for example power spectra with series | ||
| # coordinate frequency/wavenumber. | ||
| # If several power spectra are plotted with time as sequence_coordinate for the | ||
| # time slider option. |
There was a problem hiding this comment.
So I think it is this bit that I may have messed up - not sure though but at least a potential pointer for you.
|
Might also be worth updating your tests and checking your file naming convention in your new plotting operators match the recent PR from Huw (PR #1955) |
|
Unit tests added. A section of code that selects the correct series_coordinate for use in the calculation of grid lengths has been moved to a separate function so that it can be tested more easily. The coverage of tests is now 90%. Some code in plot.py not covered by testing relates to code which is unrelated to this Pull Request. Code now ready for review. Microsoft Copilot used to suggest tests |
Follows on from #1765
The power spectra is currently calculated and plotted in the main plot.py operator. However, this method does not allow aggregation of the power spectra (e.g. mean). The power spectra are calculated in a separate operator rather than in plot.py and the resulting power spectra plotted using the line series method in plot.py. After this, aggregation will be possible using the collapse operator.
Move the existing functions and code for calculating the power spectra into a separate operator, remove the existing power spectra code from plot.py and adjust the line series plotting code to allow for line plotting other than timeseries.
Contribution checklist
Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.