
Gridding#233

Open
lauryntalbot wants to merge 6 commits into c-proof:main from lauryntalbot:gridding

Conversation

@lauryntalbot

1. make_gridfiles in ncprocess: QC grids using nanmax rather than mean
2. gappy_fill_vertical in utils: doesn't fill gaps >50 m

Updated gappy_fill_vertical to fill small NaN runs based on the max_gap parameter, applied column-wise:

    data = gappy_fill_vertical(data)

Fill ONLY small NaN runs (<= max_gap bins) inside each vertical column.
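A minimal sketch of the gap-limited column fill described above (not the exact implementation in utils; here max_gap is counted in bins, and longer NaN runs are left untouched):

```python
import numpy as np

def gappy_fill_vertical(data, max_gap=50):
    """Fill NaN runs of at most max_gap bins in each column by linear
    interpolation; longer runs are left as NaN."""
    data = data.copy()
    for j in range(data.shape[1]):
        col = data[:, j]
        good = np.where(np.isfinite(col))[0]
        if len(good) < 2:
            continue
        for i in range(len(good) - 1):
            gap = good[i + 1] - good[i] - 1
            # only interpolate across runs no longer than max_gap
            if 0 < gap <= max_gap:
                idx = np.arange(good[i] + 1, good[i + 1])
                col[idx] = np.interp(idx, good, col[good])
    return data
```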
Member

I am pretty sure xarray has a gappy interp. Should we switch to that rather than this bespoke version?

Author

I'll try that :)

Member

Yeah, interpolate_na has max_gap and limit parameters. Or does this do something different?

Author

Replace:

    dsout[k].values = utils.gappy_fill_vertical(dsout[k].values)

with:

    dsout[k] = dsout[k].interpolate_na(dim="depth", method="linear", max_gap=50)

Then we don't need that other function.
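With interpolate_na, max_gap is measured in coordinate units, so with a depth coordinate in metres max_gap=50 really is 50 m. A toy example (illustrative values, 10 m bins):

```python
import numpy as np
import xarray as xr

depth = np.arange(0, 100, 10)  # metres
da = xr.DataArray(
    [0.0, np.nan, 20.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 90.0],
    dims="depth", coords={"depth": depth},
)

# The 20 m gap near the surface is filled; the 70 m gap below is not,
# because it exceeds max_gap=50 (coordinate units, i.e. metres).
filled = da.interpolate_na(dim="depth", method="linear", max_gap=50)
```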

ds[k].attrs['processing'] += (
    ' Using geometric mean implementation '
    'scipy.stats.gmean'
)
elif 'QC_protocol' in ds[k].attrs:
Member

OK, so does this work? And how do you specify nanmax in ds[k].attrs['average_method']?

Author

It worked. I added it to the attributes when I defined _QC:

ts['conductivity_QC'] = xr.where(np.isfinite(ts.conductivityClean), 1, 4)
ts['conductivity_QC'].attrs['method'] = 'QC_protocol'
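As a quick sanity check, the same xr.where pattern can be run on a toy array (variable names here are illustrative, not from the PR):

```python
import numpy as np
import xarray as xr

cond = xr.DataArray([3.1, np.nan, 3.3], dims="time")

# 1 = good, 4 = bad, mirroring the conductivity_QC definition above
qc = xr.where(np.isfinite(cond), 1, 4)
```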

Member

I think it should be

ts['conductivity_QC'].attrs['average_method'] = 'QC_protocol'

But I think this test checks whether 'QC_protocol' is a key of attrs (i.e. 'QC_protocol' in attrs), not whether QC_protocol is the value of one of the elements.

Author

You are right. I am changing this to: elif 'QC_protocol' in ds[k].attrs.values():

Member

But why not check the proper key?
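For illustration, the difference between the three checks on a plain attrs dict (toy values):

```python
attrs = {'average_method': 'QC_protocol', 'units': 'S m-1'}

# Membership on a dict tests keys, so this is False here
key_check = 'QC_protocol' in attrs

# Matches the value anywhere in attrs, regardless of which key holds it
value_check = 'QC_protocol' in attrs.values()

# Checks the specific key, as suggested
proper_check = attrs.get('average_method') == 'QC_protocol'
```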

dz=1,
starttime='1970-01-01',
):
inname, outdir, deploymentyaml, *, fnamesuffix='', dz=1,
starttime='1970-01-01', maskfunction=CPROOF_mask):
Member

What is CPROOF_mask?

Author

Shoot, I forgot to add that. It masks over QC4 (the bad data):

import logging

def CPROOF_mask(ds):
    """Mask QC4 samples in data variables (set to NaN) so gridding ignores them.

    Does NOT shorten arrays or overwrite QC variables.
    """
    _log = logging.getLogger(__name__)

    ds = ds.copy()

    for k in list(ds.data_vars):
        # skip QC variables themselves
        if k.endswith("_QC"):
            continue

        qc_name = f"{k}_QC"
        if qc_name not in ds:
            continue

        # mask data where QC == 4, preserving dims/coords
        ds[k] = ds[k].where(ds[qc_name] != 4)
        ds[qc_name] = ds[qc_name].where(ds[qc_name] != 4)

    return ds
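A toy check of the QC == 4 rule (illustrative variable names), applying the same .where masking inline:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    "conductivity": ("time", [3.1, 3.2, 3.3]),
    "conductivity_QC": ("time", [1, 4, 1]),
})

# Same rule as the mask above: samples flagged QC == 4 become NaN
masked = ds["conductivity"].where(ds["conductivity_QC"] != 4)
```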

Member

I'd make the default be None


ds = xr.open_dataset(inname, decode_times=True)
ds = ds.where(ds.time > np.datetime64(starttime), drop=True)
ds0 = xr.open_dataset(inname, decode_times=True)
Member

You don't use ds0 after you make ds so is there a reason to not just name this ds?

Author

Ya, I could just mask at the very top then do everything with ds

'depth' and 'profile', so each variable is gridded in depth bins and by
profile number. Each profile has a time, latitude, and longitude.
The depth values are the bin centers.
Member

Please describe maskfunction and also describe the behaviour of the QC_protocol fallback.

Member

Also let's add a description of "average_method" in here in general. It should be part of this documentation.

Add CPROOF_mask function to mask QC4 samples in data variables, and update make_gridfiles to apply this mask function before gridding.
Added original function back

# Fill only continuous variables, not QC flags
if 'QC_protocol' not in ds[k].attrs:
    dsout[k] = dsout[k].interpolate_na(dim="depth", method="linear", max_gap=50)
Member

Does this work OK?

Also not sure if we should hard-code this versus making it a parameter. This ends up being 50 m, right?

Author

Made it into a parameter. Yes, I chose 50 m since it is relatively small; the spikes the original code was producing came from interpolating over gaps >50 m.





Member

Suggested change

Member

remove this extra white space

data[:, j][ind[0] : ind[-1]] = np.interp(int, ind, data[ind, j])
return data


Member

Revert this

continue
if 'average_method' in ds[k].attrs:
    average_method = ds[k].attrs['average_method']
# variables are treated as d continuous data.
Member

Suggested change
- # variables are treated as d continuous data.
+ # variables are treated as continuous data.

Vertical grid spacing in meters.

maskfunction : callable or None, optional
Function applied to the dataset before gridding.
Member

Suggested change
- Function applied to the dataset before gridding.
+ Function applied to the dataset before gridding, usually to choose what data will be set to NaN based on quality flags.

Added vertical interpolation for QC and continuous variables in ncprocess.py.