
Gridding#233

Open
lauryntalbot wants to merge 6 commits into c-proof:main from lauryntalbot:gridding

Conversation

@lauryntalbot

1. make_gridfiles in ncprocess: QC grids using nanmax rather than mean
2. gappy_fill_vertical in utils: doesn't fill gaps >50 m

Updated gappy_fill_vertical to fill small NaN runs based on the max_gap parameter, applied column-wise:

    data = gappy_fill_vertical(data)

Fill ONLY small NaN runs (<= max_gap bins) inside each vertical column.
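A minimal sketch of the gap-limited column fill described above (not the exact implementation in utils; here max_gap is counted in bins, and longer NaN runs are left untouched):

```python
import numpy as np

def gappy_fill_vertical(data, max_gap=50):
    """Fill NaN runs of at most max_gap bins in each column by linear
    interpolation; longer runs are left as NaN."""
    data = data.copy()
    for j in range(data.shape[1]):
        col = data[:, j]
        good = np.where(np.isfinite(col))[0]
        if len(good) < 2:
            continue
        for i in range(len(good) - 1):
            gap = good[i + 1] - good[i] - 1
            # only interpolate across runs no longer than max_gap
            if 0 < gap <= max_gap:
                idx = np.arange(good[i] + 1, good[i + 1])
                col[idx] = np.interp(idx, good, col[good])
    return data
```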
Member

I am pretty sure xarray has a gappy interp. Should we switch to that rather than this bespoke version?

Author

I'll try that :)

Member

Yeah, interpolate_na has max_gap and limit parameters. Or does this do something different?

Author

Replace:

    dsout[k].values = utils.gappy_fill_vertical(dsout[k].values)

with:

    dsout[k] = dsout[k].interpolate_na(dim="depth", method="linear", max_gap=50)

Then we don't need that other function.
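With interpolate_na, max_gap is measured in coordinate units, so with a depth coordinate in metres max_gap=50 really is 50 m. A toy example (illustrative values, 10 m bins):

```python
import numpy as np
import xarray as xr

depth = np.arange(0, 100, 10)  # metres
da = xr.DataArray(
    [0.0, np.nan, 20.0, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 90.0],
    dims="depth", coords={"depth": depth},
)

# The 20 m gap near the surface is filled; the 70 m gap below is not,
# because it exceeds max_gap=50 (coordinate units, i.e. metres).
filled = da.interpolate_na(dim="depth", method="linear", max_gap=50)
```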

ds[k].attrs['processing'] += (
    ' Using geometric mean implementation '
    'scipy.stats.gmean'
)
elif 'QC_protocol' in ds[k].attrs:
Member

OK, so does this work? And how do you specify nanmax in ds[k].attrs['average_method']?

Author

It worked. I added it to the attributes when I defined _QC:

ts['conductivity_QC'] = xr.where(np.isfinite(ts.conductivityClean), 1, 4)
ts['conductivity_QC'].attrs['method'] = 'QC_protocol'
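As a quick sanity check, the same xr.where pattern can be run on a toy array (variable names here are illustrative, not from the PR):

```python
import numpy as np
import xarray as xr

cond = xr.DataArray([3.1, np.nan, 3.3], dims="time")

# 1 = good, 4 = bad, mirroring the conductivity_QC definition above
qc = xr.where(np.isfinite(cond), 1, 4)
```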

Member

I think it should be

ts['conductivity_QC'].attrs['average_method'] = 'QC_protocol'

But I think this test checks whether 'QC_protocol' is a key of attrs (i.e. 'QC_protocol' in attrs), not whether QC_protocol is the value of one of the elements.

Author

You are right. I am changing this to: elif 'QC_protocol' in ds[k].attrs.values():

Member

But why not check the proper key?
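For illustration, the difference between the three checks on a plain attrs dict (toy values):

```python
attrs = {'average_method': 'QC_protocol', 'units': 'S m-1'}

# Membership on a dict tests keys, so this is False here
key_check = 'QC_protocol' in attrs

# Matches the value anywhere in attrs, regardless of which key holds it
value_check = 'QC_protocol' in attrs.values()

# Checks the specific key, as suggested
proper_check = attrs.get('average_method') == 'QC_protocol'
```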

dz=1,
starttime='1970-01-01',
):
inname, outdir, deploymentyaml, *, fnamesuffix='', dz=1,
starttime='1970-01-01', maskfunction=CPROOF_mask):
Member

What is CPROOF_mask?

Author

Shoot, I forgot to add that. It masks over QC4 (the bad data):

import logging

def CPROOF_mask(ds):
    """Mask QC4 samples in data variables (set to NaN) so gridding ignores them.

    Does NOT shorten arrays or overwrite QC variables.
    """
    _log = logging.getLogger(__name__)

    ds = ds.copy()

    for k in list(ds.data_vars):
        # skip QC variables themselves
        if k.endswith("_QC"):
            continue

        qc_name = f"{k}_QC"
        if qc_name not in ds:
            continue

        # mask data where QC == 4, preserving dims/coords
        ds[k] = ds[k].where(ds[qc_name] != 4)
        ds[qc_name] = ds[qc_name].where(ds[qc_name] != 4)

    return ds
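A toy check of the QC == 4 rule (illustrative variable names), applying the same .where masking inline:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({
    "conductivity": ("time", [3.1, 3.2, 3.3]),
    "conductivity_QC": ("time", [1, 4, 1]),
})

# Same rule as the mask above: samples flagged QC == 4 become NaN
masked = ds["conductivity"].where(ds["conductivity_QC"] != 4)
```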

Member

I'd make the default be None


ds = xr.open_dataset(inname, decode_times=True)
ds = ds.where(ds.time > np.datetime64(starttime), drop=True)
ds0 = xr.open_dataset(inname, decode_times=True)
Member

You don't use ds0 after you make ds so is there a reason to not just name this ds?

Author

Ya, I could just mask at the very top then do everything with ds

'depth' and 'profile', so each variable is gridded in depth bins and by
profile number. Each profile has a time, latitude, and longitude.
The depth values are the bin centers.
Member

Please describe maskfunction and also describe the behaviour of the QC_protocol fallback.

Member

Also let's add a description of "average_method" in here in general. It should be part of this documentation.

Add CPROOF_mask function to mask QC4 samples in data variables, and update make_gridfiles to apply this mask function before gridding.
Added original function back

# Fill only continuous variables, not QC flags
if 'QC_protocol' not in ds[k].attrs:
    dsout[k] = dsout[k].interpolate_na(dim="depth", method="linear", max_gap=50)
Member

Does this work OK?

Also not sure if we should hard-code this versus making it a parameter. This ends up being 50 m, right?

Author

Made it into a parameter. Yes, I chose 50 m since it is relatively small; the spikes the original code was producing came from interpolating over gaps >50 m.





Member

Suggested change

Member

remove this extra white space

data[:, j][ind[0] : ind[-1]] = np.interp(int, ind, data[ind, j])
return data


Member

Revert this

continue
if 'average_method' in ds[k].attrs:
    average_method = ds[k].attrs['average_method']
# variables are treated as d continuous data.
Member

Suggested change
- # variables are treated as d continuous data.
+ # variables are treated as continuous data.

Vertical grid spacing in meters.

maskfunction : callable or None, optional
Function applied to the dataset before gridding.
Member

Suggested change
- Function applied to the dataset before gridding.
+ Function applied to the dataset before gridding, usually to choose what data will be set to NaN based on quality flags.

Added vertical interpolation for QC and continuous variables in ncprocess.py.