
Redesigned dataset_compliance w/ standard names validation #373

Open
sadielbartholomew wants to merge 155 commits into NCAS-CMS:main from sadielbartholomew:validate-standard-names

sadielbartholomew (Member) commented Dec 19, 2025

Closes #366 by setting up the data structure discussed in order to close #365, and reports invalid standard names in the new output structure, as indicated in #365 (comment).

This is quite a hefty PR with a tragic number of commits, so I'm happy to squash down the first ~50-100 of them. These were mostly development commits (and/or commits investigating behaviour), updated incrementally as we revised our idea for the Conformance Data Model (see the UML diagram in #365 (comment)).

Some minor follow-on work, for when we have time to restart conformance work, is to:

Outstanding questions

Aspects I am unsure about / questions:

  • cell methods, and how to report issues with them;
  • whether the Data Model should have 1..* NonConformanceCase for AttributeNonConformance, as per our UML. In practice I think the non-conformance could be further down the chain rather than a direct association, so I think this should be 0..*, which is what the code in this PR assumes (does that make sense?).
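To make the multiplicity question concrete, here is a minimal sketch of why a strict 1..* association looks too strong. It mirrors the kitchen-sink 'coordinates' attribute on 'ta', which carries no non-conformance itself while a variable it references does. The class names come from our UML, but the fields (`cases`, `variables`) and the helpers `has_nonconformance` and `satisfies_one_or_more` are hypothetical illustrations, not the PR's actual code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NonConformanceCase:
    code: int
    reason: str


@dataclass
class AttributeNonConformance:
    value: str
    # 0..* association: this list may legitimately be empty when the actual
    # problem sits further down the chain (e.g. on a referenced variable).
    cases: list[NonConformanceCase] = field(default_factory=list)
    # Hypothetical field: variable name -> {attribute name -> node} for the
    # variables that this attribute's value refers to.
    variables: dict[str, dict[str, AttributeNonConformance]] = field(
        default_factory=dict
    )

    def has_nonconformance(self) -> bool:
        """True if there is a case here or anywhere further down the chain."""
        return bool(self.cases) or any(
            attr.has_nonconformance()
            for attrs in self.variables.values()
            for attr in attrs.values()
        )


def satisfies_one_or_more(attr: AttributeNonConformance) -> bool:
    """Would this node be valid under a strict 1..* multiplicity?"""
    return len(attr.cases) >= 1


REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)

# Mirrors the kitchen-sink 'coordinates' attribute on 'ta': no direct case,
# but the 'time' variable it references has a bad standard_name.
time_standard_name = AttributeNonConformance(
    "badname_time", [NonConformanceCase(400022, REASON)]
)
coordinates = AttributeNonConformance(
    "time", variables={"time": {"standard_name": time_standard_name}}
)

print(coordinates.has_nonconformance())    # True
print(satisfies_one_or_more(coordinates))  # False
```

Under 1..*, the 'coordinates' node would be invalid even though it is a necessary link to the real problem, which is the argument for 0..*.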

Review guidance

Structure of the new conformance module

The UML diagrams below were generated with pyreverse. Note that they cover only the conformance module, separately from the rest of cfdm, so they don't pick up external connections, notably to the dataset-reading logic and especially NetCDFRead. They could still be a useful overview:

Packages

[Diagram: packages_conf-final-conformancedir]

Classes

[Diagram: classes_conf-final-conformancedir]

Notes on PR and approach

  1. As discussed in person during development, the new conformance-checking logic is implemented in a new submodule, conformance, which is based on a Conformance Data Model.
  2. Towards separation of concerns, I have moved all _check_* and _ugrid_check_* methods from netcdfread to the new dedicated submodule conformance.checker.
  3. Any reporting-related functionality is in conformance.reporting. The main method to note for dataset_compliance is as_report_fragment, from the datamodel module: it generates a dict by recursively calling the method of the same name on all relevant *NonConformance objects, and assembles the resulting dict fragments into the final, possibly heavily nested, output structure.
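The recursive reporting pattern in point 3 can be sketched roughly as follows. This is a simplified illustration, assuming each node type exposes an `as_report_fragment` method returning a plain dict; the class fields, the `dataset_report` and `bad_standard_name` helpers, and the exact key handling are hypothetical and not taken from the PR's datamodel module.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NonConformanceCase:
    code: int
    reason: str

    def as_report_fragment(self) -> dict:
        return {"code": self.code, "reason": self.reason}


@dataclass
class AttributeNonConformance:
    value: str
    cases: list[NonConformanceCase] = field(default_factory=list)
    variables: dict[str, VariableNonConformance] = field(default_factory=dict)

    def as_report_fragment(self) -> dict:
        fragment: dict = {"value": self.value}
        if self.cases:
            fragment["non-conformance"] = [
                case.as_report_fragment() for case in self.cases
            ]
        if self.variables:
            # Recurse into the variables this attribute refers to
            fragment["variables"] = {
                name: var.as_report_fragment()
                for name, var in self.variables.items()
            }
        return fragment


@dataclass
class VariableNonConformance:
    attributes: dict[str, AttributeNonConformance] = field(default_factory=dict)

    def as_report_fragment(self) -> dict:
        # Each attribute node builds its own fragment (recursing as needed)
        return {
            "attributes": {
                name: attr.as_report_fragment()
                for name, attr in self.attributes.items()
            }
        }


def dataset_report(cf_version: str, variables: dict) -> dict:
    """Top-level report: CF version plus one fragment per data variable."""
    report: dict = {"CF version": cf_version}
    for name, var in variables.items():
        report[name] = var.as_report_fragment()
    return report


REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)


def bad_standard_name(value: str) -> AttributeNonConformance:
    return AttributeNonConformance(value, [NonConformanceCase(400022, REASON)])


# A cut-down version of the kitchen-sink 'ta' example
cell_measure = VariableNonConformance(
    {"standard_name": bad_standard_name("badname_cell_measure")}
)
ta = VariableNonConformance(
    {
        "cell_measures": AttributeNonConformance(
            "cell_measure", variables={"cell_measure": cell_measure}
        ),
        "standard_name": bad_standard_name("badname_ta"),
    }
)
report = dataset_report("1.13", {"ta": ta})
```

Because every node only knows how to render itself and delegate to its children, the nesting depth of the output follows the chain of references in the dataset automatically.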

Advice on how to review

  • It's best to review the code changes as a whole rather than commit by commit (there are too many commits, and they are a bit of a mess due to the shifting development goals, sorry!), but note the point below about reviewing conformance.checker.
  • Given point 2 above, I realised while later resolving merge conflicts that it would be difficult to see what changes I made to the _check_* and _ugrid_check_* methods, which amount to adding _check_standard_name and _include_component_report calls in the right places. To make reviewing easier, I copied the post-merge state of those methods from netcdfread on main, and only made changes to them once moved, in 33786f5, with some further additions needed for tweaks and fixes. So please run git diff 0a3be736b85cebea58587844cc887beff9cfc497 checker.py to see and review all updates to the checking methods that previously lived in netcdfread.
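For reviewers less familiar with the checker, the kind of logic those added calls perform can be pictured with the sketch below. It is loosely modelled on the names _check_standard_name and _include_component_report, but the standalone functions `check_standard_name` and `include_component_report`, their signatures, and the caller-supplied `valid_names` set are hypothetical; only the code 400022 and the reason text come from the expected outputs in this PR.

```python
from typing import Optional

# Code and reason text taken from the expected outputs in this PR
INVALID_STANDARD_NAME_CODE = 400022
INVALID_STANDARD_NAME_REASON = (
    "standard_name attribute has a value that is not a valid name "
    "contained in the current standard name table"
)


def check_standard_name(attributes: dict, valid_names: set) -> Optional[dict]:
    """Return a report fragment if standard_name is invalid, else None.

    Simplification: exact matching against a caller-supplied set of names.
    """
    name = attributes.get("standard_name")
    if name is None or name in valid_names:
        return None
    return {
        "non-conformance": [
            {
                "code": INVALID_STANDARD_NAME_CODE,
                "reason": INVALID_STANDARD_NAME_REASON,
            }
        ],
        "value": name,
    }


def include_component_report(
    parent: dict, key: str, fragment: Optional[dict]
) -> None:
    """Attach fragment under parent['attributes'][key], only if present."""
    if fragment is not None:
        parent.setdefault("attributes", {})[key] = fragment


valid = {"air_temperature", "time", "latitude", "longitude"}

ta_report: dict = {}
include_component_report(
    ta_report,
    "standard_name",
    check_standard_name({"standard_name": "badname_ta"}, valid),
)

time_report: dict = {}
include_component_report(
    time_report,
    "standard_name",
    check_standard_name({"standard_name": "time"}, valid),
)
print(time_report)  # {}
```

Keeping "check" and "include" separate means conforming components simply leave no trace in the report.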

Representative outputs

The new test module test_compliance_checking.py tests a non-UGRID 'kitchen sink' field and a UGRID field, with the following expected outputs, which abide by the Conformance Data Model:

Kitchen sink non-UGRID field

{'CF version': '1.13',
 'ta': {'attributes': {
     'ancillary_variables': {
         'value': 'air_temperature_standard_error',
         'variables': {
             'air_temperature_standard_error': {'attributes': {'standard_name': {
                 'non-conformance': [{'code': 400022,
                                      'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_air_temperature_standard_error'}}}}},
     'cell_measures': {
         'value': 'cell_measure',
         'variables': {
             'cell_measure': {'attributes': {'standard_name': {
                 'non-conformance': [{'code': 400022,
                                      'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_cell_measure'}}}}},
     'coordinates': {
         'value': 'time',
         'variables': {
             'auxiliary': {'attributes': {'standard_name': {
                 'non-conformance': [{'code': 400022,
                                      'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_auxiliary'}}},
             'latitude_1': {'attributes': {'standard_name': {
                 'non-conformance': [{'code': 400022,
                                      'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_latitude_1'}}},
             'longitude_1': {'attributes': {'standard_name': {
                 'non-conformance': [{'code': 400022,
                                      'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_longitude_1'}}},
             'time': {'attributes': {'standard_name': {
                 'non-conformance': [{'code': 400022,
                                      'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                 'value': 'badname_time'}}}}},
     'standard_name': {
         'non-conformance': [{'code': 400022,
                              'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
         'value': 'badname_ta'}}}}
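Since these outputs get deeply nested, a small helper can make them easier to scan during review by flattening every non-conformance case into a path string and its code. This is a hypothetical convenience, not part of the PR; it relies only on the 'non-conformance' key used in the outputs above, and the reason text is shortened in the sample data.

```python
from typing import Iterator, Tuple


def iter_nonconformances(
    fragment, path: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, int]]:
    """Yield ("a/b/c", code) for every non-conformance case in a report."""
    if not isinstance(fragment, dict):
        return  # leaf values such as the 'CF version' or 'value' strings
    for key, value in fragment.items():
        if key == "non-conformance":
            for case in value:
                yield "/".join(path), case["code"]
        else:
            yield from iter_nonconformances(value, path + (key,))


# A subset of the kitchen-sink output (reason text shortened here)
report = {
    "CF version": "1.13",
    "ta": {"attributes": {
        "cell_measures": {
            "value": "cell_measure",
            "variables": {"cell_measure": {"attributes": {"standard_name": {
                "non-conformance": [{"code": 400022, "reason": "..."}],
                "value": "badname_cell_measure"}}}}},
        "standard_name": {
            "non-conformance": [{"code": 400022, "reason": "..."}],
            "value": "badname_ta"}}},
}

for location, code in iter_nonconformances(report):
    print(code, location)
# 400022 ta/attributes/cell_measures/variables/cell_measure/attributes/standard_name
# 400022 ta/attributes/standard_name
```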

UGRID field

{'CF version': '1.13',
 'pa': {'attributes': {'mesh': {'value': 'Mesh2',
                                'variables': {'Mesh2': {'attributes': {'edge_node_connectivity': {'value': 'Mesh2_edge_nodes',
                                                                                                  'variables': {'Mesh2_edge_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_edge_nodes'}}}}},
                                                                       'face_face_connectivity': {'value': 'Mesh2_face_links',
                                                                                                  'variables': {'Mesh2_face_links': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_face_links'}}}}},
                                                                       'face_node_connectivity': {'value': 'Mesh2_face_nodes',
                                                                                                  'variables': {'Mesh2_face_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                                                                                                           'reason': 'standard_name '
                                                                                                                                                                                                     'attribute '
                                                                                                                                                                                                     'has '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'value '
                                                                                                                                                                                                     'that '
                                                                                                                                                                                                     'is '
                                                                                                                                                                                                     'not '
                                                                                                                                                                                                     'a '
                                                                                                                                                                                                     'valid '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'contained '
                                                                                                                                                                                                     'in '
                                                                                                                                                                                                     'the '
                                                                                                                                                                                                     'current '
                                                                                                                                                                                                     'standard '
                                                                                                                                                                                                     'name '
                                                                                                                                                                                                     'table'}],
                                                                                                                                                                      'value': 'badname_Mesh2_face_nodes'}}}}},
                                                                       'standard_name': {'non-conformance': [{'code': 400022,
                                                                                                               'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                                                                                         'value': 'badname_Mesh2'}}}}},
                       'standard_name': {'non-conformance': [{'code': 400022,
                                                               'reason': 'standard_name attribute has a value that is not a valid name contained in the current standard name table'}],
                                         'value': 'badname_air_pressure'}}}}
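For anyone consuming a nested report shaped like the output above, a short recursive walk can flatten it into one record per problem. This is a minimal sketch assuming only the shape visible here (dicts whose optional `'non-conformance'` list holds `{'code', 'reason'}` entries next to a `'value'`); `iter_non_conformance` is a hypothetical helper, not part of cfdm.

```python
def iter_non_conformance(report, path=()):
    """Yield (path, value, code, reason) for every non-conformance case.

    ``report`` is a nested dict shaped like the output above; ``path``
    records the keys walked to reach each offending attribute.
    """
    if not isinstance(report, dict):
        return  # leaf values such as 'value' strings hold no cases
    for case in report.get("non-conformance", []):
        yield path, report.get("value"), case["code"], case["reason"]
    for key, sub in report.items():
        if key != "non-conformance":
            yield from iter_non_conformance(sub, path + (key,))


# A trimmed-down fragment in the same shape as the output above.
example = {
    "attributes": {
        "standard_name": {
            "non-conformance": [
                {
                    "code": 400022,
                    "reason": "standard_name attribute has a value that is "
                    "not a valid name contained in the current "
                    "standard name table",
                }
            ],
            "value": "badname_air_pressure",
        }
    }
}

for path, value, code, reason in iter_non_conformance(example):
    print("/".join(path), value, code)
# attributes/standard_name badname_air_pressure 400022
```

Keeping the walked keys as a tuple means callers can rebuild a location such as `variables/Mesh2_face_nodes/attributes/standard_name` without knowing the nesting depth in advance.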

@sadielbartholomew sadielbartholomew marked this pull request as ready for review January 28, 2026 21:35
@sadielbartholomew
Member Author

The linting CI job is failing due to issues with the service ("Our services aren't available right now"), so please ignore that. I have run pre-commit on the final pre-review state of the PR, as above, and everything passes (excluding docstring formatting, which I've chosen to ignore in the interest of time).

@davidhassell davidhassell added this to the NEXTVERSION milestone Feb 18, 2026
Contributor

@davidhassell davidhassell left a comment

Hi Sadie - a first pass.

Logically this looks good (I think; the `netcdfread.py` changes are pretty hard to follow :).

Structurally, I think `checker.py` should be in `read_write/netcdf/`, as commented.

I'm going to submit this now, but would still like to look some more at `netcdfread.py` for my own education.


return fragment

def __repr__(self):
Contributor

Suggested change
def __repr__(self):
def __repr__(self):

import logging

# To parse the XML - better than using manual regex parsing!
import xml.etree.ElementTree as ET
Contributor

Can we move this import into a function/method? We don't load non-standard external libraries at `import cfdm` time.
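A minimal sketch of the deferred import, assuming the CF standard name table XML layout of `<entry id="...">` elements with `<alias id="...">` elements for renamed names. `load_standard_names` and `is_valid_standard_name` are hypothetical names, and treating aliases as valid is an assumption, not something this PR states.

```python
def load_standard_names(xml_text):
    """Return the names defined in a CF standard name table XML document.

    Hypothetical helper: both canonical entries and aliases are treated
    as valid, assuming the <entry id="..."> / <alias id="..."> layout.
    """
    # Deferred import: the cost is paid on first call, not when the
    # enclosing package is imported.
    import xml.etree.ElementTree as ET

    root = ET.fromstring(xml_text)
    names = {entry.get("id") for entry in root.iter("entry")}
    names |= {alias.get("id") for alias in root.iter("alias")}
    names.discard(None)  # ignore any element lacking an id attribute
    return names


def is_valid_standard_name(value, names):
    """Check a standard_name attribute value against a set of names."""
    # A CF standard_name may carry a modifier after whitespace
    # (e.g. "air_pressure standard_error"); only the name is checked.
    parts = value.split()
    return bool(parts) and parts[0] in names


# Illustrative table; the alias name here is made up for the example.
table = """<standard_name_table>
  <entry id="air_pressure"><canonical_units>Pa</canonical_units></entry>
  <alias id="old_air_pressure"><entry_id>air_pressure</entry_id></alias>
</standard_name_table>"""

names = load_standard_names(table)
print(is_valid_standard_name("air_pressure", names))          # True
print(is_valid_standard_name("badname_air_pressure", names))  # False
```

Python caches modules in `sys.modules`, so repeated calls after the first only pay for a dictionary lookup, not a fresh import.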


**2026-??-??**

* Improved output of `Field.dataset_compliance` for (at present,
Contributor

Suggested change
* Improved output of `Field.dataset_compliance` for (at present,
* Improved output of `cfdm.Field.dataset_compliance` for (at present,


def _make_ugrid_1(filename):
"""Create a UGRID file with a 2-d mesh topology."""
def _make_ugrid_1(filename, standard_names):
Contributor

Can we make it so that the correct standard names are the defaults in this function? Then do something like:

            if standard_names:
                Mesh2_node_x.standard_name = standard_names[0]
            else:
                Mesh2_node_x.standard_name = "longitude"
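One way to generalise the suggestion is to keep the valid names in a single default mapping and let tests opt into bad names by position. This is a sketch under stated assumptions: `DEFAULT_STANDARD_NAMES` and `resolve_standard_names` are hypothetical, and only `"longitude"` for `Mesh2_node_x` comes from the comment above; `"latitude"` for `Mesh2_node_y` is an illustrative guess.

```python
# Hypothetical default (valid) standard names for the mesh variables.
# Only "longitude" for Mesh2_node_x comes from the review comment; the
# Mesh2_node_y entry is an illustrative assumption.
DEFAULT_STANDARD_NAMES = {
    "Mesh2_node_x": "longitude",
    "Mesh2_node_y": "latitude",
}


def resolve_standard_names(standard_names=None):
    """Return the standard name to write for each mesh variable.

    ``standard_names`` is an optional sequence of overrides applied in
    the same order as DEFAULT_STANDARD_NAMES; when it is None or empty
    the valid defaults are used unchanged.
    """
    names = dict(DEFAULT_STANDARD_NAMES)  # copy so defaults stay intact
    if standard_names:
        for ncvar, override in zip(list(names), standard_names):
            names[ncvar] = override
    return names


print(resolve_standard_names())
print(resolve_standard_names(["badname_Mesh2_node_x"]))
```

Inside `_make_ugrid_1` each assignment would then read, for example, `Mesh2_node_x.standard_name = names["Mesh2_node_x"]`, so a file with valid names needs no extra arguments.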

@@ -0,0 +1,2577 @@
import logging
Contributor

It's good that these are in a separate file, but I think the file should be in `cfdm/read_write/netcdf`. Its contents cannot be used with `cfdm.Field`s in memory, only with datasets being read.

compliance."""
self.dimensions.append(dim)

def as_report_fragment(self):
Contributor

todict again?

self.variables.append(var)
return var

def as_report_fragment(self):
Contributor

todict again?

compliance."""
self.dimensions.append(dim)

def as_report_fragment(self):
Contributor

todict again?
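The repeated suggestion amounts to giving every report class the same `todict` method, each delegating to its children's. The sketch below is hypothetical: the class names are illustrative rather than the PR's, the 0..* cases per attribute follow the multiplicity discussed in the PR description, and omitting the `'non-conformance'` key for a clean attribute is a design choice assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class NonConformanceCase:
    """A single detected problem: a numeric code and a readable reason."""

    code: int
    reason: str

    def todict(self):
        return {"code": self.code, "reason": self.reason}


@dataclass
class AttributeReport:
    """An attribute's value and its zero or more non-conformance cases."""

    value: str
    cases: list = field(default_factory=list)

    def todict(self):
        out = {"value": self.value}
        if self.cases:  # assumed choice: omit the key when the value is fine
            out["non-conformance"] = [case.todict() for case in self.cases]
        return out


@dataclass
class VariableReport:
    """A netCDF variable's per-attribute reports, keyed by attribute name."""

    attributes: dict = field(default_factory=dict)

    def todict(self):
        return {
            "attributes": {
                name: report.todict() for name, report in self.attributes.items()
            }
        }


var = VariableReport()
var.attributes["standard_name"] = AttributeReport(
    "badname_air_pressure",
    [
        NonConformanceCase(
            400022,
            "standard_name attribute has a value that is not a valid "
            "name contained in the current standard name table",
        )
    ],
)
print(var.todict())
```

Because every level exposes the same method name, a parent never needs to know which concrete class its children are.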

Comment on lines +299 to +302
logger.warning(
f"Detected invalid standard name: '{sn_attr}' of "
f"'{sn_value}' for {ncvar}"
)
Contributor

Suggested change
logger.warning(
f"Detected invalid standard name: '{sn_attr}' of "
f"'{sn_value}' for {ncvar}"
)
if self.read_vars["_noncompliance_report"]:
logger.warning(
f"Detected invalid standard name: '{sn_attr}' of "
f"'{sn_value}' for {ncvar}"
)

and we need to add `_noncompliance_report` to `read_vars`.
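The suggestion indexes `read_vars["_noncompliance_report"]` directly, so the key must always be present or the lookup raises `KeyError`. A minimal standalone sketch of that pairing follows; `init_read_vars` and `warn_invalid_standard_name` are hypothetical stand-ins for the corresponding logic in `netcdfread.py`, and returning a boolean is added only to make the behaviour easy to check.

```python
import logging

logger = logging.getLogger(__name__)


def init_read_vars(noncompliance_report=False):
    """Hypothetical: build read_vars with the flag always present, so
    read_vars["_noncompliance_report"] never raises KeyError."""
    return {"_noncompliance_report": noncompliance_report}


def warn_invalid_standard_name(read_vars, ncvar, sn_attr, sn_value):
    """Log an invalid standard name only when reporting is switched on.

    Returns True if a warning was emitted.
    """
    if read_vars["_noncompliance_report"]:
        logger.warning(
            f"Detected invalid standard name: '{sn_attr}' of "
            f"'{sn_value}' for {ncvar}"
        )
        return True
    return False


# Silent by default; warns once reporting is requested.
warn_invalid_standard_name(init_read_vars(), "ta", "standard_name", "badname")
warn_invalid_standard_name(init_read_vars(True), "ta", "standard_name", "badname")
```

Initialising the flag once where `read_vars` is built keeps every check site to a plain lookup, rather than scattering `.get(..., False)` defaults through the reader.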

Comment on lines +336 to +348
def _check_cell_methods(self, field_ncvar, cell_method):
"""Check the cell methods.

.. versionadded:: (cfdm) NEXTVERSION

"""
# TODO SLB unclear how to check on cell methods, will leave
# for now.
# self._check_standard_names(
# field_ncvar,
# field_ncvar,
# # self.read_vars["variable_attributes"][field_ncvar]["cell_methods"],
# )
Contributor

This is currently sort of embroiled within `_parse_cell_methods`.
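For the open cell methods question, one possible starting point is to pull out the `name:` tokens and check those that are not a dimension, scalar coordinate, or the special name `area` against the standard name table, which is how we read the CF conventions' Cell Methods section (7.3). This sketch is deliberately simplified and independent of `_parse_cell_methods`: the helper names are hypothetical, and it only strips parenthesised comments such as `(interval: 1 hr)`, so it is not a full cell methods parser.

```python
import re


def cell_method_names(cell_methods):
    """Return the 'name:' tokens of a CF cell_methods string, in order."""
    # Parenthesised comments such as "(interval: 1 hr)" contain keywords
    # followed by a colon that are not names, so drop them first.
    stripped = re.sub(r"\([^)]*\)", " ", cell_methods)
    return re.findall(r"(\S+):", stripped)


def unknown_cell_method_names(cell_methods, known_names, standard_names):
    """Return names that are neither a known dimension/coordinate name,
    the special name "area", nor a name in the standard name table."""
    allowed = set(known_names) | {"area"} | set(standard_names)
    return [name for name in cell_method_names(cell_methods) if name not in allowed]


print(cell_method_names("time: mean (interval: 1 hr) area: maximum"))
# ['time', 'area']
print(unknown_cell_method_names("time: mean badname: maximum", ["time"], {"air_pressure"}))
# ['badname']
```

Each returned name could then become a non-conformance case on the field's `cell_methods` attribute, in the same way invalid `standard_name` values are reported.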


Labels

conformance, enhancement (New feature or request), UGRID (Relating to UGRID mesh topologies)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Compliance reporting: flag any invalid standard names
Output for Field.dataset_compliance towards a CF Checker

2 participants