Redesigned dataset_compliance w/ standard names validation#373
Redesigned dataset_compliance w/ standard names validation#373sadielbartholomew wants to merge 155 commits intoNCAS-CMS:mainfrom
dataset_compliance w/ standard names validation#373Conversation
f57edc9 to
eacd492
Compare
…advised (for now)
|
Linting CI job is failing due to issues with the service ("Our services aren't available right now") so please ignore that - I have run pre-commit on the final PR pre-review state as above and (excluding doc-string formatting which I've chosen to ignore in the interest of time) everything passes. |
davidhassell
left a comment
There was a problem hiding this comment.
Hi Sadie - a first pass.
Logically looks good (I think - netcdfread.py changes are pretty hard to follow :).
Structurally, I think checker.py should be in read_write/netcdf/, as commented.
I'm going to submit this now, but would still like to look some more at netcdfread.py for my own education.
|
|
||
| return fragment | ||
|
|
||
| def __repr__(self): |
There was a problem hiding this comment.
| def __repr__(self): | |
| def __repr__(self): |
| import logging | ||
|
|
||
| # To parse the XML - better than using manual regex parsing! | ||
| import xml.etree.ElementTree as ET |
There was a problem hiding this comment.
Can we move this into function/method, since we don't load non-standard external libraries at import cfdm time.
|
|
||
| **2026-??-??** | ||
|
|
||
| * Improved output of `Field.dataset_compliance` for (at present, |
There was a problem hiding this comment.
| * Improved output of `Field.dataset_compliance` for (at present, | |
| * Improved output of `cfdm.Field.dataset_compliance` for (at present, |
|
|
||
| def _make_ugrid_1(filename): | ||
| """Create a UGRID file with a 2-d mesh topology.""" | ||
| def _make_ugrid_1(filename, standard_names): |
There was a problem hiding this comment.
Can we make it so that the correct standard names are in this function? Then do stuff like
if standard_names:
Mesh2_node_x.standard_name = standard_names[0]
else:
Mesh2_node_x.standard_name = "longitude"| @@ -0,0 +1,2577 @@ | |||
| import logging | |||
There was a problem hiding this comment.
It's good that these are in a separate file, but I think that it should be in cfdm/read_write/netcdf. The contents here can not be used with cfdm.Fields in memory, only on datasets being read.
| compliance.""" | ||
| self.dimensions.append(dim) | ||
|
|
||
| def as_report_fragment(self): |
| self.variables.append(var) | ||
| return var | ||
|
|
||
| def as_report_fragment(self): |
| compliance.""" | ||
| self.dimensions.append(dim) | ||
|
|
||
| def as_report_fragment(self): |
| logger.warning( | ||
| f"Detected invalid standard name: '{sn_attr}' of " | ||
| f"'{sn_value}' for {ncvar}" | ||
| ) |
There was a problem hiding this comment.
| logger.warning( | |
| f"Detected invalid standard name: '{sn_attr}' of " | |
| f"'{sn_value}' for {ncvar}" | |
| ) | |
| if self.read_vars["_noncompliance_report"]: | |
| logger.warning( | |
| f"Detected invalid standard name: '{sn_attr}' of " | |
| f"'{sn_value}' for {ncvar}" | |
| ) |
and need to add _nononcompliance_report to read_vars
| def _check_cell_methods(self, field_ncvar, cell_method): | ||
| """Check the cell methods. | ||
|
|
||
| .. versionadded:: (cfdm) NEXTVERSION | ||
|
|
||
| """ | ||
| # TODO SLB unclear how to check on cell methods, will leave | ||
| # for now. | ||
| # self._check_standard_names( | ||
| # field_ncvar, | ||
| # field_ncvar, | ||
| # # self.read_vars["variable_attributes"][field_ncvar]["cell_methods"], | ||
| # ) |
There was a problem hiding this comment.
This is currently sort embroiled within _parse_cell_methods
Close #366 by setting up discussed data structure to close #365, reporting invalid standard names in the new output structure, as indicated in #365 (comment).
Is quite a hefty PR with a tragic amount of commits, so happy to squash down the first ~50-100 of these, which were mostly development (and/or investigative behaviour) commits which were incrementally updated as we revised our idea for the Conformance Data Model (see UML diagram in #365 (comment)).
Some minor follow on work when we have time to restart conformance work is to:
Outstanding questions
Aspects I am unsure about / questions:
1..*NonConformanceCaseforAttributeNonConformanceas per our UML - I think in practice the non-conformance could be further down the chain, not a direct association - so I think this should be0..*and that is what this PR code assumes (does that make sense?).Review guidance
Structure of new
conformancemoduleUML diagrams generated with
pyreverse, though note they only include theconformancemodule separate to the wholecfdmmodule, so don't pick up on external connections notably to all dataset reading logic especiallyNetCDFRead. But could be a useful overview:Packages
Classes
Notes on PR and approach
conformancewhich is based on a Conformance Data Model._check_*and_ugrid_check_*method fromnetcdfreadto the new dedicated submoduleconformance.checker.conformance.reporting.as_report_fragmentis the ultimate main method from thedatamodelmodule to note fordataset_compliance- it generates adictby recursively operating on all relevant*NonConformanceobjects with the same method defined, to generate a structure from all of the dict fragments resulting in the possibly (heavily-)nested output.Advice on how to review
conformance.checker._check_*and_ugrid_check_*methods, which is just to add_check_standard_nameand_include_component_reportcalls in the right places. To make reviewing easier I copied themainpost-merge state of those methods innetcdfreadand then made any changes to those once moved in 33786f5, with some further additions necessary for tweaks and fixes, so please rungit diff 0a3be736b85cebea58587844cc887beff9cfc497 checker.pyto see and review all updates to the checking methods previously living innetcdfread.Representative outputs
As per the new test module
test_compliance_checking.py, we test on a non-UGRID 'kitchen sink' and a UGRID field with the expected outputs as follows, abiding by the Conformance Data Model:Kitchen sink non-UGRID field
{'CF version': '1.13', 'ta': {'attributes': {'ancillary_variables': {'value': 'air_temperature_standard_error', 'variables': {'air_temperature_standard_error': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_air_temperature_standard_error'}}}}}, 'cell_measures': {'value': 'cell_measure', 'variables': {'cell_measure': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_cell_measure'}}}}}, 'coordinates': {'value': 'time', 'variables': {'auxiliary': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_auxiliary'}}}, 'latitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_latitude_1'}}}, 'longitude_1': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_longitude_1'}}}, 'time': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_time'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has a ' 'value ' 'that ' 'is ' 'not a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_ta'}}}}UGRID field
{'CF version': '1.13', 'pa': {'attributes': {'mesh': {'value': 'Mesh2', 'variables': {'Mesh2': {'attributes': {'edge_node_connectivity': {'value': 'Mesh2_edge_nodes', 'variables': {'Mesh2_edge_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_edge_nodes'}}}}}, 'face_face_connectivity': {'value': 'Mesh2_face_links', 'variables': {'Mesh2_face_links': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_face_links'}}}}}, 'face_node_connectivity': {'value': 'Mesh2_face_nodes', 'variables': {'Mesh2_face_nodes': {'attributes': {'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2_face_nodes'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has ' 'a ' 'value ' 'that ' 'is ' 'not ' 'a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_Mesh2'}}}}}, 'standard_name': {'non-conformance': [{'code': 400022, 'reason': 'standard_name ' 'attribute ' 'has a ' 'value ' 'that ' 'is ' 'not a ' 'valid ' 'name ' 'contained ' 'in ' 'the ' 'current ' 'standard ' 'name ' 'table'}], 'value': 'badname_air_pressure'}}}}