Skip to content

add handling for multiple read/write permissions issues#130

Open
mdeshotel wants to merge 10 commits intodevelopmentfrom
mdeshotel-NGWPC-9788_cache_permission
Open

add handling for multiple read/write permissions issues#130
mdeshotel wants to merge 10 commits intodevelopmentfrom
mdeshotel-NGWPC-9788_cache_permission

Conversation

@mdeshotel
Copy link
Copy Markdown

This PR attempts to resolve a permissions issue with the cache files for historical simulations. For writing cache files the following is implemented:

  1. Read data from s3.
  2. Write to a local netcdf file with unique hash name.
  3. Once the data is fully written to the hash name, copy it to its cache name so that future iterations can access it.

For read conflicts (if they exists), the following logic is applied:

  1. Try to read the local netcdf file if it exists.
  2. If an error is received, produce warning, wait 1 second, and try again.
  3. Do step above for 10 seconds
  4. If 10 seconds/attempts are tried and the file is still locked, raise an error.

Additions

Removals

Changes

Testing

  1. Run suite in RTE was ran.

Screenshots

Notes

Todos

Checklist

  • PR has an informative and human-readable title
  • Changes are limited to a single goal (no scope creep)
  • Code can be automatically merged (no conflicts)
  • Code follows project standards (link if applicable)
  • Passes all existing automated tests
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future todos are captured in comments
  • Visually tested in supported browsers and devices (see checklist below 👇)
  • Project documentation has been updated (including the "Unreleased" section of the CHANGELOG)
  • Reviewers requested with the Reviewers tool ➡️

Testing checklist

Target Environment support

  • Linux

@mdeshotel mdeshotel requested review from kyle-larkin and mxkpp March 30, 2026 16:31
Copy link
Copy Markdown

@mxkpp mxkpp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most comments apply throughout.

Generally also, please write a shared function that can be used multiple times.

)
sleep(1)
c += 1
if c == 10:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the goal is to raise an error if the break is not reached, you can use else: instead of if c == 10:.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented.

ds.to_netcdf(self.nc_path, "w")
break
except Exception as e:
warnings.warn(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why warnings package instead of LOG?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched to LOG.

c += 1
if c == 10:
raise PermissionError(
f"Could write the netcdf cache file within the specified number of retries(10): {self.nc_path}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could not

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed this typo.

try:
ds.to_netcdf(self.nc_path, "w")
break
except Exception as e:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a more specific exception type to catch?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have experienced a number of errors in the bug tickets. I'm catching any error and logging it so that if/when an error is reached we have a better starting point for figuring out what went wrong.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, fine if all exception types should be caught but the message should be less precise then.

break
except Exception as e:
warnings.warn(
f"There appears to be a lock on the netcdf cache file while writing. Sleeping 1 second and trying again ({c}). | Error: {e}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the last time that it reaches this, it does not actually try again

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is fine.

Comment on lines +407 to +409
with xr.open_dataset(self.nc_path, engine="netcdf4") as ds:
dataset = ds.load()
return dataset
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this returning ds.load() result now? Existing behavior was to return xr.open_dataset() result.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched to a context manager for opening/closing the file. the ds.load() loads the data all at once into memory prior to exiting the context manager and closing the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants