Improve memory usage in uvfits reader by bhazelton · Pull Request #1651 · RadioAstronomySoftwareGroup/pyuvdata

bhazelton · 2026-02-24T02:27:07Z

Description

This makes a few memory handling improvements to the uvfits reader (and maybe the MWA correlator FITS reader).

It turns out that when a file is opened using astropy.io.fits.open with memmap=True, the Python garbage collector does not release the memory promptly upon exiting the context manager. We were always using memmap=True, but it's really only needed when doing partial reads. This PR changes the calls to astropy.io.fits.open to use memmap=True only when doing partial reads. This leads to significant memory improvements for the uvfits reader when reading the whole file. It's not clear whether the MWA correlator FITS reader sees any memory improvement, but it seemed better to be consistent.
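A minimal sketch of the pattern described above, not the code in this PR. The helper names `use_memmap` and `open_and_read`, the `reader` callback, and the selection keyword names are all hypothetical; only `astropy.io.fits.open` and its `memmap` keyword are real API.

```python
def use_memmap(**selections):
    """Return True only if the caller asked for a subset of the file.

    Any selection keyword that is not None counts as a partial read.
    """
    return any(value is not None for value in selections.values())


def open_and_read(filename, reader, **selections):
    """Open a FITS file, memory-mapping it only for partial reads.

    `reader` is a hypothetical callback that pulls what it needs out of
    the HDU list while the file is still open.
    """
    # Imported here so use_memmap can be exercised without astropy installed.
    from astropy.io import fits

    with fits.open(filename, memmap=use_memmap(**selections)) as hdu_list:
        return reader(hdu_list, **selections)
```

With this shape, a full read such as `open_and_read(path, reader)` opens the file with memmap=False, while `open_and_read(path, reader, freq_chans=[0, 1])` keeps memmap=True so only the requested part needs to be loaded.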

I also refactored the uvfits reader to remove a temporary array that was being used to store and manipulate the data stored in the primary uvfits HDU before assigning parts of it to the data_array, nsample_array and flag_array. I now have it just assign the full data from the HDU to the data_array and do all the manipulations in place in that array. It feels a bit yucky but leads to much better memory usage.
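A rough illustration of the idea of working on the raw array directly rather than on a temporary copy; it is not the refactored reader code. It assumes a float32 array whose trailing axis holds (real, imaginary, weight), and it assumes the convention that weights at or below zero mark flagged samples while the weight magnitude gives the sample count. The function name `unpack_in_place` is hypothetical.

```python
import numpy as np


def unpack_in_place(raw):
    """Split a raw (..., 3) float array into data, nsample and flag arrays.

    `raw` is modified in place: its weight plane is overwritten with
    absolute values, so no full-size temporary copy of `raw` is made.
    """
    weights = raw[..., 2]                # view into raw, no copy
    flag_array = weights <= 0            # assumed convention: non-positive weight means flagged
    np.abs(weights, out=weights)         # overwrite the weight plane in place
    nsample_array = weights.copy()       # copy of a single plane only

    # Fill the complex array plane by plane instead of building real + 1j * imag,
    # which would allocate extra full-size temporaries.
    data_array = np.empty(raw.shape[:-1], dtype=np.complex64)
    data_array.real = raw[..., 0]
    data_array.imag = raw[..., 1]
    return data_array, nsample_array, flag_array
```

The trade-off is the one noted above: mutating the input is less tidy than working on a scratch copy, but it avoids holding a second full-size array alive during the read.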

Below are plots of the memory usage as measured by memray for a small script that just reads in an MWA uvfits file (the blue resident size is most important). There is some run-to-run variability in these plots that I don't fully understand, so I'm including 2 plots for each codebase, taken on 2 different days. On each day, the runs for all three codebases were done back to back (all on my laptop):

main branch:

After using memmap=False:

After refactoring out the temporary array:

Note that the peak memory use doesn't change much (and it seems quite variable in the last set of plots), but there's a consistent big change in the final memory usage at the end of the script. If you are doing anything with the object after reading it, that becomes your baseline memory usage for the next step. I actually stumbled on this when trying to profile something else (frequency averaging) and discovered that the memory usage was much higher when I used a script that started by reading in a uvfits file vs when I started by reading in the same data saved as a uvh5 file.
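The difference between peak usage and the usage left behind after a read can be shown with a small stand-in example. This sketch uses the standard-library `tracemalloc` module rather than memray, so it tracks Python-level allocations and not resident size, and the "reader" is a toy function that allocates a large temporary buffer and keeps only a tenth of it. The names `measure` and `read_with_temporary` are hypothetical.

```python
import tracemalloc


def measure(func):
    """Run func and return (result, bytes still allocated, peak bytes)."""
    tracemalloc.start()
    try:
        result = func()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, current, peak


def read_with_temporary(n_bytes=10_000_000):
    """Toy reader: build a large temporary, keep only a tenth of it."""
    temp = bytearray(n_bytes)            # large temporary buffer
    kept = bytes(temp[: n_bytes // 10])  # the part that survives the read
    return kept                          # temp is freed when the function returns


result, current, peak = measure(read_with_temporary)
print(f"kept {len(result)} bytes, current={current}, peak={peak}")
```

Here the peak reflects the temporary buffer, while the current figure after the call reflects only what was kept, which is the number that matters as the baseline for whatever runs next.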

Similar plots for the MWA Correlator FITS reader seem to be dominated by run-to-run variability -- I don't see a consistent improvement with these changes, but I don't see it getting materially worse either.

We could certainly make similar changes in the calfits and beamfits readers, but I wasn't sure I should cram them into this PR.

Motivation and Context

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation change (documentation changes only)
Version change
Build or continuous integration change
Other

Checklist:

I have read the contribution guide.
My code follows the code style of this project.

Other:

I have updated any docstrings associated with my change using the numpy docstring format.
I have updated the readme and/or tutorial to reflect my changes (if appropriate).
I have added/updated tests to cover my changes (if appropriate).
I have updated the CHANGELOG.

codecov · 2026-02-24T03:16:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.93%. Comparing base (ad3e728) to head (3de7ed6).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #1651   +/-   ##
=======================================
  Coverage   99.93%   99.93%           
=======================================
  Files          67       67           
  Lines       22688    22705   +17     
=======================================
+ Hits        22674    22691   +17     
  Misses         14       14


bhazelton added 4 commits February 20, 2026 10:14

Fix location parameter handling for initialization

ba4704e

Only use memmap when required

747065f

Don't make temporary arrays

dc10411

only use memmap=True on FITS reads when needed

3de7ed6

bhazelton requested a review from steven-murray February 24, 2026 02:27

bhazelton added enhancement UVData labels Feb 24, 2026

bhazelton wants to merge 4 commits into main from mem_fixes
