Description
When the same variable is accessed more than once via f[var] on an already-open file f, the log reports "Building chunk index" / "Chunk index built" on every access.
Environment: pyfive 1.1.1, Python 3.12
Minimal reproducer (local file):
import logging
import pyfive

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
with pyfive.File("da193a_25_3hr__198807-198807.nc", "r") as handle:
    for i in range(4):
        ds = handle["/m01s03i245"]
        print(f"access {i+1}: shape={ds.shape}")

Output:
access 1: shape=(240, 324, 432)
...
access 4: shape=(240, 324, 432)
INFO pyfive.high_level: [pyfive] Accessing object '/m01s03i245' with link target 38262 (lazy access: False)
INFO pyfive.h5d: [pyfive] Building chunk index (pyfive version=1.1.1)
INFO pyfive.h5d: [pyfive] Chunk index built: btree range=(64458, 505785541); elapsed=1ms
INFO pyfive.high_level: [pyfive] Accessing object '/m01s03i245' with link target 38262 (lazy access: False)
INFO pyfive.h5d: [pyfive] Building chunk index (pyfive version=1.1.1)
INFO pyfive.h5d: [pyfive] Chunk index built: btree range=(64458, 505785541); elapsed=0ms
... [×2 more]
Question: Is the chunk index actually being rebuilt each time, or is the log line unconditional and the result served from a cache? The 0ms elapsed on accesses 2–4 suggests the latter. However, if the B-tree is re-parsed from the file on every subscript access, that would be expensive over a remote (S3/HTTPS) fsspec filesystem, where each seek/read incurs latency.
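One way to tell the two cases apart without reading the library source might be to count the I/O calls made on the underlying file object between accesses. The following is a minimal sketch, not part of pyfive: `CountingFile` and `snapshot` are hypothetical names, and the demonstration uses an in-memory `io.BytesIO` buffer as a stand-in for the real file.

```python
import io


class CountingFile:
    """Wrap a binary file-like object and count read/seek calls made on it.

    Hypothetical helper for diagnosis only; it is not part of pyfive.
    """

    def __init__(self, raw):
        self._raw = raw
        self.reads = 0
        self.seeks = 0
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._raw.read(size)
        self.reads += 1
        self.bytes_read += len(data)
        return data

    def seek(self, offset, whence=io.SEEK_SET):
        self.seeks += 1
        return self._raw.seek(offset, whence)

    def __getattr__(self, name):
        # Delegate everything else (tell, close, ...) to the wrapped object.
        return getattr(self._raw, name)

    def snapshot(self):
        """Return the counters as a tuple so two points in time can be compared."""
        return (self.reads, self.seeks, self.bytes_read)


# Stand-in demonstration: an in-memory buffer instead of the real .nc file.
buf = CountingFile(io.BytesIO(b"\x00" * 1024))
buf.seek(16)
buf.read(64)
before = buf.snapshot()

# A second, identical "access": any I/O it causes shows up in the delta.
buf.seek(16)
buf.read(64)
after = buf.snapshot()

delta = tuple(a - b for a, b in zip(after, before))
print(delta)
```

Assuming `pyfive.File` accepts an already-open binary file object in place of a filename (not verified here for 1.1.1), the wrapper could be passed in and `snapshot()` compared before and after each `handle["/m01s03i245"]`. A zero delta on accesses 2–4 would suggest the index is served from a cache; a non-zero delta would suggest the B-tree is being re-read. Note that this sketch only counts `read` and `seek`; a reader that uses `readinto` or similar would need those methods wrapped as well.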