Chunk index logged as rebuilt on every var access for the same variable #206

@bnlawrence

Description

When the same variable is accessed more than once via f[var] on an already-open file f, the log reports "Building chunk index" / "Chunk index built" on every access, not just the first.

Environment: pyfive 1.1.1, Python 3.12

Minimal reproducer (local file):

import logging, pyfive
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")

with pyfive.File("da193a_25_3hr__198807-198807.nc", "r") as handle:
    for i in range(4):
        ds = handle["/m01s03i245"]
        print(f"access {i+1}: shape={ds.shape}")

Output:

access 1: shape=(240, 324, 432)
...
access 4: shape=(240, 324, 432)
INFO pyfive.high_level: [pyfive] Accessing object '/m01s03i245' with link target 38262 (lazy access: False)
INFO pyfive.h5d: [pyfive] Building chunk index (pyfive version=1.1.1)
INFO pyfive.h5d: [pyfive] Chunk index built: btree range=(64458, 505785541); elapsed=1ms
INFO pyfive.high_level: [pyfive] Accessing object '/m01s03i245' with link target 38262 (lazy access: False)
INFO pyfive.h5d: [pyfive] Building chunk index (pyfive version=1.1.1)
INFO pyfive.h5d: [pyfive] Chunk index built: btree range=(64458, 505785541); elapsed=0ms
... [×2 more]
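
To make the repeated log lines easy to count, a small logging handler can tally matching messages. This is only a sketch: the logger name `pyfive.h5d` and the message text are taken from the output above, while `SubstringCounter` and `attach_counter` are hypothetical helper names, not part of pyfive.

```python
import logging


class SubstringCounter(logging.Handler):
    """Count log records whose formatted message contains a given substring."""

    def __init__(self, needle):
        super().__init__(level=logging.INFO)
        self.needle = needle
        self.count = 0

    def emit(self, record):
        # getMessage() applies any %-style arguments to the message template.
        if self.needle in record.getMessage():
            self.count += 1


def attach_counter(logger_name, needle):
    """Attach a SubstringCounter to the named logger and return it."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.INFO)  # ensure INFO records reach the handler
    counter = SubstringCounter(needle)
    logger.addHandler(counter)
    return counter


# Assumed usage: attach before running the reproducer loop, then compare
# counter.count with the number of handle[...] accesses.
# counter = attach_counter("pyfive.h5d", "Building chunk index")
```

If the count equals the number of accesses, the message is emitted on every access. That alone does not show whether the index is actually rebuilt each time.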

Question: Is the chunk index actually rebuilt on each access, or is the log line emitted unconditionally while the result is served from a cache? The 0ms elapsed on accesses 2–4 suggests the latter. However, if the B-tree is re-parsed from the file on every subscript access, that would be expensive over a remote (S3/HTTPS) fsspec filesystem, where each seek/read incurs latency.
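
One way to tell the two cases apart without reading the pyfive source is to count the reads that each subscript access triggers. The sketch below wraps a binary file object and records `read()` calls. `CountingReader` and `reads_per_access` are hypothetical helpers, and the commented usage assumes that `pyfive.File` accepts an already-open file-like object. Any I/O that bypasses `read()` on the wrapper would not be counted, so a zero here is suggestive rather than conclusive.

```python
class CountingReader:
    """Wrap a binary file-like object and count read() calls and bytes read."""

    def __init__(self, raw):
        self._raw = raw
        self.reads = 0
        self.bytes_read = 0

    def read(self, size=-1):
        data = self._raw.read(size)
        self.reads += 1
        self.bytes_read += len(data)
        return data

    def __getattr__(self, name):
        # Delegate everything else (seek, tell, close, ...) to the wrapped object.
        return getattr(self._raw, name)


def reads_per_access(open_file, counter, name, repeats=4):
    """Return the number of read() calls triggered by each open_file[name]."""
    deltas = []
    for _ in range(repeats):
        before = counter.reads
        open_file[name]  # the subscript access under test
        deltas.append(counter.reads - before)
    return deltas


# Assumed usage (requires pyfive and the local file from the reproducer):
# with open("da193a_25_3hr__198807-198807.nc", "rb") as raw:
#     counter = CountingReader(raw)
#     with pyfive.File(counter) as handle:
#         print(reads_per_access(handle, counter, "/m01s03i245"))
```

Non-zero counts on the second and later accesses would indicate that file metadata is being re-read each time. Zeros would be consistent with the log line being emitted while a cached result is returned.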
