Why use HDF5 at all? Just define a Zarr convention. #94
Description
The zarrmd trajectory file format uses the H5MD file format as a directory structure and metadata specification for storing molecular dynamics simulation data in the Zarr file format.
AFAICT H5MD is tied to HDF5. But why use HDF5 here in ZarrMD at all? Zarr already specifies a group (i.e. directory) structure, so what is HDF5 giving you?
zarrmd files are exactly the same as H5MD files, but with the .zarrmd extension, except for one broken H5MD layout requirement: ...
This is not possible in Zarr, so this requirement is relaxed in zarrmd such that the ‘step’ and ‘time’ datasets of the ‘particles’ trajectory group in the simulation box and positions datasets are required to contain the exact same shape and data, but are not required to be hard linked.
This is really a (structural and metadata) convention for how to lay out the data and what metadata it should contain. It's not a file-format-level concern.
Why not just forget about HDF5 entirely and define a Zarr convention for MD data?
The same convention could even be applied retroactively to HDF5 data, since it seems like existing H5MD data could also conform to it.
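
To illustrate that this is a layout-level rule rather than a file-format one, here is a minimal sketch of how the relaxed 'step'/'time' requirement quoted above might be checked. It is not part of zarrmd or H5MD: `check_step_time` and `_lookup` are hypothetical helpers, and the element paths `box/edges` and `position` are my assumption based on the H5MD layout. The check only needs name-based indexing and `[...]` reads, so the same function should work on a Zarr group, an h5py group, or (as below) plain nested dicts of NumPy arrays.

```python
import numpy as np


def _lookup(node, path):
    """Walk a '/'-separated path one component at a time."""
    for part in path.split("/"):
        node = node[part]
    return node


def check_step_time(trajectory, elements=("box/edges", "position")):
    """Check that 'step' and 'time' match across the given elements.

    `trajectory` is any hierarchy that supports name-based indexing.
    The default element paths are assumptions, not taken from a spec.
    Returns a list of problem strings; an empty list means the check passed.
    """
    problems = []
    first = elements[0]
    for name in ("step", "time"):
        reference = np.asarray(_lookup(trajectory, f"{first}/{name}")[...])
        for element in elements[1:]:
            other = np.asarray(_lookup(trajectory, f"{element}/{name}")[...])
            if reference.shape != other.shape:
                problems.append(f"{element}/{name}: shape differs from {first}/{name}")
            elif not np.array_equal(reference, other):
                problems.append(f"{element}/{name}: data differs from {first}/{name}")
    return problems


# Stand-in for one trajectory group under 'particles', using nested dicts.
steps = np.arange(3)
times = steps * 0.5
trajectory = {
    "box": {"edges": {"step": steps, "time": times, "value": np.ones((3, 3))}},
    "position": {"step": steps.copy(), "time": times.copy(), "value": np.zeros((3, 4, 3))},
}

print(check_step_time(trajectory))
```

Nothing in the check depends on hard links or on which storage library produced the hierarchy, which is the sense in which the requirement reads as a convention.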