Why use HDF5 at all? Just define a Zarr convention. #94

@TomNicholas

The zarrmd trajectory file format uses the H5MD file format as a directory structure and metadata specification for storing molecular dynamics simulation data in the Zarr file format.

AFAICT H5MD is tied to HDF5. But why use HDF5 here in ZarrMD at all? Zarr already specifies a group (i.e. directory) structure, so what is HDF5 giving you?

zarrmd files are exactly the same as H5MD files, but with the .zarrmd extension, except for one broken H5MD layout requirement: ...

This is not possible in Zarr, so this requirement is relaxed in zarrmd such that the ‘step’ and ‘time’ datasets of the ‘particles’ trajectory group in the simulation box and positions datasets are required to contain the exact same shape and data, but are not required to be hard linked.

This is really a (structural and metadata) convention for how to lay out data, and what metadata should be contained. It's not a file-format-level concern.
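To make that concrete, here is a rough sketch of how the relaxed step/time rule could be checked without caring about the container. Everything in it is an assumption for illustration: the `box/edges` and `position` paths are borrowed loosely from the H5MD layout, `check_step_time` is a hypothetical helper, and plain dicts and lists stand in for a real store. It only relies on `group[name]` indexing, which both `zarr.Group` and `h5py.Group` support.

```python
# Hypothetical paths, loosely following the H5MD layout; the exact names
# are an assumption for illustration, not taken from the zarrmd spec.
BOX = ("box", "edges")
POSITION = ("position",)


def _lookup(group, path):
    """Walk a nested, dict-like hierarchy one name at a time."""
    node = group
    for name in path:
        node = node[name]
    return node


def check_step_time(particles_group):
    """Return a list of violations of the 'same shape and data' rule.

    Assumes 'step' and 'time' are 1-D sequences.
    """
    problems = []
    box = _lookup(particles_group, BOX)
    pos = _lookup(particles_group, POSITION)
    for name in ("step", "time"):
        a, b = list(box[name]), list(pos[name])
        if len(a) != len(b):
            problems.append(f"{name}: length {len(a)} != {len(b)}")
        elif a != b:
            problems.append(f"{name}: values differ")
    return problems


# Plain dicts and lists stand in for a real Zarr or HDF5 group here.
good = {
    "box": {"edges": {"step": [0, 10, 20], "time": [0.0, 0.1, 0.2]}},
    "position": {"step": [0, 10, 20], "time": [0.0, 0.1, 0.2]},
}
bad = {
    "box": {"edges": {"step": [0, 10], "time": [0.0, 0.1]}},
    "position": {"step": [0, 10, 20], "time": [0.0, 0.1, 0.2]},
}

print(check_step_time(good))
print(check_step_time(bad))
```

Nothing in the check refers to hard links or to HDF5, which is the point: the rule is about the hierarchy and its contents, not the file format.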

Why not just forget about HDF5 entirely and define a convention for MD data in Zarr?

The same thing could even be retroactively applied to HDF5 data, as it seems like H5MD data could also conform to the same convention.
