
RFC-8: Collections #343

Draft

normanrz wants to merge 22 commits into ome:main from normanrz:rfc-8

Conversation

@normanrz (Contributor)

This is the work-in-progress draft for RFC-8.

cc @jluethi @lorenzocerrone @tischi @perlman @matthewh-ebi

@github-actions bot commented Sep 29, 2025

Automated Review URLs

@normanrz normanrz mentioned this pull request Sep 29, 2025
rfc/8/index.md Outdated
#### `Collection` keys

* `"type"` (required). Value must be `"collection"`.
* `"nodes"` (required). Value must be an array of `CollectionNode` or `Collection` objects.
Contributor

since every node has a unique name, why is this an array and not an object?

Contributor Author

Yeah that could also work.

Contributor Author

I wonder if representing an order may be desired, though. For example, https://ngff.openmicroscopy.org/latest/index.html#bf2raw states "Parsers like Bio-Formats define a strict, stable ordering of the images in a single container ...".
If it were an object the ordering would likely get lost in some JSON implementations. It could be represented through sortable node names, but that also seems less convenient.

@d-v-b (Contributor) commented Oct 2, 2025

order might also be useful for collections of layers in the context of an image visualization tool. Although you can always add an "order" field to the elements that's an integer (sort of the reverse of adding a "name" field that must be unique in the container).
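To make the trade-off concrete, here is a sketch of the two candidate encodings discussed above. Field names other than `type`, `nodes`, `name`, and `path` are illustrative; the `order` field in the object variant is the hypothetical addition d-v-b mentions, not part of the draft.

```python
import json

# Option A: "nodes" as an array -- order is preserved by JSON itself.
as_array = {
    "type": "collection",
    "nodes": [
        {"name": "raw", "type": "multiscale", "path": "./raw"},
        {"name": "labels", "type": "multiscale", "path": "./labels"},
    ],
}

# Option B: "nodes" as an object keyed by the unique name -- key order is
# not guaranteed by every JSON implementation, so an explicit (hypothetical)
# "order" field would be needed to recover a stable ordering.
as_object = {
    "type": "collection",
    "nodes": {
        "raw": {"type": "multiscale", "path": "./raw", "order": 0},
        "labels": {"type": "multiscale", "path": "./labels", "order": 1},
    },
}

# Round-tripping the array form keeps the order:
names = [n["name"] for n in json.loads(json.dumps(as_array))["nodes"]]
assert names == ["raw", "labels"]
```

The array form buys ordering at the cost of having to scan for a name; the object form buys O(1) name lookup at the cost of an extra ordering field.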

rfc/8/index.md Outdated

### Metadata

This RFC defines two main objects for OME-Zarr: `Collection`, `CollectionNode`.
Contributor

A CollectionNode can be a Collection, so it's a bit confusing to say that these are two objects unless you explain that "object" here means something like "interface" or "protocol"

Contributor Author

What would be the best term here? Is it a class?

Contributor

As I understand it, there are currently 3 entities that need to be defined:

  • collection
  • multiscales
  • root

collection and multiscales can be discriminated based on their type field, and collection has attributes that multiscales does not, so regular inheritance from a base class doesn't express their relationship very well.

Maybe defining these as protocols would work? E.g., there's a core `Node` protocol with the fields `{type, name, attributes}`, and objects that implement `Node` can also implement `Collection` OR `Multiscales` (but not both, because of the requirement on the `type` key). Finally, there's a `Root` protocol which can only be implemented by a `Collection`.
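A rough sketch of that layering with `typing.Protocol`. All class and field names here are only illustrative of the idea in the comment above, not part of the RFC:

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Node(Protocol):
    """Core protocol: every node has a type, a name, and attributes."""
    type: str
    name: str
    attributes: dict[str, Any]


@runtime_checkable
class Collection(Node, Protocol):
    """A node discriminated by type == "collection"; adds child nodes."""
    nodes: list["Node"]


@runtime_checkable
class Multiscale(Node, Protocol):
    """A node discriminated by type == "multiscale"; adds a path."""
    path: str


# A plain class structurally implements Collection without inheriting it:
class MyCollection:
    def __init__(self) -> None:
        self.type = "collection"
        self.name = "root"
        self.attributes: dict[str, Any] = {}
        self.nodes: list[Node] = []


assert isinstance(MyCollection(), Collection)
# It does not implement Multiscale, since it has no "path" attribute:
assert not isinstance(MyCollection(), Multiscale)
```

`runtime_checkable` only checks member presence, not types, which mirrors the "structural compatibility" point made later in this thread.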

Member

Presumably bioformats2raw.layout and plate collections will still be around (not removed with this proposal). So a Node could be Collection or Multiscales or bioformats2raw or plate?

@d-v-b (Contributor) commented Oct 2, 2025

Actually I was wrong: regular inheritance isn't problematic for `Collection` and `Multiscales` -- there's a base `Node`, and `Collection` and `Multiscales` (and anything else) inherit from `Node` (it's totally fine for subclasses to add new attributes).

As for the requirement that there be only one root node: I don't think that can be expressed easily in a type system as long as the root is structurally compatible with a `Collection`, but it can be added as a regular requirement.

Contributor

If the requirement is just that the root node have version (weaker than requiring that only the root node have version), then this is a bit simpler.

Contributor Author

> Presumably bioformats2raw.layout and plate collections will still be around (not removed with this proposal). So a Node could be Collection or Multiscales or bioformats2raw or plate?

The idea is to remove bioformats2raw.layout and plate as separate entities with this proposal and express the functionality through attributes in the collection nodes. We need to work more on these.

Contributor

Could this work similarly to how I proposed it for the coordinate transforms? In essence, the paths specified in the plate metadata could be allowed to contain a `Collection`, which would contain the reference to the path.

@d-v-b (Contributor) commented Oct 2, 2025

this is looking really cool!

@dstansby (Contributor)

Looks nice! As a quick initial comment, it would be super helpful to have a minimal example that demonstrates the new metadata structure being proposed - the webknossos examples are nice, but I'm struggling to distinguish what's required and optional in those files because there are lots of extra (I think?) attributes.
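For what it's worth, a guess at a minimal instance, based only on the key tables quoted in this thread (`type`, `nodes`, `name`, `path` come from the draft text; the node name and path values are made up):

```python
import json

# Smallest collection consistent with the quoted key tables: a required
# "type" of "collection" and a required "nodes" array of node objects.
minimal_collection = {
    "type": "collection",
    "nodes": [
        {
            "type": "multiscale",   # discriminator for the node kind
            "name": "image-1",      # must be unique within this file
            "path": "./image-1",    # where the multiscale group lives
        }
    ],
}

print(json.dumps(minimal_collection, indent=2))
```

Everything webknossos-specific in the larger examples would live under an optional `attributes` key instead.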

rfc/8/index.md Outdated
}, {
"name": "..",
"type": "collection",
"path": "./nested_collection.json"
Member

The collection should be a directory that contains a zarr.json, right?
e.g. "path": "./nested_collection.zarr"

Member

Ah, now I see that this standalone JSON file is proposed as part of this RFC. But that isn't covered until much later below, under Examples > "Where is this collection metadata stored?". Maybe that should be moved up above this point?

If an implementation is using e.g. zarr-python or another zarr library to retrieve zarr metadata, then it may be kinda painful to also support fetching of vanilla file.json files using a different mechanism? Don't know about other libs.
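To illustrate the concern: a standalone `file.json` forces implementations to keep a second fetch path next to their zarr machinery. A stdlib-only sketch of that branching; the layout (collection metadata under an `ome` key in a group's `zarr.json` attributes) is an assumption for illustration, not something the RFC has settled:

```python
import json
from pathlib import Path


def read_collection(path: str) -> dict:
    """Read collection metadata from either a vanilla JSON file or a
    Zarr v3 group directory (layout assumed for illustration)."""
    p = Path(path)
    if p.suffix == ".json" and p.name != "zarr.json":
        # Vanilla JSON file: needs a generic fetch, outside any zarr library.
        return json.loads(p.read_text())
    # Zarr group: assume the collection lives under attributes in zarr.json.
    meta = json.loads((p / "zarr.json").read_text())
    return meta.get("attributes", {}).get("ome", {})
```

With plain files both branches are trivial; the pain the comment points at shows up with remote stores, where the zarr library already owns the transport for `zarr.json` but not for arbitrary sibling files.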

@will-moore will-moore mentioned this pull request Oct 30, 2025
4 tasks
@jo-mueller jo-mueller mentioned this pull request Oct 30, 2025
@will-moore (Member)

I started a basic implementation of the Collections spec for the validator at ome/ome-ngff-validator#62.
This should allow you to browse example Collections. Also there's a couple of linked test collections there to try out.

rfc/8/index.md Outdated
| Key | Type | Required | Description |
| - | - | - | - |
| `"type"` | string | yes | Value must be `"multiscale"`. |
| `"name"` | string | yes | Value must be a non-empty string. It should be a string that matches `[a-zA-Z0-9-_.]+`. Must be unique within one collections JSON file. |
| `"path"` | string | yes | Value must be a string containing a path. [See paths section](#paths) |


Hi @normanrz,

Is the path for a multiscale node really required?

In the example custom-nodes.json:

{
  "type": "multiscale",
  "name": "color",
  "attributes": {
    "webknossos:category": "color",
    "webknossos:bounding_box": {
      "topleft": { "x": 128, "y": 128, "z": 128 },
      "size": { "x": 5445, "y": 8380, "z": 3285 }
    },
    "webknossos:data_type": "uint8"
  },
  "path": "/absolute/path/to/l4dense_motta_et_al_demo/color"
}

Here, the attributes do not contain any OME-NGFF metadata, so (if I understand correctly) the axes, datasets, etc. are expected to be found in
/absolute/path/to/l4dense_motta_et_al_demo/color/zarr.json.

In contrast, in the other example inline-multiscale.json, the OME-NGFF metadata is provided at the top level:

{
  "type": "multiscale",
  "name": "segmentation",
  "attributes": {
    "webknossos:category": "segmentation",
    "webknossos:bounding_box": {
      "topleft": { "x": 0, "y": 0, "z": 0 },
      "size": { "x": 5632, "y": 8704, "z": 3584 }
    },
    "webknossos:data_type": "uint32",
    "webknossos:values": { "max": 100000 }
  },
  "multiscales": [
    {
      "axes": [
        { "name": "c", "type": "channel" },
        { "name": "x", "type": "space", "unit": "nanometer" },
        { "name": "y", "type": "space", "unit": "nanometer" },
        { "name": "z", "type": "space", "unit": "nanometer" }
      ],
      ...
    }
  ]
}

In this second case, there is no path for the node.


(besides the paths in the datasets)

Contributor Author

There is still an open design question of whether we should allow inlining multiscales. If yes, `path` would no longer be required. My example was just an experiment, not normative.
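If inlining were allowed, a reader could simply branch on which key is present. A hypothetical sketch of that open design question, not part of the RFC:

```python
from typing import Any


def resolve_multiscale(node: dict[str, Any]) -> tuple[str, Any]:
    """Return ("inline", metadata) if the node embeds its multiscales,
    or ("path", path) if it references an external group.
    Purely illustrative of the open design question above."""
    if "multiscales" in node:
        return ("inline", node["multiscales"])
    if "path" in node:
        return ("path", node["path"])
    raise ValueError("multiscale node needs either 'multiscales' or 'path'")


kind, value = resolve_multiscale(
    {"type": "multiscale", "name": "color", "path": "./color"}
)
assert kind == "path" and value == "./color"
```

The cost of allowing both is that every consumer has to implement this branch; the benefit is one fewer round-trip for small collections.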

@jo-mueller jo-mueller added the rfc Status: request for comments label Dec 5, 2025
@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/should-zarr-stores-contain-a-top-level-zarr-json-file/118714/6

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/how-to-build-hcs-zarrs-with-multiple-image-types-per-fov/119329/2

@tepals commented Mar 13, 2026

As part of the CCP-volumeEM OME-NGFF Hackathon at EMBL-EBI Hinxton, we wrote down the following user story:

User Story: Large-Scale Multi-Beam vEM Tiling

Definitions

  • sub-image: 900x900 px image acquired by a single beam
  • tile: 8x8 sub-images stitched
  • slice: multiple tiles that make up a z-plane in the volume (each slice is made up of 100 - 1600 tiles) => 6400 - 100k sub-images in a slice
  • volume: series of slices (up to 100)

User Story
A microscopist using a Delmic FAST-EM system acquires up to a 100-slice volume where the imaged area varies per slice (e.g., following an irregular tissue boundary). Each slice consists of a variable number of tiles; each tile consists of 64 sub-images of 900 x 900 pixels with a 4 nm pixel size. While a nominal 100-pixel overlap is targeted, the actual spatial distribution is non-uniform, resulting in varying slice dimensions (& a need for individual transformations).

Currently this is solved by stitching the sub-images into a single tile; each tile is saved as an individual pyramidal TIFF containing the first 3 zoom levels. For optimized viewing, the fourth zoom level is created by stitching 16 (?) tiles and saving that as an individual TIFF.

This raises 2 challenges:

  1. How many OME-Zarrs with individual transformations can reasonably be stored in a single collection? What are the trade-offs between saving the data viewer-optimized in a single OME-Zarr vs. having many OME-Zarrs with transformations that allow full flexibility in raw data access and can still display a fused image at some performance cost? Additional complexity arises for parallel writing of a collection: avoiding race conditions when writing the collection files & overly large collection JSON metadata.
  2. When handling very many OME-Zarrs & building pyramid layers (around 8) coming from many individual OME-Zarrs, lower-resolution pyramid layers would need to be combined across multiple OME-Zarrs. Otherwise, the lowest-res version of a 900 px sub-image is a ~3x3 px array (while at the resolution of the full field, it would still be a 400x400 image).

To handle problem 1, the OME-NGFF metadata must scale to a coordinateTransformation for each of the N (100 - 1600 if tiles are OME-Zarrs, 6400 - 100k if sub-images are OME-Zarrs) sub-images to position them within a shared 2D physical space. This would allow a viewer to render a "stitched" multiscale global view that only exists where data was actually acquired, while still providing direct access to the underlying raw overlapping FOVs regardless of their local grid density.
In order to handle viewing at lower resolutions, the collection spec would need to support combining many single scales at full resolution with fewer single scales at lower resolution to build a pyramid.

Rough idea of the Example file structure, where each slice in a volume is saved as an OME-Zarr collection with shared low-resolution pyramid layers:

slice.zarr/
├── zarr.json                 <-- Collection metadata             
│
├── low_res_slice/            <-- Low-res, shared single scales (either grouped as a multiscales or individual single scales)
│   ├── zarr.json             <-- Multiscales (Zoom 4+)
│   ├── 4/                    <-- Resolution level 4 across the tiles
│   ├── 5/                    <-- Resolution level 5 across the tiles
│   └── 6/                    <-- Resolution level 6 across the tiles
│
├── tile_001/                 <-- One of the tiles in the large 2D slice
│   ├── zarr.json             <-- transformation: [z: 0, y: 0, x: 0]
│   ├── 0                     <-- full resolution data
│   ├── 1                     <-- down-sampled data
│   ├── 2                     <-- down-sampled data
│   └── 3/                    <-- down-sampled data
├── tile_002/                 <-- Second tile in the large 2D slice
│   ├── zarr.json             <-- transformation: [z: 0, y: 0, x: 812] => tile shifted in X
│   ├── 0                     <-- full resolution data
│   ├── 1                     <-- down-sampled data
│   ├── 2                     <-- down-sampled data
│   └── 3/                    <-- down-sampled data
└── ...
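Collection metadata for that layout might look roughly like the following. This is a sketch built from the draft's node keys; the `transform` attribute name is invented here to carry the per-tile offsets from the tree above and is not part of any spec:

```python
# Hypothetical collection metadata for slice.zarr (attribute names invented).
slice_collection = {
    "type": "collection",
    "nodes": [
        {"type": "multiscale", "name": "low_res_slice",
         "path": "./low_res_slice"},                       # shared zoom 4+
        {"type": "multiscale", "name": "tile_001",
         "path": "./tile_001",
         "attributes": {"transform": {"z": 0, "y": 0, "x": 0}}},
        {"type": "multiscale", "name": "tile_002",
         "path": "./tile_002",
         "attributes": {"transform": {"z": 0, "y": 0, "x": 812}}},
        # ... one node per tile; with 100 - 1600 tiles per slice this is
        # where the "overly large collection JSON metadata" concern applies.
    ],
}

# Node names must be unique within one collections file:
names = [n["name"] for n in slice_collection["nodes"]]
assert len(names) == len(set(names))
```

A viewer would fuse `tile_*` nodes at high zoom and switch to `low_res_slice` at low zoom, which is exactly the cross-Zarr pyramid combination challenge 2 describes.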

@toloudis

User story 1: combining independent segmentation zarrs with raw image zarrs.

We produce multiscale zarrs of our raw microscope images, using filtered downsampling.
We later produce multiscale zarrs of segmentations of the above images, using unfiltered (nearest neighbor) downsampling.

In viewers we want to give users an easy way of combining the two. In particular, our users are interested in seeing the data as if it were actually separate channels of the same volume. This may or may not be a viewer implementation detail, but it could be interesting if the spec supported this, pointing to two separate zarrs and treating them as consecutive channels. For our viewer, this only works if the spatial dimensions are the same, and can be transformed to the same origin (always trivially true for the data I describe).
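Mechanically, the "consecutive channels" idea amounts to concatenating two arrays of identical spatial shape along the channel axis. A numpy sketch, assuming czyx axis order and a shared origin as described above:

```python
import numpy as np


def as_combined_channels(raw: np.ndarray, seg: np.ndarray) -> np.ndarray:
    """Present raw channels and segmentation channels as one volume.
    Illustrative only: assumes czyx axes, identical spatial dimensions,
    and a shared origin, as in the user story above."""
    if raw.shape[1:] != seg.shape[1:]:
        raise ValueError("spatial dimensions must match")
    return np.concatenate([raw, seg], axis=0)


raw = np.zeros((2, 4, 4, 4), dtype=np.uint16)  # 2 raw image channels
seg = np.ones((1, 4, 4, 4), dtype=np.uint32)   # 1 segmentation channel
combined = as_combined_channels(raw, seg)
assert combined.shape == (3, 4, 4, 4)
```

A spec-level version of this would need the collection to record which nodes form the channel sequence and in what order, which ties back to the node-ordering discussion earlier in this thread.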

User story 2: dataset releases

Is it practical to have one single very large collection, as in 1000s of zarrs or more? We would likely produce collections of matched raw+segmentation zarrs as described in my user story 1.

