spec: add UUID to multiscales metadata#99

Draft
jo-mueller wants to merge 3 commits into ome:main from jo-mueller:add-uuids
@jo-mueller
Contributor

Fixes ome/ngff#463

Supersedes ome/ngff#115

@jo-mueller jo-mueller added the enhancement New feature or request label Mar 3, 2026
@github-actions

github-actions bot commented Mar 3, 2026

Automated Review URLs

Co-Authored-By: Josh Moore <josh@openmicroscopy.org>
@clbarnes
Contributor

clbarnes commented Mar 3, 2026

Could be more clear about how necessary the prefixes are - whether a bare hex-encoded UUID is acceptable, or whether uuid: or urn:uuid: is required.
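
For illustration, here is how Python's standard `uuid` module treats the three spellings in question. This is only a sketch of one parser's behavior, not a statement of what the spec requires; the example UUID value is arbitrary.

```python
import uuid

# Arbitrary example value; not taken from the spec.
BARE = "123e4567-e89b-12d3-a456-426614174000"

candidates = [BARE, "uuid:" + BARE, "urn:uuid:" + BARE]

# uuid.UUID strips "urn:" and "uuid:" prefixes, braces, and hyphens before
# parsing, so all three spellings yield the same UUID object.
parsed = [uuid.UUID(text) for text in candidates]
assert parsed[0] == parsed[1] == parsed[2]

# The URN spelling is available via the .urn attribute.
print(parsed[0].urn)  # urn:uuid:123e4567-e89b-12d3-a456-426614174000
```

Because a lenient parser accepts all three forms, a reader that wants to enforce exactly one spelling would have to compare the input string against `str(parsed[0])` or `parsed[0].urn` itself.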

@imagesc-bot

This pull request has been mentioned on Image.sc Forum. There might be relevant details there:

https://forum.image.sc/t/ngff-weekly-dev-update-thread/110810/72

@d-v-b
Contributor

d-v-b commented Mar 6, 2026

can you explain what this is for? wouldn't you want a UUID for each array?

@d-v-b
Contributor

d-v-b commented Mar 6, 2026

I think you need to explain the semantics for identity here, and potentially describe a content-aware procedure for creating unique identifiers. If I rechunk arrays, does the identifier change? And I assume you would want an independently-created multiscales object with the exact same metadata + same arrays to have the same unique identifier?
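
One possible shape for such a content-aware procedure, sketched under loud assumptions: the identifier is a UUIDv5 derived from canonical JSON metadata plus the logical bytes of each array, so it does not depend on chunk layout. The namespace, the `content_uuid` helper, and the byte-string stand-ins for arrays are all hypothetical and not part of this PR.

```python
import hashlib
import json
import uuid

# Hypothetical namespace for this sketch; the PR does not define one.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.org")


def content_uuid(metadata: dict, arrays: list[bytes]) -> uuid.UUID:
    """Derive a UUIDv5 from metadata plus the logical bytes of each array."""
    digest = hashlib.sha256()
    # Canonical JSON so that key order does not affect the result.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    digest.update(canonical.encode("utf-8"))
    for data in arrays:
        # Hash each array's own digest so array boundaries stay unambiguous.
        digest.update(hashlib.sha256(data).digest())
    return uuid.uuid5(NAMESPACE, digest.hexdigest())


metadata = {"axes": ["y", "x"], "name": "example"}
level0 = bytes(range(16))  # stand-in for a full-resolution array
level1 = bytes(range(4))   # stand-in for a downsampled level

id_a = content_uuid(metadata, [level0, level1])

# Same logical content, reassembled from two "chunks" and with the metadata
# keys in a different order: same identifier.
reordered = dict(reversed(list(metadata.items())))
id_b = content_uuid(reordered, [level0[:8] + level0[8:], level1])
assert id_a == id_b

# Any change to the content yields a different identifier.
id_c = content_uuid(metadata, [level0, level1 + bytes([0])])
assert id_a != id_c
```

Note that this sketch reads every byte of every array, so the identifier has to be recomputed whenever any of the content changes.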

@jo-mueller
Contributor Author

@d-v-b thanks for the thoughts. I think I can answer some of this, but not all. Identity, as I see it, refers explicitly to the multiscales object as a package of array data + metadata. Rechunking, for instance, will change the semantics of how the described data can be accessed, but it doesn't change the actual data. That being said, in practice the uuid would probably be generated on write, so an operation like loading -> rechunking -> storing would lead to a different uuid in the metadata.

I guess the deeper question here is whether one would want to store different array layouts of the same data under the same uuid, which ties into the "same array, different metadata" discussion. In that context, these different metadata objects would probably end up with different uuids, which I think would be ok?

That would mean:

> If I rechunk arrays, does the identifier change?

It doesn't need to because rechunking is more about the modalities of data access, not description. But if you read -> rechunk -> write, it probably would change, and that would be ok.

> object with the exact same metadata + same arrays to have the same unique identifier?

Kind of the same answer: I think same metadata + same arrays (modulo chunk layout) should be allowed to have the same uuid, but they don't need to.
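
A minimal sketch of the generate-on-write behavior described above, assuming a random (version 4) UUID stamped by the writer. The `write_multiscales` helper and the `"uuid"` key name are hypothetical stand-ins, not names from this PR or from any existing library.

```python
import copy
import uuid


def write_multiscales(metadata: dict) -> dict:
    """Stand-in for a writer: returns the metadata that would be stored,
    stamped with a fresh random UUID. The "uuid" key name is an assumption."""
    stored = copy.deepcopy(metadata)
    stored["uuid"] = str(uuid.uuid4())
    return stored


original = write_multiscales({"name": "example", "datasets": [{"path": "0"}]})

# read -> rechunk -> write: the metadata is otherwise unchanged, but the
# writer stamps a new identifier.
loaded = copy.deepcopy(original)
rewritten = write_multiscales(loaded)

assert rewritten["uuid"] != original["uuid"]
assert rewritten["name"] == original["name"]
```

A writer that wanted to keep the identifier across a pure rechunk would instead have to carry the existing value through rather than overwrite it.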

@d-v-b
Contributor

d-v-b commented Mar 6, 2026

I think the spec needs to be really clear on what data transformations require changing the UUID, and whether it's an error if two different multiscales use the same UUID. I think this requires defining what "identity" means for multiscales objects.

@mkitti
Member

mkitti commented Mar 6, 2026

If the identifier is content-aware, does that mean I have to recalculate the identifier when the content changes? This would essentially become a checksum, and quite an onerous one at that if the checksum must depend on the content as a whole.

Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Add UUID field to multiscales - moved from PR

5 participants